How do CNNs work when the output is multiple categories? For instance, the same image contains a cat, a dog, and a car. What does the architecture look like - multiple CNNs, each of which can predict one category? Or does one CNN have multiple outputs, and if a score > threshold, that category gets added to the list shown to the user?<p>Also, how do CNNs draw a box around the target in the image?
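For what it's worth, a common approach to multi-label classification is the second one you describe: a single network with one output per class, passed through an independent sigmoid (rather than a softmax, which forces the classes to compete), then thresholded. A minimal sketch with hypothetical logits standing in for a CNN's final layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw scores (logits) from a CNN's final layer,
# one per class: cat, dog, car.
class_names = ["cat", "dog", "car"]
logits = np.array([2.0, 1.5, -3.0])

# Multi-label setup: each class gets an independent sigmoid score,
# unlike softmax, where the scores must sum to 1 and compete.
scores = sigmoid(logits)

# Report every class whose score clears the threshold.
threshold = 0.5
predicted = [name for name, s in zip(class_names, scores) if s > threshold]
print(predicted)  # cat and dog clear 0.5; car does not
```

Drawing boxes is a different task (object detection): architectures in that family add a regression head that predicts box coordinates alongside the class scores.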
> Parameters like number of filters, filter sizes, architecture of the network etc. have all been fixed before Step 1 and do not change during training process – only the values of the filter matrix and connection weights get updated.<p>Is this just the article's over-simplification or are these values really just randomly selected?
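For anyone else wondering: those values (filter count, filter size, depth) are hyperparameters chosen by the designer, not learned and not random; what starts random is the filter *values*, which gradient descent then updates. A toy sketch of the distinction, with a stand-in gradient since real backprop is beside the point here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameters: fixed by the designer before training, never updated.
num_filters = 8
filter_size = 3

# Learned parameters: initialized randomly, then updated during training.
filters = rng.standard_normal((num_filters, filter_size, filter_size)) * 0.1

def training_step(filters, grad, lr=0.01):
    # Gradient descent changes only the filter values;
    # the shapes (the hyperparameters) stay fixed.
    return filters - lr * grad

grad = rng.standard_normal(filters.shape)  # stand-in for a backprop gradient
filters = training_step(filters, grad)
print(filters.shape)  # still (8, 3, 3) after the update
```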
I actually don't think this is a good explanation at all. I'm not saying it's badly written, just that it doesn't serve the stated purpose of being an intuitive explanation.<p>To that point, the article is certainly NOT intuitive if you don't already understand image convolution. The explanation is also very long and rambling. While I understand the author has made an effort, I don't think the article really presents the subject matter in a new way: I can learn all of this elsewhere. This is a common problem when people write about complex subject matter without fully understanding the knowledge gap between teacher and audience.<p>If I were the author, I might read up on technical communication and spend some time figuring out how to correctly simplify something. As it stands, this article uses the typical strategy of information hiding to simplify the subject matter. The problem is that information hiding doesn't work very well unless it is expertly done. I do like the animation, but again, it only shows how image convolution works; it doesn't actually teach us anything about a CNN.<p>I would suggest the author break the document into three separate sections, the first being very simple (maybe starting with the part that says 'images are just matrices'), with more detail added in each subsequent section. The final section would have a lot of detail. That way you counteract the information blindness that comes from simplification by providing the information later.<p>Otherwise, this article is really more of a data dump than an intuitive explanation, and since it doesn't teach us anything we can't learn elsewhere, I don't see what it contributes.<p>A cleaner explanation, expertly prepared, could really elevate the effort that went into this.
The article is all right, but newbies reading it should be a little careful: the author is sloppy with terminology in a way that can trip up someone who is just learning. For example, a kernel and a filter are not the same thing.
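To make the distinction concrete: a kernel is a single 2-D matrix of weights, while a filter (in the multi-channel case) stacks one kernel per input channel and sums their contributions into a single output value. A small NumPy sketch with a hypothetical image patch:

```python
import numpy as np

# A filter for a 3-channel (e.g. RGB) input: it stacks one 3x3
# kernel per input channel, so its shape is (channels, k, k).
in_channels, k = 3, 3
filter_ = np.random.default_rng(0).standard_normal((in_channels, k, k))

# Applying the filter at one location: each kernel multiplies its
# own channel, and ALL the products are summed into one number.
patch = np.ones((in_channels, k, k))  # hypothetical image patch
output_value = np.sum(filter_ * patch)
print(output_value)  # a single scalar, not one value per channel
```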
anyone happen to be familiar with any uses of CNNs on 1D "images"? (like you'd get from linear image sensors <a href="https://toshiba.semicon-storage.com/ap-en/product/sensor/linear-sensor.html" rel="nofollow">https://toshiba.semicon-storage.com/ap-en/product/sensor/lin...</a> )<p>i hit up google scholar occasionally looking for references, but literally everything seems to be applying them to 2D images.
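Not a paper reference, but mechanically a 1-D CNN is just the obvious special case: the kernel slides along one axis instead of two (most deep-learning libraries ship this directly, e.g. a Conv1d layer). A minimal NumPy sketch on a made-up step-edge signal:

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid-mode 1-D convolution (really cross-correlation, as in
    most CNN libraries): slide the kernel along the signal and take
    a dot product at each position."""
    n = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel)
                     for i in range(n)])

# A hypothetical 1-D "image" from a linear sensor: a step edge.
signal = np.array([0., 0., 0., 1., 1., 1.])

# An edge-detecting kernel, the 1-D analogue of a gradient filter.
kernel = np.array([-1., 0., 1.])

response = conv1d(signal, kernel)
print(response)  # the response peaks where the edge sits
```

Stacking layers of this (plus pooling along the same axis) gives you the 1-D analogue of a standard image CNN; the same idea shows up in audio and time-series work, which may be a more fruitful search term than "1D images."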