(a) The OP links to a blog post that discusses and extensively quotes another blog post [1], which in turn discusses an actual paper [2]. Naturally, the paper is where the good stuff is.<p>(b) Both blog posts somewhat understate the problem. The adversarial examples given in the original paper aren't just classified differently from their parent images -- they're constructed to receive a specific target classification (a rough sketch of what that construction looks like is at the end of this comment). In figure 5 of the arXiv version, for example, they show clear images of a school bus, temple, praying mantis, dog, etc., which all received the label "ostrich, Struthio camelus".<p>(c) The blog post at [1] wonders whether humans have similar adversarial inputs. Of course it's possible that we might, but I suspect that we have an easier time than these networks, in part because:
(i) We often get labeled data on a stream of 'perturbed' related inputs by observing objects over time. If I see a white dog in real life, I don't get just a single image of it. I get a series of overlapping 'images' over a period of time, during which it may move, I may move, the lighting may change, etc. So in a sense, human experience already includes some of the perturbations that ML techniques have to introduce manually to become more robust.
(ii) We also get to take actions to get more/better perceptual data. If you see something interesting or confusing or just novel, you choose to focus on it, or get a better view because of that interestingness or novelty. The original paper talks about the adversarial examples as being in pockets of low probability. If humans encounter these pockets only rarely, it's because when we see something weird, we want to examine it, after which that particular pocket has higher probability.<p>[1] <a href="http://www.i-programmer.info/news/105-artificial-intelligence/7352-the-flaw-lurking-in-every-deep-neural-net.html" rel="nofollow">http://www.i-programmer.info/news/105-artificial-intelligenc...</a><p>[2] <a href="http://arxiv.org/abs/1312.6199" rel="nofollow">http://arxiv.org/abs/1312.6199</a> or <a href="http://cs.nyu.edu/~zaremba/docs/understanding.pdf" rel="nofollow">http://cs.nyu.edu/~zaremba/docs/understanding.pdf</a>
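To make (b) concrete, here is a minimal sketch of that kind of targeted construction against a generic differentiable model. The paper itself uses a box-constrained L-BFGS rather than plain gradient steps, and everything below (the toy model, the penalty weight, the step count) is made up for illustration, not taken from the paper's code:

    import torch
    import torch.nn.functional as F

    def targeted_perturbation(model, x, target_class, steps=200, lr=0.01):
        """Nudge x until the model assigns it target_class, while an L2
        penalty keeps the perturbation visually negligible."""
        r = torch.zeros_like(x, requires_grad=True)      # the perturbation
        opt = torch.optim.Adam([r], lr=lr)
        target = torch.tensor([target_class])
        for _ in range(steps):
            opt.zero_grad()
            loss = F.cross_entropy(model(x + r), target) + 0.1 * r.norm()
            loss.backward()
            opt.step()
        return (x + r).detach()

    # toy usage: a random "network" and a random "image"
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    x = torch.rand(1, 3, 32, 32)
    x_adv = targeted_perturbation(model, x, target_class=7)
    print(model(x).argmax().item(), "->", model(x_adv).argmax().item())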
There are some possible applications of this:<p>Better captchas that are <i>optimized</i> to be hard for machines, but easy for humans.<p>Getting around automated systems that filter content, such as copyrighted-song detection.<p>Training on these images improves generalization. Essentially these images add more data, since you already know what class they should be given. And they are optimal in a certain sense: they probe exactly the places where the NN gets things wrong, or where it has bad discontinuities.
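On that last point, a hedged sketch of what "training on these images" could look like, assuming a differentiable model and using a cheap gradient-sign perturbation rather than the paper's optimization (the function name and the eps value are mine, not from the paper):

    import torch
    import torch.nn.functional as F

    def augment_with_adversarial(model, images, labels, eps=0.01):
        """Return the original batch plus perturbed copies that keep their
        original (still correct, to a human) labels."""
        x = images.clone().requires_grad_(True)
        F.cross_entropy(model(x), labels).backward()
        # step each pixel slightly in the direction that increases the loss
        perturbed = (x + eps * x.grad.sign()).detach()
        return torch.cat([images, perturbed]), torch.cat([labels, labels])

The returned batch is the "more data": the labels come for free because the perturbation is too small to change what a human would call the image.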
Nobody has pointed this out yet: it would be very interesting to keep finding such perturbations that fool the model, repeatedly add the newly found examples to the training set, and retrain the model in the process. I wonder whether, after a finite number of iterations, the resulting model would be near-optimal (impossible to perturb without losing human recognizability) -- or, if that is impossible, whether we could prove precisely why it is impossible.
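As a sketch, the proposed procedure is just the loop below, where fit() stands in for any ordinary training routine and augment_with_adversarial is the hypothetical helper sketched in an earlier comment; whether it ever converges is exactly the open question here:

    def harden(model, train_x, train_y, rounds=10):
        """Alternate between training and folding freshly mined adversarial
        examples back into the (growing) training set."""
        for _ in range(rounds):
            fit(model, train_x, train_y)       # hypothetical ordinary training loop
            train_x, train_y = augment_with_adversarial(model, train_x, train_y)
        return model                           # convergence not guaranteed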
I don't find it so surprising that, out of the vast number of possible small perturbations, there are a few that cause the image to be misclassified. I suppose it is interesting that you can systematically find such perturbations. But is there anything here which suggests that a neural network which does well on a test set won't continue to do well so long as the images given to it are truly "natural"?
More like "The Flaw Lurking in Every Machine Learning Algorithm with a Gradient"(1) IMO. For example, in a linear or logistic classifier, the derivative of the raw output with respect to the input is the input itself while the derivative of the input is the weight. Knowing this one can use the weights of <i>any</i> classifier to minimally adjust the input to produce a pathological example.<p>As for humans, I submit we have all sorts of issues like this. It's just that we have a temporal stream of slightly different versions of the input and that keeps inputs like this from having any significant area under the curve. Have you never suddenly noticed something that was right in front of you all along?<p>(1) And probably those that don't too, but it's harder to find cases like that without a gradient (not that it can't be done, because I've found them myself for linear classifiers using genetic algorithms, simulated annealing, and something that looked just like Thompson Sampling but wasn't).
Maybe it's a function of the fact that I'm not an AI expert, but I never thought that specialization for features (whether semantically meaningful or not) was localized to individual neurons rather than to the entire net. Why would we think otherwise?
I'm not an expert in any sense, just a curious bystander. Assuming that the ratio of perturbations causing misclassifications to ones that don't is extremely low, couldn't you perturb the image in the time dimension, such that "dog" misclassifications would be odd blips in an otherwise continuous "cat" signal, with some sort of smoothing applied that would average those blips away? And in fact wouldn't that be the default case in some real world implementations, such as biological NNs or driverless car ones? The input to the NN would be a live video feed captured via light focused on some kind of CCD, which is going to be inherently noisy.
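Something like the sliding-window average below is what I have in mind, where classify() is a hypothetical per-frame classifier returning class probabilities and the window size is arbitrary:

    from collections import deque
    import numpy as np

    def smoothed_labels(frames, classify, window=15):
        """Average class probabilities over a sliding window, so a one-frame
        'dog' blip in an otherwise steady 'cat' stream gets voted away."""
        recent = deque(maxlen=window)
        for frame in frames:
            recent.append(classify(frame))     # probability vector for this frame
            yield int(np.argmax(np.mean(list(recent), axis=0)))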
Of course, having data that contradicts the patterns the neural network is looking for is going to make it err, even when the contradiction is subtle. Humans have it easier because we handle more abstract concepts. One way to mitigate this is to "simplify" the data: in practical terms (for images) that means applying a bilateral filter[0], also known in Photoshop as "surface blur".<p>[0] <a href="http://en.m.wikipedia.org/wiki/Bilateral_filter" rel="nofollow">http://en.m.wikipedia.org/wiki/Bilateral_filter</a>
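For what it's worth, as a preprocessing step that is a one-liner with OpenCV (the parameter values below are just the typical documentation defaults, not tuned for defeating these perturbations):

    import cv2

    img = cv2.imread("input.png")
    # bilateral filter: smooths within regions while preserving edges --
    # the same operation Photoshop calls "surface blur"
    smoothed = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
    cv2.imwrite("smoothed.png", smoothed)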
The way people classify hybrid images is a function of how far away they are. I wonder if these are essentially hybrid images for neural nets. It seems like the noise being added is very high frequency. Given that, I would bet that neural nets classify typical hybrid images the same way as they would the sharper image component.
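For anyone who hasn't run into them: a hybrid image is just the low-frequency content of one picture added to the high-frequency content of another, roughly like the sketch below (the blur kernel and sigma are arbitrary choices of mine):

    import cv2
    import numpy as np

    def hybrid(img_far, img_near, ksize=(31, 31), sigma=8):
        """Low frequencies of img_far (what you see from a distance) plus
        high frequencies of img_near (what you see up close)."""
        far = img_far.astype(np.float32)
        near = img_near.astype(np.float32)
        low = cv2.GaussianBlur(far, ksize, sigma)
        high = near - cv2.GaussianBlur(near, ksize, sigma)
        return np.clip(low + high, 0, 255).astype(np.uint8)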
Humans have a constantly changing view of things, so any adversarial example from one angle quickly changes when viewed from another angle.<p>This issue can be framed another way: something incorrectly classified could become easily classified with very minor changes in perspective.
I've always thought that our brains are more complex than we can model at the moment. There is some fundamental concept we are missing, and it allows the brain to classify things that even our best neural nets today cannot.
A side question I've always wondered about: if you train a NN to recognize a particular person's face from photos, will it be able to recognize a drawing or cartoon of that person?
I am immediately forced to think of Gödel's incompleteness theorems. Can it be proved that these examples always exist within some bounds of manipulation?
Url changed from <a href="http://thinkingmachineblog.net/the-flaw-lurking-in-every-deep-neural-net-by-mike-james/" rel="nofollow">http://thinkingmachineblog.net/the-flaw-lurking-in-every-dee...</a>, which points to (actually, does a lot more than just point to) this.
This is a really interesting find. At the same time, I have this lurking fear that it will be misappropriated by the anti-intellectual idiots to marginalize the AI community, cut funding, et cetera. Another AI winter, just like the one after the (not at all shocking) "discovery" that single perceptrons can't model XOR.<p>If anything, this shows that we need more funding of machine learning and to create more jobs in it, so we can get a deeper understanding of what's really going on.