Abstract: Given a state-of-the-art deep neural network classifier, we show the existence of a universal (image-agnostic) and very small perturbation vector that causes natural images to be misclassified with high probability. We propose a systematic algorithm for computing universal perturbations, and show that state-of-the-art deep neural networks are highly vulnerable to such perturbations, albeit being quasi-imperceptible to the human eye. We further empirically analyze these universal perturbations and show, in particular, that they generalize very well across neural networks. The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.
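For anyone curious what the "systematic algorithm" in the abstract looks like in practice, here is a rough sketch of the accumulate-and-project loop in PyTorch. It is only illustrative: the per-image step below is a simple gradient-sign push rather than the DeepFool step the authors actually use, and the names `xi`, `step`, and `project_lp` are my own.

```python
import torch
import torch.nn.functional as F

def project_lp(v, xi, p):
    """Project v back onto the l_p ball of radius xi (only p=inf and p=2 shown)."""
    if p == float("inf"):
        return v.clamp(-xi, xi)
    return v * min(1.0, xi / (v.norm(p=2).item() + 1e-12))

def universal_perturbation(images, model, xi=10.0, p=float("inf"),
                           passes=5, step=1.0):
    """Accumulate a single perturbation v that fools the model on most inputs.
    `images` is an iterable of (C, H, W) tensors; `model` maps a batch to logits."""
    v = torch.zeros_like(images[0])
    for _ in range(passes):
        for x in images:
            x = x.unsqueeze(0)
            with torch.no_grad():
                clean = model(x).argmax(dim=1)
                fooled = model(x + v).argmax(dim=1) != clean
            if fooled.item():
                continue  # v already flips this image, move on
            # Push x + v toward the decision boundary. The paper uses a
            # DeepFool step here; a gradient-sign step stands in for it.
            xv = (x + v).clone().requires_grad_(True)
            loss = F.cross_entropy(model(xv), clean)
            loss.backward()
            v = project_lp(v + step * xv.grad.sign().squeeze(0), xi, p)
    return v
```

The key property is that `v` is shared across all images: each image that is not yet fooled nudges the same vector, and the projection keeps it small enough to stay quasi-imperceptible.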
This seems to imply that the features learnt by neural networks are very different from the features humans use to distinguish the same objects, because the networks are affected by distortions that barely interfere with the features humans use at all.
Several of the universal perturbation vectors in Figure 4 remind me a lot of Deep Dream's textures.

I wonder what it is about these high-saturation, stripy-spiraly bits that these networks are responding to.

Is it something inherent in natural images? In the training algorithm? In our image compression algorithms? Presumably, the networks would work better if they weren't so hypersensitive to these patterns, so finding a way to dial that down seems like it could be pretty fruitful.
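One quick way to probe that question is to look at where a perturbation's energy sits in frequency space. A small sketch, assuming the perturbation is available as a 2-D numpy array (`radial_power_spectrum` is just an illustrative helper, not from the paper):

```python
import numpy as np

def radial_power_spectrum(pert):
    """Average power of a grayscale H x W perturbation at each spatial-frequency
    radius; stripy/spiraly patterns concentrate energy away from the
    low-frequency centre of the shifted spectrum."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(pert))) ** 2
    h, w = spec.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2).astype(int)
    counts = np.bincount(r.ravel())
    return np.bincount(r.ravel(), weights=spec.ravel()) / np.maximum(counts, 1)
```

Comparing that profile against the spectrum of typical natural images would at least tell you whether the perturbations live in frequency bands the networks shouldn't need to care about.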
This is really great and interesting research: (very roughly) how to compute a very small mask which, when applied to any image, makes the neural network misclassify it, whereas humans would notice no essential difference.

Quite remarkable.
This is why I'm never driving a car that is classifying stuff with neural networks. Some dust, some shitty weather conditions and that pigeon becomes a green light.
In signal processing you often have to pass the data through some sort of low-pass filter before attempting your analysis. I would be surprised if that isn't one of the methods being tried to protect deep neural nets from some of these attacks. Obviously there are some issues (the network would need to be trained on similarly filtered data, and the blurring interferes with the first-layer features that emulate edge detection, and so on).
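For concreteness, the kind of preprocessing being suggested could be as simple as a Gaussian blur in front of the classifier. This is only a toy sketch, not something the paper evaluates, and the `sigma` value is an arbitrary placeholder:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lowpass_defense(image, sigma=1.0):
    """Blur each channel of an H x W x C image before handing it to the model.
    sigma is a made-up knob: too small and the perturbation survives,
    too large and the edge-like features the first layers rely on go too."""
    channels = [gaussian_filter(image[..., c], sigma=sigma)
                for c in range(image.shape[-1])]
    return np.stack(channels, axis=-1)
```

The trade-off in the parenthetical above is exactly the choice of `sigma`: the defense only helps if the perturbation lives at higher spatial frequencies than the features the classifier genuinely needs.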
Reminds me a little bit of the short story BLIT [1], where scientists have accidentally created images that crash the human brain. Cool stuff!

[1]: https://en.wikipedia.org/wiki/BLIT_(short_story)
I'm guessing it won't be long until someone uses this technique to compute and apply perturbation masks to pornographic imagery and make NN-based porn detectors/filters (like the one Yahoo recently open-sourced) a lot less effective.
Is there reason to think the human visual system is sufficiently well modeled by deep neural nets that our brains might exhibit this same behavior? My first thought was that the perturbation images would need to be distinct per person, but photosensitive epilepsy like the Pokémon event [0] might suggest the possibility of shared perturbation vectors.

[0]: https://en.m.wikipedia.org/wiki/Photosensitive_epilepsy
My science-fiction brain is, of course, interested in this as a method to defeat face detection *in a way humans can't see*. I'd like to think that the crew of the Firefly used this technology to avoid detection when they did jobs in the heart of Alliance territory.
Can someone help with a notation question? In section 4 of the paper, the norm of the perturbation is constrained to a maximum of 2'000, which presumably is "small", but I don't know how to parse an apostrophe like that.
My intuition is that the existence of adversarial images with barely perceptible differences but high-confidence misclassifications will lead to a new NN architecture for image classification.