My own view, having spent some time in visual neuroscience, is that if you really want vision that is robust to these kinds of issues then you have to build a geometric representation of the world first, and then learn/map categories from that. Trying to jump from a pixel matrix to a label without an intervening topological/geometric model of the world (having 2 eyes and/or the ability to move can help with building one) is asking for trouble: we think we are recapitulating biology when in fact we are doing nothing of the sort, as these adversarial examples reveal beautifully.
There are plenty of adversarial examples for humans too: http://i.imgur.com/mOTHgnf.jpg
IMO, what these adversarial examples give us is a way to boost training data. We should augment training datasets with adversarial examples, or use adversarial training methods; the resulting networks would be more robust for it (a rough sketch follows below).

As for self-driving cars, this is a good argument for having multiple sensing modalities beyond the visual, such as radar/lidar/sonar, plus multiple cameras and infrared in addition to visible light.
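For concreteness, here is a minimal sketch of what that kind of augmentation could look like in PyTorch, assuming an image classifier with inputs scaled to [0, 1]; the epsilon value and the 50/50 clean/adversarial mix are arbitrary placeholders, not anything from the article:

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Craft adversarial copies of a batch with the fast gradient sign method."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Nudge each pixel by epsilon in the direction that increases the loss.
    return (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

def train_step(model, optimizer, x, y, epsilon=0.03):
    """One training step on a 50/50 mix of clean and adversarial inputs."""
    x_adv = fgsm_perturb(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) \
         + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Generating the adversarial copies on the fly each batch, rather than precomputing a fixed augmented dataset, keeps the perturbations matched to the model's current weaknesses.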
I can paint a road to a tunnel on a mountainside and fool some number of people. Meep. Meep.

The problem isn't that there are adversarial inputs. The problem is that these adversarial inputs aren't *also* adversarial (or even detectable) to the human visual system.
It's not clear to me how malicious actors could exploit this observation to confuse self-driving cars. That said, I don't think it discredits the point of the article; it's important to note how easily deep learning models can be fooled once you understand the math behind them. I just think the example of tricking self-driving cars is difficult to relate to or understand.
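For anyone wondering what "the math behind them" refers to: assuming the article follows the fast gradient sign method of Goodfellow et al., the adversarial image is just x_adv = x + ε · sign(∇_x J(θ, x, y)), i.e. every pixel is nudged by a tiny ε in whichever direction increases the network's loss. The change is imperceptible to a human but can flip the predicted label with high confidence.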