Very interesting paper, with some surprising insights (I'll need to read it a couple more times for sure).

The conclusion states:

> Overall, attaining models that are robust and interpretable will require explicitly encoding human priors into the training process.

I feel that is true, though another part of the solution IMO lies in coming up with classifiers that can do more than output a single probability. I agree that classifiers being sensitive to well-crafted adversarial attacks is something that can't be avoided (and perhaps shouldn't even be avoided at the training-data level), but a big part of the problem sits at the output end. As a user, the model gives me no insight into how "sure" it is about its prediction, or whether the input deviates from the training set (especially in the useful non-robust feature set). This is made worse by the fact that we stick a softmax on almost every neural network, which tends to over-estimate the probability of the rank-1 prediction and confuses humans: most adversarial attacks show [car: 99%, ship: 0.01%, ...] for the original image and [ship: 99%, car: 0.01%, ...] for the perturbed one (a small numerical illustration of this is at the end of this comment).

Using interpretability and explanatory tools to inspect models is a good start, though I'd like to see more attention being given to:

- Feedback on whether a given instance deviates from the training set, and to what extent (second sketch below)

- Bayesian constructs for uncertainty being incorporated, instead of only point probabilities. Work that tries to do this already exists [1,2], with very nice results, though it is not really "mainstream" (third sketch below)

[1]: https://alexgkendall.com/computer_vision/bayesian_deep_learning_for_safe_ai/

[2]: https://eng.uber.com/neural-networks-uncertainty-estimation/
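To make the softmax point concrete, here is a minimal numpy sketch (the logit values are made up for illustration): a logit margin of about 5 is already enough for softmax to report ~99%, and a perturbation only has to push the logits past each other to get an equally confident wrong answer.

    import numpy as np

    def softmax(logits):
        # Subtract the max for numerical stability before exponentiating.
        z = logits - logits.max()
        e = np.exp(z)
        return e / e.sum()

    # Hypothetical logits for three classes: [car, ship, plane].
    clean_logits = np.array([9.0, 4.0, 1.0])
    print(softmax(clean_logits))      # ~[0.993, 0.007, 0.000] -> "car: 99%"

    # A small shift in logit space (what an adversarial perturbation achieves)
    # flips the ranking, and softmax is just as confidently wrong.
    perturbed_logits = np.array([4.0, 9.0, 1.0])
    print(softmax(perturbed_logits))  # ~[0.007, 0.993, 0.000] -> "ship: 99%"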
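For the first bullet, one simple (and admittedly crude) baseline is to fit the mean and covariance of the network's penultimate-layer features on the training set and report the Mahalanobis distance of a new input's features. A sketch with random data standing in for real features; the names and numbers here are mine, not from the paper:

    import numpy as np

    # train_feats: penultimate-layer features for the training set, shape (N, D).
    # Random data stands in for real features; the statistics are fit once offline.
    rng = np.random.default_rng(0)
    train_feats = rng.normal(size=(10000, 64))

    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False) + 1e-6 * np.eye(train_feats.shape[1])
    cov_inv = np.linalg.inv(cov)

    def train_set_deviation(feat):
        # Mahalanobis distance of one feature vector to the training distribution;
        # larger values mean the input looks less like anything seen in training.
        diff = feat - mu
        return float(np.sqrt(diff @ cov_inv @ diff))

    in_dist = rng.normal(size=64)             # looks like training data
    out_dist = rng.normal(loc=5.0, size=64)   # shifted away from it
    print(train_set_deviation(in_dist), train_set_deviation(out_dist))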
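For the second bullet, probably the cheapest entry point into the line of work in [1] is MC dropout: keep dropout active at test time, average the softmax over many stochastic forward passes, and use the predictive entropy as an uncertainty score. A rough PyTorch sketch, assuming a model that already contains dropout layers (the architecture below is just a placeholder):

    import torch
    import torch.nn as nn

    # Placeholder classifier; the only requirement is that it contains dropout.
    model = nn.Sequential(
        nn.Linear(784, 256), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(256, 10),
    )

    def mc_dropout_predict(model, x, n_samples=50):
        # Keep dropout active at test time by staying in train mode,
        # then average the softmax outputs over many stochastic forward passes.
        model.train()
        with torch.no_grad():
            probs = torch.stack([
                torch.softmax(model(x), dim=-1) for _ in range(n_samples)
            ])
        mean = probs.mean(dim=0)                  # predictive probability
        # Predictive entropy as a simple scalar uncertainty score.
        entropy = -(mean * mean.clamp_min(1e-12).log()).sum(dim=-1)
        return mean, entropy

    x = torch.randn(1, 784)                       # dummy input
    mean, entropy = mc_dropout_predict(model, x)
    print(mean.argmax(dim=-1).item(), entropy.item())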