Looking at the finger instead of the moon: I like the HTML layout (responsive, inline images with captions, margin notes).<p>Any insights on how it's generated? Markdown, reST, or LaTeX converted to HTML? I would love to produce my documentation this way.<p>Edit: I was too hurried. Everything is explained in <a href="https://distill.pub/guide/" rel="nofollow">https://distill.pub/guide/</a>, and the template is at <a href="https://github.com/distillpub/template" rel="nofollow">https://github.com/distillpub/template</a>
Great presentation, but I do wish they'd throw in an equation or two. When they talk about the "channel objective", which they describe as "layer_n[:,:,z]", do they mean they are finding parameters that maximize the sum of the activations of RGB values of each channel? I'm not quite sure what the scalar loss function actually is here. I'm assuming some mean. (They discuss a few reduction operators, L_inf, L_2, in the preconditioning part, but I don't think that's the same thing?)<p>The visualizations of image gradients were really fascinating; I never really thought about plotting the gradient of each pixel channel as an image. I take it these gradients are for a particular (and the same) random starting value and step size? It's not totally clear.<p>(I have to say, "second-to-last figure"... again, cool presentation, but being able to say "figure 9" or whatever would be nice. Not <i>everything</i> about traditional publication needs to be thrown out the window... figure and section numbers are useful for discussion!)
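For what it's worth, here's a rough sketch of what I assume the channel objective boils down to: gradient ascent on the input image to maximize the mean activation of one channel of some layer. The model/layer-access helper here is a placeholder, not the authors' actual code.<p><pre><code>import torch

def channel_objective(acts, z):
    # acts: activations at layer_n, shape (batch, channels, height, width)
    # scalar objective = mean activation of channel z over all spatial positions
    return acts[:, z, :, :].mean()

# naive version: gradient ascent on the pixels of a random image
image = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([image], lr=0.05)
for _ in range(256):
    opt.zero_grad()
    acts = model_up_to_layer_n(image)        # hypothetical forward pass up to layer_n
    (-channel_objective(acts, z=42)).backward()  # negate: optimizers minimize
    opt.step()
</code></pre><p>If that reading is right, the "sum vs. mean" question only changes the scale of the gradient, not what the optimum looks like.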
There’s also an appendix where you can browse all the layers. <a href="https://distill.pub/2017/feature-visualization/appendix/googlenet/4b.html" rel="nofollow">https://distill.pub/2017/feature-visualization/appendix/goog...</a>
Are the layer names the same ones referred to in this paper?
<a href="https://arxiv.org/abs/1409.4842" rel="nofollow">https://arxiv.org/abs/1409.4842</a><p>And how can e.g. layer3a be generated from layer conv2d0? By convolving with a linear kernel? Or by the entire Inception Module including the linear and the non-linear operations?<p>Thank you. Outstanding work breaking it down.<p>Here's another paper people might enjoy. The author generates an example for "Saxophone," which includes a player... Which is fascinating, bc it implies that our usage of the word in real practice implies a player, even though the Saxophone is an instrument only. This highlights the difference between our denotative language and our experience of language! <a href="https://www.auduno.com/2015/07/29/visualizing-googlenet-classes/" rel="nofollow">https://www.auduno.com/2015/07/29/visualizing-googlenet-clas...</a><p>Also, for those curious about the DepthConcat operation, it's described here: <a href="https://stats.stackexchange.com/questions/184823/how-does-the-depthconcat-operation-in-going-deeper-with-convolutions-work" rel="nofollow">https://stats.stackexchange.com/questions/184823/how-does-th...</a><p>Edit: I'll be damned if there isn't something downright <i>Jungian</i> about these prototypes! There are snakes! Man-made objects! Shelter structures! Wheels! Animals! Sexy legs! The connection between snakes and guitar bodies is blowing my mind!
This didn't include my favorite kind of visualization from Nguyen et al., 2015: <a href="https://i.imgur.com/AERgy7I.png" rel="nofollow">https://i.imgur.com/AERgy7I.png</a>
Wow. It's incredible how psychedelic these images are. I'd be really curious to learn more about the link between these two seemingly distant subjects.
These pictures remind me of what one can see under psychedelics. All sensory input basically begins to break down into those kinds of patterns, and thus reality dissolves into nothing. This is equally terrifying and liberating, depending on how you look at it. The terrifying thought is that there's no one behind these eyes and ears. The liberating thought is that if there's no one there, then there's no one to die.
Hi Chris, firstly, thanks for all the work you've done publishing brilliant articles on supervised and unsupervised methods and visualisation on your old blog and now in Distill.<p>This question isn't about feature visualisation, but I thought I'd take the chance to ask you: what do you think of Hinton's latest paper and his move away from neural network architectures?
Interesting that simple optimization ends up with high-frequency noise similar to adversarial attacks on neural nets.<p>While I agree that the practicality of these visualizations means you have to fight against this high-frequency "cheating", I can't shake the feeling that what these optimization visualizations are showing us is <i>correct</i>. <i>This</i> is what the neuron responds to, whether you like it or not. Put another way, the problem doesn't seem to be with the visualization but with the <i>network itself</i>.<p>Has there been any research into making neural networks robust to adversarial examples?
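For anyone who hasn't seen it, the connection is easy to demonstrate: the classic FGSM attack (Goodfellow et al.) perturbs an image along the sign of the loss gradient, and the perturbation is exactly this kind of high-frequency noise. Rough sketch, assuming a standard classifier; not from the article:<p><pre><code>import torch
import torch.nn.functional as F

def fgsm(model, image, label, eps=0.007):
    # Fast Gradient Sign Method: one step along the sign of the loss gradient.
    # The resulting perturbation looks like high-frequency noise, much like
    # what naive activation maximization produces.
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    return (image + eps * image.grad.sign()).detach()
</code></pre>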
Cool. Reminds me a bit of <a href="https://qualiacomputing.com/2016/12/12/the-hyperbolic-geometry-of-dmt-experiences/" rel="nofollow">https://qualiacomputing.com/2016/12/12/the-hyperbolic-geomet...</a><p>(Though maybe not as symmetric?)
Is there any way to run images from a camera into GoogLeNet in real time?<p>E.g., if I want to scan areas around me to see whether there are any perspectives in my environment that light up the "snake" neurons or the dog neurons?
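In case it helps, here's a rough sketch of one way to do it with OpenCV and torchvision's pretrained GoogLeNet. The hooked layer (inception4b) and the channel index are arbitrary placeholders I picked, not anything from the article:<p><pre><code>import cv2
import torch
from torchvision import models, transforms

# Stream webcam frames through pretrained GoogLeNet and watch one channel's
# mean activation via a forward hook.
model = models.googlenet(pretrained=True).eval()
acts = {}
model.inception4b.register_forward_hook(lambda m, i, o: acts.update(out=o))

prep = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

cap = cv2.VideoCapture(0)  # default camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB), (224, 224))
    with torch.no_grad():
        model(prep(rgb).unsqueeze(0))
    print("channel 54 mean activation:", acts["out"][0, 54].mean().item())
</code></pre>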
Okay... maybe a stupid question.<p>Could they train on white noise from a television and see whether the cosmic background radiation shows a structure similar to the structure of the observable universe when examining the feature layers?
Awesome, but to me this stuff is also terrifying, and I can't quite place why.<p>Something about dissecting intelligence, and the potential that our own minds process things similarly. Creepy how our reality is distilled into these uncanny valley type matrices.<p>Also, I suspect it says something that these images look like what people report seeing on psychedelic trips...