Great presentation, but I do wish they'd throw in an equation or two. When they talk about the "channel objective", which they describe as "layer_n[:,:,z]", do they mean they are finding parameters that maximize the sum of the activations of RGB values of each channel? I'm not quite sure what the scalar loss function actually is here. I'm assuming it's some kind of mean. (They discuss a few reduction operators, L_inf and L_2, in the preconditioning part, but I don't think that's the same thing?)<p>The visualizations of image gradients were really fascinating; I had never thought about plotting the gradient of each pixel channel as an image. I take it these gradients are all for one particular random starting value and step size, the same in every plot? It's not totally clear.<p>(I have to say, having to write "second-to-last figure" again is a pain. Cool presentation, but being able to say "figure 9" or whatever would be nice. Not <i>everything</i> about traditional publication needs to be thrown out the window. Figure and section numbers are useful for discussion!)
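<p>To make the channel objective question concrete, here is the reading I'm assuming, as a rough sketch: the scalar is the mean of channel z of layer_n over all spatial positions, and the optimizer maximizes that with respect to the input. The mean reduction is my guess, not something I can confirm from the article. The function name `channel_objective`, the nested-list layout, and the toy numbers are all made up for illustration.

```python
from statistics import fmean


def channel_objective(layer, z):
    """Mean activation of channel z over all spatial positions.

    `layer` is a hypothetical H x W x C activation map stored as nested
    lists, standing in for layer_n. Using a mean (rather than a sum or
    a norm) is an assumption on my part.
    """
    return fmean(pixel[z] for row in layer for pixel in row)


# Toy 2 x 2 x 3 activation map (made-up numbers, not from the article).
layer_n = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[5.0, 0.0, 2.0], [7.0, 0.0, 2.0]],
]

# Scalar "loss" for channel 0: the mean of 1, 3, 5 and 7.
score = channel_objective(layer_n, 0)
```

If the reduction were a sum instead, the result would differ only by a constant factor of H*W, so the optimum would be the same and only the effective step size would change.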