I have to be honest, I don't think this is a good explanation. I don't know what differentiable programming is, but I'm fairly sure I have the mathematical background to understand it, and I didn't come away from this article with any confidence that I was following along.

On a superficial level it seems like it:

1. Generalizes deep learning to an optimization function over decomposable input, and

2. Reduces the number of parameters required to learn the input by exploiting the input's structure, thereby making learning more efficient.

Is that correct? Is it completely off? What am I missing? Is there any more meat to the article than this?

Could someone who has upvoted this (and ideally understands the topic well) offer a different explanation of the concept? It would be great to see a real-world example (even a relatively trivial one) expressed in both the traditional matrix-computation form and the sexy new differentiable form.
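To make concrete the kind of side-by-side I'm asking for, here's my own guess at what the contrast might look like, sketched as a toy least-squares fit in JAX (none of this is from the article, and the "traditional vs. differentiable" labels are mine): the traditional version hand-derives the matrix gradient, while the differentiable version just writes the loss as an ordinary program and differentiates that program.

    # Toy least-squares fit, two ways (my sketch, not the article's).
    import jax
    import jax.numpy as jnp

    X = jnp.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # inputs
    y = jnp.array([1.0, 2.0, 3.0])                        # targets
    w = jnp.zeros(2)                                      # parameters

    # Traditional form: the gradient of ||Xw - y||^2 is derived by hand
    # as the matrix expression 2 X^T (Xw - y) and coded directly.
    def manual_grad(w):
        return 2.0 * X.T @ (X @ w - y)

    # "Differentiable" form: write the loss as an ordinary program...
    def loss(w):
        return jnp.sum((X @ w - y) ** 2)

    # ...and let autodiff differentiate the program itself.
    auto_grad = jax.grad(loss)

    print(manual_grad(w))  # same numbers...
    print(auto_grad(w))    # ...with no hand derivation

Is the article's point just that, generalized to arbitrary programs? If so, where do the claims about decomposable input and fewer parameters come in?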