The most interesting thing I've seen on AD is "The simple essence of automatic differentiation" (2018) [1]. See past discussion [2], and talk [3]. I think the main idea is that by compiling to categories and pairing up a function with its derivative, the pair becomes trivially composable in forward mode, and the whole structure is easily converted to reverse mode afterwards.

[1]: https://dl.acm.org/doi/10.1145/3236765

[2]: https://news.ycombinator.com/item?id=18306860

[3]: Talk at Microsoft Research: https://www.youtube.com/watch?v=ne99laPUxN4 Other presentations listed here: https://github.com/conal/essence-of-ad
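Roughly, and simplifying a lot (the names below are mine, and I'm collapsing the paper's linear-map derivative into an ordinary function), the pairing looks something like this in Haskell:

```haskell
-- A differentiable function returns both its result and its derivative
-- at the point of evaluation (a stand-in for the paper's linear map).
newtype D a b = D (a -> (b, a -> b))

-- Composition is just the chain rule: run f, run g on f's result,
-- and compose the two derivatives.
compD :: D b c -> D a b -> D a c
compD (D g) (D f) = D $ \a ->
  let (b, f') = f a
      (c, g') = g b
  in (c, g' . f')

-- Two sample primitives.
dSqr :: D Double Double
dSqr = D $ \x -> (x * x, \dx -> 2 * x * dx)

dSin :: D Double Double
dSin = D $ \x -> (sin x, \dx -> cos x * dx)

-- sin (x^2) and its derivative, without writing the chain rule by hand.
sinOfSquare :: D Double Double
sinOfSquare = dSin `compD` dSqr
```

Evaluating `sinOfSquare` at 2.0 gives sin 4 together with a derivative function that returns 4 * cos 4; the chain rule falls out of `compD` rather than appearing at each call site. Reverse mode then comes, roughly, from choosing a representation of the derivative in which those linear maps compose in the opposite order.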
My professor has talked about this. He thinks that the real gem of the deep learning revolution is the ability to take the derivative of arbitrary code and use that to optimize. Deep learning is just one application of that, but there are tons more.
Nice article, but the intro is a little lengthy.

I have one remark, though: if your language already allows for automatic differentiation, why bother with a neural network in the first place?

I think you should have a good reason for choosing a neural network to approximate the inverse function, and for giving it exactly that number of layers. For instance, why shouldn't a simple polynomial suffice? Could it be that your neural network ends up as an approximation of the Taylor expansion of your inverse function?
The nice thing about differentiable programming is that we aren't limited to gradient descent: we can use all sorts of other optimizers, some of which offer quadratic convergence instead of linear!
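As a minimal sketch of that point (assuming nothing about the article's setup; dual numbers and 1-D Newton's method in Haskell, all names mine): nesting forward-mode derivatives gives the second derivative, which is exactly what Newton's method needs for its quadratic convergence.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Dual numbers: a value together with its derivative. Running ordinary
-- numeric code on Dual values yields exact first derivatives.
data Dual a = Dual a a

instance Num a => Num (Dual a) where
  Dual x dx + Dual y dy = Dual (x + y) (dx + dy)
  Dual x dx - Dual y dy = Dual (x - y) (dx - dy)
  Dual x dx * Dual y dy = Dual (x * y) (x * dy + dx * y)
  negate (Dual x dx)    = Dual (negate x) (negate dx)
  abs    (Dual x dx)    = Dual (abs x) (signum x * dx)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

instance Fractional a => Fractional (Dual a) where
  Dual x dx / Dual y dy = Dual (x / y) ((dx * y - x * dy) / (y * y))
  fromRational r        = Dual (fromRational r) 0

-- First derivative of f at x.
diff :: Num a => (Dual a -> Dual a) -> a -> a
diff f x = let Dual _ d = f (Dual x 1) in d

-- One Newton step for minimising f: x - f'(x) / f''(x).
-- The second derivative comes from nesting duals: diff (diff f).
newtonStep :: (forall a. Fractional a => a -> a) -> Double -> Double
newtonStep f x = x - diff f x / diff (diff f) x

-- Minimise (x - 3)^2 + 1 starting from 0.
main :: IO ()
main = print (take 4 (iterate (newtonStep (\x -> (x - 3) * (x - 3) + 1)) 0))
```

Starting from x0 = 0 on (x - 3)^2 + 1, the Newton iterate jumps straight to the minimiser 3, where a fixed-step gradient descent would only approach it geometrically.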
Does someone have an example where the ability to "differentiate" a program gets you something interesting?

I understand perfectly what it means for a neural network, but how about more abstract things?

I'm not even sure the implementation, as currently presented, actually means something. What is the derivative of a function like List, or Sort, or GroupBy? These articles all assume that it somehow just looks like the derivative from calculus.

Approximating everything as some non-smooth real function doesn't seem entirely morally correct. A program is more discrete or synthetic. I think it should be a bit more algebraically flavoured, like differentials over a ring.
At first glance, this approach appears to re-invent an applied-mathematics approach to optimal control. There, one writes a generalized Hamiltonian, from which forward- and backward-in-time paths can be iterated.

The Pontryagin maximum (or minimum, if you define your objective function with a minus sign) principle is the essence of that approach to optimal control.
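To spell that out (standard textbook form, nothing specific to the article): for a running cost L(x, u) and dynamics ẋ = f(x, u), one writes

```latex
H(x, u, \lambda) = L(x, u) + \lambda^{\top} f(x, u),
\qquad
\dot{x} = \frac{\partial H}{\partial \lambda} = f(x, u)
  \quad \text{(integrated forward in time)},
\qquad
\dot{\lambda} = -\frac{\partial H}{\partial x}
  \quad \text{(integrated backward in time)},
\qquad
u^{*}(t) \in \arg\min_{u} H\bigl(x^{*}(t), u, \lambda^{*}(t)\bigr).
```

The backward-in-time costate equation is the continuous-time counterpart of the adjoint/reverse-mode gradient pass, which is presumably why the two approaches look like re-inventions of each other.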