科技回声

7 comments

infogulch, about 3 years ago

The most interesting thing I've seen on AD is "The simple essence of automatic differentiation" (2018) [1]. See past discussion [2], and talk [3]. I think the main idea is that by compiling to categories and pairing up a function with its derivative, the pair becomes trivially composable in forward mode, and the whole structure is easily converted to reverse mode afterwards.

[1]: https://dl.acm.org/doi/10.1145/3236765
[2]: https://news.ycombinator.com/item?id=18306860
[3]: Talk at Microsoft Research: https://www.youtube.com/watch?v=ne99laPUxN4 Other presentations listed here: https://github.com/conal/essence-of-ad
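
To make the "pair a function with its derivative" idea concrete, here is a minimal scalar-only sketch. It is not the paper's categorical construction; the class name `D` and the method `then` are hypothetical names chosen for illustration. Each `D` maps x to the pair (f(x), f'(x)), and composing two of them applies the chain rule.

```python
import math


class D:
    """A scalar function paired with its derivative: x -> (f(x), f'(x))."""

    def __init__(self, run):
        self.run = run  # run(x) returns (value, derivative at x)

    def __call__(self, x):
        return self.run(x)

    def then(self, other):
        """Sequential composition: first self, then other (chain rule)."""
        def run(x):
            y, dy = self.run(x)    # inner function and its derivative at x
            z, dz = other.run(y)   # outer function and its derivative at y
            return z, dz * dy
        return D(run)


# Primitive building blocks, each shipped with its own derivative.
sin_d = D(lambda x: (math.sin(x), math.cos(x)))
square_d = D(lambda x: (x * x, 2 * x))

# x -> sin(x^2), built purely by composition.
sin_of_square = square_d.then(sin_d)
value, deriv = sin_of_square(1.5)
```

The composite never needs a hand-written derivative: it falls out of the primitives' derivatives and the chain rule in `then`.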

yauneyz, about 3 years ago

My professor has talked about this. He thinks that the real gem of the deep learning revolution is the ability to take the derivative of arbitrary code and use that to optimize. Deep learning is just one application of that, but there are tons more.
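
As a rough illustration of "take the derivative of arbitrary code and use that to optimize", the sketch below pushes dual numbers (forward-mode AD) through an ordinary loop and then runs gradient descent on the result. Everything here is an assumption made for the example: the `Dual` class, the toy `simulate` function (explicit Euler for x' = -k*x), the target value 0.5, and the step size 0.2.

```python
class Dual:
    """A value together with its derivative with respect to one input."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__


def simulate(k, steps=20, dt=0.1):
    """Plain code with a loop: explicit Euler for x' = -k*x, x(0) = 1."""
    x = 1.0
    for _ in range(steps):
        x = x + (-dt) * (k * x)
    return x


def loss(k, target=0.5):
    err = simulate(k) - target
    return err * err


# Gradient descent on k; the gradient comes from running the same code on a Dual.
k = 1.0
for _ in range(300):
    grad = loss(Dual(k, 1.0)).dot
    k -= 0.2 * grad
```

After the loop, `simulate(k)` should be close to the target. `simulate` was written with no knowledge of derivatives; it only uses `+` and `*`, which `Dual` overloads.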

choeger, about 3 years ago

Nice article, but the intro is a little lengthy.

I have one remark, though: if your language already allows for automatic differentiation, why do you bother with a neural network in the first place?

I think you should have a good reason why you choose a neural network for your approximation of the inverse function, and why it has exactly that number of layers. For instance, why shouldn't a simple polynomial suffice? Could it be that your neural network ends up as an approximation of the Taylor expansion of your inverse function?

PartiallyTyped, about 3 years ago

The nice thing about differentiable programming is that we can use all sorts of optimizers other than gradient descent, some of which can offer quadratic convergence instead of linear!
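
A small sketch of that difference, under assumptions of my own: the `Jet2` class (value plus first and second derivative), the toy objective f(x) = exp(x) - 2x with its minimum at ln 2, and the fixed step size 0.1 for gradient descent are all invented for illustration. Newton's method uses the second derivative, which here comes from the same overloaded arithmetic as the first.

```python
import math


class Jet2:
    """Value, first and second derivative with respect to one scalar input."""

    def __init__(self, v, d1=0.0, d2=0.0):
        self.v, self.d1, self.d2 = v, d1, d2

    @staticmethod
    def lift(x):
        return x if isinstance(x, Jet2) else Jet2(x)

    def __add__(self, o):
        o = Jet2.lift(o)
        return Jet2(self.v + o.v, self.d1 + o.d1, self.d2 + o.d2)
    __radd__ = __add__

    def __sub__(self, o):
        o = Jet2.lift(o)
        return Jet2(self.v - o.v, self.d1 - o.d1, self.d2 - o.d2)

    def __mul__(self, o):
        o = Jet2.lift(o)
        # (uv)' = u'v + uv',  (uv)'' = u''v + 2u'v' + uv''
        return Jet2(self.v * o.v,
                    self.d1 * o.v + self.v * o.d1,
                    self.d2 * o.v + 2 * self.d1 * o.d1 + self.v * o.d2)
    __rmul__ = __mul__


def exp(x):
    """exp lifted to Jet2: (e^u)' = e^u u', (e^u)'' = e^u (u'^2 + u'')."""
    e = math.exp(x.v)
    return Jet2(e, e * x.d1, e * (x.d1 ** 2 + x.d2))


def f(x):
    return exp(x) - 2 * x  # minimum at x = ln 2


def minimize(step, x=0.0, tol=1e-10, max_iter=10_000):
    """Iterate x -= step(jet) until |f'(x)| < tol; return (x, iterations)."""
    for n in range(max_iter):
        j = f(Jet2(x, 1.0, 0.0))
        if abs(j.d1) < tol:
            return x, n
        x -= step(j)
    return x, max_iter


x_gd, n_gd = minimize(lambda j: 0.1 * j.d1)           # gradient descent
x_newton, n_newton = minimize(lambda j: j.d1 / j.d2)  # Newton's method
```

Both runs end near ln 2, but `n_newton` should come out far smaller than `n_gd`, since the Newton error roughly squares at each step near the minimum while the gradient descent error shrinks by a constant factor.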

fennecs, about 3 years ago

Does someone have an example where the ability to “differentiate” a program gets you something interesting?

I understand perfectly what it means for a neural network, but how about more abstract things?

I'm not even sure that, as currently presented, the implementation actually means something. What is the derivative of a function like List, or Sort, or GroupBy, etc.? These articles all assume that it somehow just looks like the derivative from calculus.

Approximating everything as some non-smooth real function doesn’t seem entirely morally correct. A program is more discrete or synthetic. I think it should be a bit more algebraic-flavoured, like differentials over a ring.
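
One possible reading of the Sort case, offered as an illustration rather than as the article's answer: sorting a list of reals only rearranges its inputs, so away from ties it is locally a linear map, and its Jacobian is the permutation matrix that performs the rearrangement. The function name `sort_with_jacobian` is hypothetical, and the sketch says nothing about what happens at ties, where the derivative is not defined.

```python
def sort_with_jacobian(xs):
    """Sort xs; return (sorted values, Jacobian as a list of rows).

    jac[r][j] is d(sorted[r]) / d(xs[j]), valid when all entries are distinct.
    """
    # order[r] is the index in xs of the r-th smallest element.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    values = [xs[i] for i in order]
    # Row r has a single 1 in column order[r]: output r copies input order[r].
    jac = [[1.0 if j == i else 0.0 for j in range(len(xs))] for i in order]
    return values, jac


values, jac = sort_with_jacobian([3.0, 1.0, 2.0])
# values == [1.0, 2.0, 3.0]
# jac == [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
```

This covers only the piecewise-linear, real-valued view of the question; it does not address the more algebraic notion of derivative raised in the comment.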

fghorow, about 3 years ago

At first glance, this approach appears to re-invent an applied mathematics approach to optimal control. There, one writes a generalized Hamiltonian, from which forward and backward-in-time paths can be iterated.

The Pontryagin maximum (or minimum, if you define your objective function with a minus sign) principle is the essence of that approach to optimal control.

noobermin, about 3 years ago

The article is okay but it would have helped to have labelled the axes of the graphs.

Differentiable Programming – A Simple Introduction
