Interesting paper, thanks for bringing this up! I have been working on trajectory optimization methods using both analytic gradient computations and black-box stochastic gradient approximations (proximal policy optimization).

I have always wondered about a question the paper touches on: even though the analytic gradient is intuitively more efficient and mathematically exact, it is much harder to learn a policy with it than with the “brute force trial-and-error” black-box methods.

This paper brings many new perspectives on why.
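
For concreteness, here is a toy sketch of the two kinds of estimators I mean (my own example, not the paper's setup, and PPO's estimator is more involved than this REINFORCE/ES-style one): the analytic gradient comes from differentiating through the rollout itself, while the black-box estimate only looks at sampled returns under parameter perturbations.

    import jax
    import jax.numpy as jnp

    def rollout_cost(theta, x0, horizon=20):
        """Deterministic rollout of toy scalar dynamics with a linear policy u = theta * x."""
        def step(x, _):
            u = theta * x
            x_next = 0.9 * x + u               # simple stable scalar dynamics
            cost = x_next ** 2 + 0.1 * u ** 2  # quadratic state + control cost
            return x_next, cost
        _, costs = jax.lax.scan(step, x0, None, length=horizon)
        return jnp.sum(costs)

    # Analytic gradient: backpropagate through the dynamics.
    analytic_grad = jax.grad(rollout_cost)(0.1, 1.0)

    # Black-box score-function estimate: perturb the parameter with Gaussian
    # noise, weight the sampled costs by the perturbation (with a baseline).
    def score_function_grad(key, theta, sigma=0.05, n_samples=256):
        eps = sigma * jax.random.normal(key, (n_samples,))
        costs = jax.vmap(lambda e: rollout_cost(theta + e, 1.0))(eps)
        baseline = costs.mean()
        return jnp.mean((costs - baseline) * eps) / sigma ** 2

    stochastic_grad = score_function_grad(jax.random.PRNGKey(0), 0.1)

    print(analytic_grad, stochastic_grad)  # agree in expectation, differ in variance

In a smooth toy problem like this the two estimates agree, which is exactly why the gap on harder problems is so puzzling to me.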