
An overview of gradient descent optimization algorithms

78 points by azuajef almost 8 years ago

4 comments

thearn4 almost 8 years ago
It isn't mentioned in the abstract, but this seems to be more of an overview of ML-specific notions of gradient descent, where batch processing is possible because you need to leverage gradients of a fixed prediction architecture over a large set of training data, with respect to tunable weights.

So each of those training points represents a sort of separable or parallelizable piece of the whole process, giving you a ton of freedom in how you actually execute the gradient stepping (with one training point, several of them, or all of them). As I understand it, the stochasticity in this process interestingly adds enough "noise" that local minima are avoided in many cases.

In more general applications of non-linear gradient-based optimization (say, optimizing parametric models in physical engineering), this doesn't necessarily come into play.
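
A minimal sketch of the three batching regimes described above, on a made-up least-squares fit; the data, learning rate, and step count are arbitrary assumptions for illustration, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))               # 1000 training points, 5 features
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def grad(w, Xb, yb):
    # Gradient of mean squared error over the batch (Xb, yb) w.r.t. weights w.
    return 2.0 / len(yb) * Xb.T @ (Xb @ w - yb)

def descend(batch_size, steps=500, lr=0.05):
    w = np.zeros(5)
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        w -= lr * grad(w, X[idx], y[idx])     # step on just this batch
    return w

w_full = descend(batch_size=len(y))  # "batch" gradient descent: all points per step
w_mini = descend(batch_size=32)      # mini-batch SGD: a few points per step
w_sgd  = descend(batch_size=1)       # pure SGD: one point per step (noisiest)
```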
pacmansyyu almost 8 years ago
Here[1] is an article describing the same, written by the author himself.

[1]: http://ruder.io/optimizing-gradient-descent/index.html
jabl almost 8 years ago
Stupid Q: Assuming "gradient descent" is roughly similar to the classical "steepest descent" optimization algorithm (???), why aren't deep learning researchers looking into other, more advanced algorithms from classical non-linear optimization theory, like, say, (preconditioned) conjugate gradient, or quasi-Newton methods such as BFGS?
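
For reference, the classical methods the question names are available off the shelf; a small sketch using SciPy on a toy Rosenbrock problem (nothing here is specific to deep learning, and the problem is just an illustrative assumption):

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])

# Quasi-Newton (BFGS): builds up an approximation to the inverse Hessian.
res_bfgs = minimize(rosen, x0, jac=rosen_der, method="BFGS")

# Nonlinear conjugate gradient: uses only gradients, no Hessian estimate.
res_cg = minimize(rosen, x0, jac=rosen_der, method="CG")

print(res_bfgs.x, res_cg.x)  # both should approach the minimum at [1, 1]
```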
abakus almost 8 years ago
SGD > adaptive, according to this:

https://people.eecs.berkeley.edu/~brecht/papers/17.WilEtAl.Ada.pdf
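
To make the "SGD vs. adaptive" contrast concrete, here is a minimal sketch of a plain SGD step next to an Adam-style adaptive step; the learning rates and the calling convention are arbitrary assumptions, not taken from the paper:

```python
import numpy as np

def sgd_step(w, g, lr=0.01):
    # Plain SGD: the same scalar learning rate for every coordinate.
    return w - lr * g

def adam_step(w, g, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-coordinate step sizes derived from running moment estimates.
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)
```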