
Complexity theory puts limits on performance of gradient descent

150 points by programd over 3 years ago

7 comments

Buttons840 over 3 years ago
> For example, gradient descent is often used in machine learning in ways that don't require extreme precision. But a machine learning researcher might want to double the precision of an experiment. In that case, the new result implies that they might have to quadruple the running time of their gradient descent algorithm. That's not ideal, but it is not a deal breaker.

> But for other applications, like in numerical analysis, researchers might need to square their precision. To achieve such an improvement, they might have to square the running time of gradient descent, making the calculation completely intractable.

I think a silver lining here is that a company having access to 10,000 times as much computing power as a common consumer can only achieve, say, a 10x better model. So the inequality isn't as extreme as it could be.
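For readers who want to see where the precision parameter enters in practice, here is a minimal sketch (my own, not from the article) of gradient descent with an epsilon stopping criterion. The test function, starting point, and step size are arbitrary choices for illustration; the quoted passage is a worst-case claim that doubling the precision can quadruple the running time, whereas this easy quadratic converges far faster than any worst case.

```python
import numpy as np

def grad_descent(grad, x0, step, eps, max_iter=1_000_000):
    """Fixed-step gradient descent, stopping when the gradient norm drops below eps."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:   # eps is the "precision" knob in the quote
            return x, k
        x = x - step * g
    return x, max_iter

# Toy objective f(x, y) = x^2 + 10*y^2 (an arbitrary, easy example).
grad_f = lambda v: np.array([2.0 * v[0], 20.0 * v[1]])

for eps in (1e-2, 1e-4, 1e-8):
    _, iters = grad_descent(grad_f, [5.0, 5.0], step=0.05, eps=eps)
    print(f"eps={eps:g}: {iters} iterations")
```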
miketery over 3 years ago
> Brouwer's fixed point theorem. The theorem says that for any continuous function, there is guaranteed to be one point that the function leaves unchanged — a fixed point, as it's known. This is true in daily life. If you stir a glass of water, the theorem guarantees that there absolutely must be one particle of water that will end up in the same place it started from.

This seems counterintuitive. Why would this be true? I would have thought that it's almost impossible for a particle to end up in the same place as it started.
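In one dimension the theorem reduces to something easy to check: any continuous f that maps [0, 1] into itself must cross the line y = x, because g(x) = f(x) - x is non-negative at 0 and non-positive at 1. The sketch below (my own illustration, with cos as an arbitrary example map) locates such a fixed point by bisection; the counterintuitive part of the glass-of-water analogy is that the full theorem gives the same guarantee in higher dimensions, where no intermediate-value argument is available.

```python
import math

def fixed_point(f, lo=0.0, hi=1.0, tol=1e-12):
    """Find x in [lo, hi] with f(x) = x, assuming f is continuous and
    maps [lo, hi] into itself (the 1-D case of Brouwer's theorem)."""
    g = lambda x: f(x) - x          # g(lo) >= 0 and g(hi) <= 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) >= 0:
            lo = mid                # a fixed point lies in [mid, hi]
        else:
            hi = mid                # a fixed point lies in [lo, mid]
    return 0.5 * (lo + hi)

x = fixed_point(math.cos)           # cos maps [0, 1] into [0, 1]
print(x, math.cos(x))               # both ~0.7390851, the fixed point of cos
```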
tzs over 3 years ago
> But for other applications, like in numerical analysis, researchers might need to square their precision. To achieve such an improvement, they might have to square the running time of gradient descent, making the calculation completely intractable

There is an easy way to counteract this. Let's say your original runs in 1000 seconds. Squaring that would give 1000000 seconds. That's a jump from under 17 minutes to over 11 days.

The fix: measure everything in hours. Your original runs in 0.278 hours. Squaring that gives 0.077 hours. Squaring the precision now makes it run 3.6 times faster.
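For anyone who wants the sleight of hand in numbers, here is the arithmetic (my own check, not part of the comment): squaring a quantity that carries units is not a well-defined operation, so the "speedup" is entirely an artifact of the unit chosen.

```python
SECONDS_PER_HOUR = 3600

t_seconds = 1000.0                                   # original running time
sq_in_seconds = t_seconds ** 2                       # 1,000,000 s
sq_in_hours = (t_seconds / SECONDS_PER_HOUR) ** 2    # 0.077 h

print(sq_in_seconds / 86400)                         # ~11.57 days
print(sq_in_hours * SECONDS_PER_HOUR)                # ~277.8 s, i.e. "3.6x faster"
```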
dynm over 3 years ago
> But a machine learning researcher might want to double the precision of an experiment. In that case, the new result implies that they might have to quadruple the running time of their gradient descent algorithm.

> But for other applications, like in numerical analysis, researchers might need to square their precision. To achieve such an improvement, they might have to square the running time of gradient descent

I don't quite understand this. Say f(p) is the complexity of gradient descent for precision p. The first paragraph seems to imply that f(p) = c×p^2, so that f(2p) = c×4×p^2 = 4×f(p). However, the second paragraph then just seems to imply that f(p) = c×p. Am I missing something?

Incidentally, I love that Quanta brings attention to work mostly just on the basis of how important it is. 99% of the other stuff you hear about new research is really just driven by PR departments at the researchers' home institutions.
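One possible reading (mine, not the commenter's or the article's) that makes the two quoted sentences consistent: treat "precision" as a number P that grows as the error tolerance shrinks, say P = 1/ε, and suppose the running time scales as T(P) = cP². Then:

```latex
\[
T(2P) = c\,(2P)^2 = 4\,cP^2 = 4\,T(P)
\qquad \text{(doubling the precision quadruples the time)}
\]
\[
T(P^2) = c\,P^4 = \frac{\bigl(cP^2\bigr)^2}{c} = \frac{T(P)^2}{c}
\qquad \text{(squaring the precision squares the time, up to the constant } c\text{)}
\]
```

Under that reading both sentences follow from the single scaling T(P) = cP²; the apparent contradiction comes from interpreting "square the precision" as an operation on the running time rather than on P.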
FabHK over 3 years ago
Now, that article was written by an academic, not a practitioner (my emphasis):

> You can imagine a function as a landscape, where the elevation of the land is equal to the value of the function (the *"profit"*) at that particular spot. Gradient descent searches for the function's local *minimum* [...]
optimalsolver over 3 years ago
Derivative-free optimization to the rescue!
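As a concrete (if deliberately naive) illustration of the idea, here is a random local search that never evaluates a gradient and only compares function values. This is my own sketch, not a method from the article; real derivative-free solvers (Nelder-Mead, CMA-ES, Bayesian optimization) are considerably more sophisticated.

```python
import random

def random_search(f, x0, sigma=0.5, iters=20000, seed=0):
    """Derivative-free optimization: propose a Gaussian perturbation of the
    current point and keep it only if it lowers the objective."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        cand = [xi + rng.gauss(0.0, sigma) for xi in x]
        fc = f(cand)
        if fc < fx:                 # accept only improvements
            x, fx = cand, fc
    return x, fx

# Rosenbrock function: minimum 0 at (1, 1); no gradients needed.
rosen = lambda v: (1 - v[0]) ** 2 + 100 * (v[1] - v[0] ** 2) ** 2

print(random_search(rosen, [-1.5, 2.0]))
```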
gerdesj over 3 years ago
You say minima and I say maxima; They say maybe over there ... no left a bit.