
Complexity theory puts limits on performance of gradient descent

150 points by programd over 3 years ago

7 comments

Buttons840 over 3 years ago
> For example, gradient descent is often used in machine learning in ways that don't require extreme precision. But a machine learning researcher might want to double the precision of an experiment. In that case, the new result implies that they might have to quadruple the running time of their gradient descent algorithm. That's not ideal, but it is not a deal breaker.

> But for other applications, like in numerical analysis, researchers might need to square their precision. To achieve such an improvement, they might have to square the running time of gradient descent, making the calculation completely intractable.

I think a silver lining here is that a company having access to 10,000 times as much computing power as a common consumer can only achieve, say, a 10x better model. So the inequality isn't as extreme as it could be.
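For readers who want to see where the precision parameter enters in practice, here is a minimal sketch (my own, not from the article) of gradient descent with an epsilon stopping criterion. The test function, starting point, and step size are arbitrary choices for illustration; the quoted passage is a worst-case claim that doubling the precision can quadruple the running time, whereas this easy quadratic converges far faster than any worst case.

```python
import numpy as np

def grad_descent(grad, x0, step, eps, max_iter=1_000_000):
    """Fixed-step gradient descent, stopping when the gradient norm drops below eps."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:   # eps is the "precision" knob in the quote
            return x, k
        x = x - step * g
    return x, max_iter

# Toy objective f(x, y) = x^2 + 10*y^2 (an arbitrary, easy example).
grad_f = lambda v: np.array([2.0 * v[0], 20.0 * v[1]])

for eps in (1e-2, 1e-4, 1e-8):
    _, iters = grad_descent(grad_f, [5.0, 5.0], step=0.05, eps=eps)
    print(f"eps={eps:g}: {iters} iterations")
```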
miketery over 3 years ago
> Brouwer's fixed point theorem. The theorem says that for any continuous function, there is guaranteed to be one point that the function leaves unchanged — a fixed point, as it's known. This is true in daily life. If you stir a glass of water, the theorem guarantees that there absolutely must be one particle of water that will end up in the same place it started from.

This seems counterintuitive. Why would this be true? I would have thought that it's almost impossible for a particle to end up in the same place as it started.
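In one dimension the theorem reduces to something easy to check: any continuous f that maps [0, 1] into itself must cross the line y = x, because g(x) = f(x) - x is non-negative at 0 and non-positive at 1. The sketch below (my own illustration, with cos as an arbitrary example map) locates such a fixed point by bisection; the counterintuitive part of the glass-of-water analogy is that the full theorem gives the same guarantee in higher dimensions, where no intermediate-value argument is available.

```python
import math

def fixed_point(f, lo=0.0, hi=1.0, tol=1e-12):
    """Find x in [lo, hi] with f(x) = x, assuming f is continuous and
    maps [lo, hi] into itself (the 1-D case of Brouwer's theorem)."""
    g = lambda x: f(x) - x          # g(lo) >= 0 and g(hi) <= 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) >= 0:
            lo = mid                # a fixed point lies in [mid, hi]
        else:
            hi = mid                # a fixed point lies in [lo, mid]
    return 0.5 * (lo + hi)

x = fixed_point(math.cos)           # cos maps [0, 1] into [0, 1]
print(x, math.cos(x))               # both ~0.7390851, the fixed point of cos
```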
tzs over 3 years ago
> But for other applications, like in numerical analysis, researchers might need to square their precision. To achieve such an improvement, they might have to square the running time of gradient descent, making the calculation completely intractable

There is an easy way to counteract this. Let's say your original runs in 1000 seconds. Squaring that would give 1000000 seconds. That's a jump from under 17 minutes to over 11 days.

The fix: measure everything in hours. Your original runs in 0.278 hours. Squaring that gives 0.077 hours. Squaring the precision now makes it run 3.6 times faster.
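For anyone who wants the sleight of hand in numbers, here is the arithmetic (my own check, not part of the comment): squaring a quantity that carries units is not a well-defined operation, so the "speedup" is entirely an artifact of the unit chosen.

```python
SECONDS_PER_HOUR = 3600

t_seconds = 1000.0                                   # original running time
sq_in_seconds = t_seconds ** 2                       # 1,000,000 s
sq_in_hours = (t_seconds / SECONDS_PER_HOUR) ** 2    # 0.077 h

print(sq_in_seconds / 86400)                         # ~11.57 days
print(sq_in_hours * SECONDS_PER_HOUR)                # ~277.8 s, i.e. "3.6x faster"
```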
dynm over 3 years ago
> But a machine learning researcher might want to double the precision of an experiment. In that case, the new result implies that they might have to quadruple the running time of their gradient descent algorithm.

> But for other applications, like in numerical analysis, researchers might need to square their precision. To achieve such an improvement, they might have to square the running time of gradient descent

I don't quite understand this. Say f(p) is the complexity of gradient descent for precision p. The first paragraph seems to imply that f(p) = c×p^2, so that f(2p) = c×4×p^2 = 4×f(p). However, the second paragraph then just seems to imply that f(p) = c×p. Am I missing something?

Incidentally, I love that Quanta brings attention to work mostly just on the basis of how important it is. 99% of the other stuff you hear about new research is really just driven by PR departments at the researchers' home institutions.
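One possible reading (mine, not the commenter's or the article's) that makes the two quoted sentences consistent: treat "precision" as a number P that grows as the error tolerance shrinks, say P = 1/ε, and suppose the running time scales as T(P) = cP². Then:

```latex
\[
T(2P) = c\,(2P)^2 = 4\,cP^2 = 4\,T(P)
\qquad \text{(doubling the precision quadruples the time)}
\]
\[
T(P^2) = c\,P^4 = \frac{\bigl(cP^2\bigr)^2}{c} = \frac{T(P)^2}{c}
\qquad \text{(squaring the precision squares the time, up to the constant } c\text{)}
\]
```

Under that reading both sentences follow from the single scaling T(P) = cP²; the apparent contradiction comes from interpreting "square the precision" as an operation on the running time rather than on P.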
FabHK over 3 years ago
Now, that article was written by an academic, not a practitioner (my emphasis):

> You can imagine a function as a landscape, where the elevation of the land is equal to the value of the function (the *"profit"*) at that particular spot. Gradient descent searches for the function's local *minimum* [...]
optimalsolver over 3 years ago
Derivative-free optimization to the rescue!
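As a concrete (if deliberately naive) illustration of the idea, here is a random local search that never evaluates a gradient and only compares function values. This is my own sketch, not a method from the article; real derivative-free solvers (Nelder-Mead, CMA-ES, Bayesian optimization) are considerably more sophisticated.

```python
import random

def random_search(f, x0, sigma=0.5, iters=20000, seed=0):
    """Derivative-free optimization: propose a Gaussian perturbation of the
    current point and keep it only if it lowers the objective."""
    rng = random.Random(seed)
    x, fx = list(x0), f(x0)
    for _ in range(iters):
        cand = [xi + rng.gauss(0.0, sigma) for xi in x]
        fc = f(cand)
        if fc < fx:                 # accept only improvements
            x, fx = cand, fc
    return x, fx

# Rosenbrock function: minimum 0 at (1, 1); no gradients needed.
rosen = lambda v: (1 - v[0]) ** 2 + 100 * (v[1] - v[0] ** 2) ** 2

print(random_search(rosen, [-1.5, 2.0]))
```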
gerdesj over 3 years ago
You say minima and I say maxima; They say maybe over there ... no left a bit.