ResNet-50 with DawnBench settings is a very poor choice for illustrating this trend. The main technique driving this reduction in cost-to-train has been finding arcane, fast training schedules. This sounds good until you realize it's a kind of sleight of hand: finding that schedule takes tens of thousands of dollars (usually more) that isn't counted in the cost-to-train figure, but is a real-world cost you would incur if you wanted to train models.

However, I think the overall trend this article talks about is accurate. There has been an increased focus on cost-to-train, and you can see that with models like EfficientNet, where NAS is used to optimize accuracy and model size jointly.
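A back-of-the-envelope sketch of the hidden-cost point, with every number made up purely to show the accounting:

    # All figures hypothetical: the point is that the one-time schedule search
    # disappears from the headline per-run number but not from your bill.
    reported_cost_per_run = 50.0      # USD, the headline benchmark-style figure
    schedule_search_runs = 500        # trial runs burned finding the fast schedule
    cost_per_search_run = 100.0       # USD per trial at slower, unoptimized settings

    search_cost = schedule_search_runs * cost_per_search_run   # 50,000 USD, paid once
    models_actually_trained = 10      # how many models you amortize the search over

    effective_cost = reported_cost_per_run + search_cost / models_actually_trained
    print(f"headline:  ${reported_cost_per_run:,.0f} per run")
    print(f"effective: ${effective_cost:,.0f} per run once the search is amortized")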
This is an odd framing.

Training has become much more accessible, due to a variety of things (ASICs, offerings from public clouds, innovations on the data science side). Comparing it to Moore's Law doesn't make any sense to me, though.

Moore's Law is an observation on the pace of increase of a tightly scoped thing: the number of transistors.

The cost of training a model is not a single "thing," it's the cumulative effect of many things, including things as fluid as cloud pricing.

Completely possible that I'm missing something obvious, though.
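One rough way to see the "many things" point is that the dollar figure factors into several independently moving pieces. A sketch, with every input hypothetical:

    # Training cost as a product of factors that move for unrelated reasons.
    total_flops = 1e18            # compute the model needs (algorithmic side, hypothetical)
    peak_flops_per_sec = 100e12   # what the accelerator advertises (hardware side, hypothetical)
    utilization = 0.4             # fraction of peak actually sustained (software side, hypothetical)
    price_per_hour = 3.0          # USD per accelerator-hour (cloud pricing, hypothetical)

    hours = total_flops / (peak_flops_per_sec * utilization) / 3600
    cost = hours * price_per_hour
    print(f"{hours:.1f} accelerator-hours, roughly ${cost:.2f}")

Moore's Law bears directly on only one of those four inputs; the others move on their own schedules.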
What are some domains where a solo developer could build something commercially compelling to capture some of this $37 trillion? Are there any workflows, tools, or efficiencies that could easily be realized as a commercial offering without requiring massive man-hours to implement?
Ark Invest are the creators of the ARKK [1] and ARKW ETFs that have become retail darlings, mainly because they're heavily invested in TSLA.

They pride themselves on this type of fundamental, bottom-up analysis of the market.

It's fine. I just don't know if I agree with applying Moore's law, which is fundamentally about hardware, to the cost of running a "system" that is a combination of customized hardware and new software techniques.

[1] https://pages.etflogic.io/?ticker=ARKK
I remember this article from 2018: https://medium.com/the-mission/why-building-your-own-deep-learning-computer-is-10x-cheaper-than-aws-b1c91b55ce8c

Hacker News discussion of the article: https://news.ycombinator.com/item?id=18063893

It really is interesting how this is changing the dynamics of neural network training. Now it is affordable to train a useful network on the cloud, whereas two years ago that was reserved for companies with either bigger investments or an already consolidated product.
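The rent-versus-buy arithmetic behind that article's headline boils down to a break-even calculation; here's a sketch with made-up prices, just to show the structure of the comparison:

    # All prices hypothetical; the shape of the comparison is the point.
    cloud_gpu_hour = 3.0     # USD per GPU-hour rented from a public cloud
    box_cost = 3000.0        # USD to build a comparable single-GPU workstation
    power_per_hour = 0.05    # USD of electricity per hour under load

    breakeven_hours = box_cost / (cloud_gpu_hour - power_per_hour)
    print(f"the workstation pays for itself after ~{breakeven_hours:.0f} GPU-hours")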
I would really like a thorough analysis of how expensive it is to multiply large matrices, which, according to the profiler, is the most expensive part of training a transformer, for example. Is there some Moore's law or similar trend for that?
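For matrix multiplication the standard operation count is 2*m*n*k floating-point operations for an (m x k) by (k x n) product, and you can measure what a given machine actually sustains. A minimal sketch with NumPy (real transformer training would run this on an accelerator, usually in lower precision):

    import time
    import numpy as np

    # Square matrices, roughly the size of a large transformer layer.
    m = n = k = 4096
    a = np.random.rand(m, k).astype(np.float32)
    b = np.random.rand(k, n).astype(np.float32)

    a @ b                                  # warm-up run
    start = time.perf_counter()
    a @ b
    elapsed = time.perf_counter() - start

    flops = 2 * m * n * k                  # count each multiply-add as two operations
    print(f"{flops / elapsed / 1e9:.1f} GFLOP/s sustained")

Tracking that sustained number per dollar across hardware generations would be one way to put the question on a Moore's-law-style chart.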
It would be regrettable if an equivalent of the self-fulfilling prophecy of Moore's "Law" (originally an astute observation and forecast, but not remotely a law) became a driver or limiter in this field as well, even more so if it's a straight transplant made for soundbite reasons rather than through any impartial and thoughtful analysis.
That's despite Nvidia vaguely prohibiting users from running their desktop cards for machine learning in any sort of data-center-like or server-like capacity. Hopefully AMD's ML support and OpenCL will continue improving.
Does this mean that the cost to train something like OpenAI's GPT-3 will fall from 12 million dollars to less next year? If so, how much will it fall to?
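Mechanically, if you knew the annual rate of decline you could just extrapolate. The sketch below uses the $12M figure from the question and a halving rate that is made up purely for illustration, not taken from the article:

    # Both inputs are rough: treat this as arithmetic, not a forecast.
    cost_now = 12e6          # USD, the widely cited GPT-3 training estimate
    annual_decline = 0.5     # assumed rate of decline per year (made up)

    cost_next_year = cost_now * (1 - annual_decline)
    print(f"${cost_next_year / 1e6:.0f}M next year under that assumption")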