HuggingFace Training Cluster as a Service

101 points by kashifr over 1 year ago

11 comments

TechTechTech over 1 year ago
At the moment of writing, the cost estimate for a 70B multimodal model with 7T tokens on 1,000 H100 GPUs is $18,461,354, with 184 days of training time.

Anyone willing to share an estimate of how much the cost will come down each year as hardware keeps improving and new methodologies are found?

Personally, I would not be surprised if it is possible to train on the same dataset for half the cost 12 months from now.
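For context, here is a back-of-envelope reconstruction of those figures, using the common ~6 x parameters x tokens FLOPs rule of thumb and H100 BF16 peak throughput. This is my own arithmetic and my own assumptions, not anything published about the calculator's internals:

```python
# Rough sanity check on the quoted $18.46M / 184-day figure.
# Assumptions (mine, not Hugging Face's):
#   - training compute ~ 6 * params * tokens FLOPs
#   - H100 BF16 dense peak ~ 989 TFLOP/s per GPU

params = 70e9             # 70B parameters
tokens = 7e12             # 7T training tokens
gpus = 1000
quoted_cost = 18_461_354  # USD, from the calculator
quoted_days = 184

total_flops = 6 * params * tokens                 # ~2.9e24 FLOPs
gpu_hours = gpus * quoted_days * 24               # ~4.4M GPU-hours
implied_rate = quoted_cost / gpu_hours            # implied $/GPU-hour
per_gpu_flops = total_flops / (gpu_hours * 3600)  # sustained FLOP/s per GPU
implied_mfu = per_gpu_flops / 989e12              # fraction of BF16 peak

print(f"implied price: ${implied_rate:.2f}/GPU-hour")  # ~$4.18
print(f"implied utilization: {implied_mfu:.0%}")       # ~19%
```

Under those assumptions, the quote works out to roughly $4 per H100-hour at around 20% effective utilization, which gives a baseline against which cheaper hardware, better utilization, or better data efficiency would compound year over year.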
FanaHOVA over 1 year ago
The fact that the GPU quantity dropdown cannot go over 1,000 drives home the "GPU poor" point from the SemiAnalysis post. Meta alone has 16,000 GPUs. OpenAI's cluster from 2020 had 10,000 GPUs. If you're serious about foundation model development and research, you have to go work at one of these "GPU rich" companies.
Dowwie over 1 year ago
Given how expensive it is to train, my impression is that the world in 2023 generally cannot afford to experiment with custom-trained models; only well-funded organizations can, within a range of acceptability. The risk of spending $20MM on training a large model that doesn't produce the desired outcome is going to blow back far worse than engineering failing to deliver features on time. How are teams/orgs approaching model training risk management, as in managing the risk that a model fails to deliver after spending $20 million on training?

Next thought: how to "SETI model training", distributing compute to idle resources around the world.
zoogeny over 1 year ago
I think what I really want is turn-key fine-tuning for existing foundational models. But honestly, even that is probably 2 years away from being a really viable business. We lack sufficiently vetted, commercially licensed foundational models. We lack sufficiently available and moderated diverse datasets for fine-tuning. We probably lack sufficient businesses willing to take the early-adopter risk.

I'm planning an all-in strategy with AI, but I believe the next 2 years will be lean. Hopefully by then the price of fine-tuning will have come down enough for medium-sized businesses outside the early-adopter niche to give it a try. We'll have had a couple of rounds of failures and successes, so most people will have a decent roadmap to building successful products (and avoiding complete failures). We should also have a significant ecosystem of options in both OSS and commercial variations.

I feel like this is equivalent to the Internet in 1998. We're looking at the Yahoos, the AOLs, and the Pets.com crop of businesses. But things won't really heat up for a while. Still plenty of time to grow into this space.
version_five over 1 year ago
The other day when they announced more funding, there was some speculation here about how they would make money, with someone suggesting it's by driving users to cloud GPU platforms (AWS, Azure). This supports that, and it suggests where they will end up, i.e. as a front end for Azure.

https://news.ycombinator.com/item?id=37250647
YetAnotherNick over 1 year ago
They should focus more on fine-tuning, I think. Fine-tuning is almost always better than pretraining, even if the pretraining dataset is very different from the fine-tuning dataset. If I could train a 30B model on a few tens of millions of tokens for $10 (basically proportional to the current rate), I would definitely use it.
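As a rough scale check (my own arithmetic, simply scaling the 70B figures quoted in the top comment linearly by parameter count, which is only an approximation):

```python
# Linear scaling of the quoted 70B training cost down to a 30B model.
# Assumption (mine): cost per token scales roughly with parameter count.

cost_70b = 18_461_354   # USD for 7T tokens on a 70B model, quoted above
tokens_70b = 7e12

usd_per_token_70b = cost_70b / tokens_70b        # ~$2.6e-6 per token
usd_per_token_30b = usd_per_token_70b * 30 / 70  # ~$1.1e-6 per token

budget = 10                                      # USD
tokens_affordable = budget / usd_per_token_30b
print(f"${budget} buys roughly {tokens_affordable/1e6:.0f}M tokens of 30B training")
# -> on the order of 10M tokens at the quoted rate
```

At the quoted rate, $10 buys closer to ~10M tokens, so "a few tens of millions" of tokens would need per-token prices a few times below today's quote.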
jstx1 over 1 year ago
> Train your LLM at scale on our infrastructure

Is it really their infrastructure, or are they using a cloud provider, with this wrapping it up and providing convenience for a price?
techterrier over 1 year ago
Lowest price from the dropdowns... $43k.
alekseiprokopev over 1 year ago
What models would you train if you had the money, at the various price points?
GaggiX over 1 year ago
I wonder what the multimodal model is. Flamingo?
naillo over 1 year ago
The lock-in attempts begin.