At the moment of writing, the cost estimate for a 70B multimodal model trained on 7T tokens with 1,000 H100 GPUs is $18,461,354, with 184 days of training time.

Anyone willing to share an estimate of how much the cost will come down each year as hardware keeps improving and new methodologies are found?

Personally, I would not be surprised if it were possible to train on the same dataset for half the cost 12 months from now.
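For context, here is a rough back-of-envelope sketch of where numbers like that come from. It assumes the standard 6·N·D FLOPs rule for dense transformer training, an H100 BF16 dense peak of roughly 989 TFLOPS, and illustrative values for utilization (MFU) and hourly GPU price; the MFU and price figures are my own assumptions, not values taken from the calculator.

```python
# Back-of-envelope training cost estimate (all parameters are illustrative assumptions).
params = 70e9              # 70B-parameter model
tokens = 7e12              # 7T training tokens
gpus = 1000                # H100 count
peak_flops = 989e12        # H100 SXM BF16 dense peak, ~989 TFLOPS
mfu = 0.19                 # assumed model FLOPs utilization (multimodal runs are often well below 50%)
price_per_gpu_hour = 4.20  # assumed on-demand $/GPU-hour

total_flops = 6 * params * tokens              # ~6*N*D rule for dense transformer training
effective_flops = gpus * peak_flops * mfu      # sustained cluster throughput
seconds = total_flops / effective_flops
days = seconds / 86400
cost = gpus * (seconds / 3600) * price_per_gpu_hour

print(f"{days:.0f} days, ${cost:,.0f}")        # ~180 days, ~$18M with these assumptions
```

Under these assumptions, halving the cost in 12 months means roughly doubling the effective throughput per dollar, whether through better hardware, higher MFU, or cheaper GPU-hours.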
The fact that the GPU quantity dropdown cannot go over 1,000 drives home the "GPU poor" point from the SemiAnalysis post. Meta alone has 16,000 GPUs. OpenAI's cluster from 2020 had 10,000 GPUs. If you're serious about foundation model development and research, you have to go work at one of these "GPU rich" companies.
Given how expensive it is to train, my impression is that the world in 2023 generally cannot afford to experiment with custom-trained models; only well-funded organizations can do so with acceptable risk. The risk of spending $20MM on training a large model that doesn't produce the desired outcome is going to blow back far worse than engineering failing to deliver features on time. How are teams/orgs approaching model training risk management, i.e. managing the risk that a model fails to deliver after $20 million has been spent on training?

My next thought is how to "SETI@home" model training, distributing compute to idle resources around the world.
I think what I really want is turn-key fine-tuning for existing foundation models. But honestly, even that is probably 2 years away from being a really viable business. We lack sufficiently vetted, commercially licensed foundation models. We lack sufficiently available and moderated diverse datasets for fine-tuning. We probably lack sufficient businesses willing to take the early-adopter risk.

I'm planning an all-in strategy with AI, but I believe the next 2 years will be lean. Hopefully by then the price of fine-tuning will have come down enough for medium-sized businesses outside the early-adopter niche to give it a try. We'll have a couple of rounds of failures and successes, so most people will have a decent roadmap for building successful products (and avoiding complete failures). We should also have a significant ecosystem of options in both OSS and commercial variations.

I feel like this is equivalent to the Internet in 1998. We're looking at the Yahoos, the AOLs, and the Pets.com crop of businesses. But things won't really heat up for a while. Still plenty of time to grow into this space.
The other day, when they announced more funding, there was some speculation here about how they would make money, with someone suggesting it's by driving users to cloud GPU platforms (AWS, Azure). This supports that, and it suggests where they will end up, i.e. as a front end for Azure.

https://news.ycombinator.com/item?id=37250647
They should focus more on fine-tuning, I think. Fine-tuning is almost always better than pretraining from scratch, even if the pretraining dataset is very different from the fine-tuning dataset. If I could train a 30B model on a few tens of millions of tokens for around $10 (basically proportional to the current rate), I would definitely use it.
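A quick proportionality check on that "$10" figure (my own arithmetic, assuming cost scales linearly with params × tokens as in the 6·N·D compute rule and ignoring fixed overheads):

```python
# Rough check that "$10 for tens of millions of tokens on a 30B model" scales
# proportionally from the quoted 70B / 7T-token / $18.46M figure.
# Assumes cost scales with params * tokens; all numbers are illustrative.
base_cost = 18_461_354              # quoted estimate: 70B params, 7T tokens
base_params, base_tokens = 70e9, 7e12

def scaled_cost(params, tokens):
    return base_cost * (params / base_params) * (tokens / base_tokens)

print(f"${scaled_cost(30e9, 10e6):.2f}")   # ~$11 for 30B params, 10M tokens
print(f"${scaled_cost(30e9, 30e6):.2f}")   # ~$34 for 30B params, 30M tokens
```

So the $10 ballpark is roughly right for ~10M tokens at the quoted rate, though a few tens of millions of tokens would land closer to $30-40 under the same assumption.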
> Train your LLM at scale on our infrastructure<p>Is it really their infrastructure or are they using a cloud provider and this wraps it up and provides convenience for a price?