At the moment of writing, the cost estimate for a 70B multimodal model trained on 7T tokens with 1,000 H100 GPUs is $18,461,354, with 184 days of training time.

Anyone willing to share an estimate of how much the cost will come down each year as hardware keeps improving and new methodologies are found?

Personally, I would not be surprised if it were possible to train on the same dataset for half the cost 12 months from now.
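For context, here is a rough back-of-envelope sketch of where numbers like that come from. It assumes the standard 6·N·D FLOPs rule for dense transformer training, an H100 BF16 dense peak of roughly 989 TFLOPS, and illustrative values for utilization (MFU) and hourly GPU price; the MFU and price figures are my own assumptions, not values taken from the calculator.

```python
# Back-of-envelope training cost estimate (all parameters are illustrative assumptions).
params = 70e9              # 70B-parameter model
tokens = 7e12              # 7T training tokens
gpus = 1000                # H100 count
peak_flops = 989e12        # H100 SXM BF16 dense peak, ~989 TFLOPS
mfu = 0.19                 # assumed model FLOPs utilization (multimodal runs are often well below 50%)
price_per_gpu_hour = 4.20  # assumed on-demand $/GPU-hour

total_flops = 6 * params * tokens              # ~6*N*D rule for dense transformer training
effective_flops = gpus * peak_flops * mfu      # sustained cluster throughput
seconds = total_flops / effective_flops
days = seconds / 86400
cost = gpus * (seconds / 3600) * price_per_gpu_hour

print(f"{days:.0f} days, ${cost:,.0f}")        # ~180 days, ~$18M with these assumptions
```

Under these assumptions, halving the cost in 12 months means roughly doubling the effective throughput per dollar, whether through better hardware, higher MFU, or cheaper GPU-hours.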
The fact that the GPU quantity dropdown cannot go over 1,000 drives home the "GPU poor" point from the SemiAnalysis post. Meta alone has 16,000 GPUs. OpenAI's cluster from 2020 had 10,000 GPUs. If you're serious about foundation model development and research, you have to go work at one of these "GPU rich" companies.
Given how expensive it is to train, my impression is that the world in 2023 generally cannot afford to experiment with custom-trained models; only well-funded organizations can do so with acceptable risk. The risk of spending $20MM on training a large model that doesn't produce the desired outcome is going to blow back far worse than engineering failing to deliver features on time. How are teams/orgs approaching model training risk management, i.e. managing the risk that a model fails to deliver after $20 million has been spent on training?

My next thought is how to "SETI@home" model training, distributing compute to idle resources around the world.
I think what I really want is turn-key fine-tuning for existing foundation models. But honestly, even that is probably 2 years away from being a really viable business. We lack sufficiently vetted, commercially licensed foundation models. We lack sufficiently available and moderated diverse datasets for fine-tuning. We probably lack sufficient businesses willing to take the early-adopter risk.

I'm planning an all-in strategy with AI, but I believe the next 2 years will be lean. Hopefully by then the price of fine-tuning will have come down enough for medium-sized businesses outside the early-adopter niche to give it a try. We'll have a couple of rounds of failures and successes, so most people will have a decent roadmap for building successful products (and avoiding complete failures). We should also have a significant ecosystem of options in both OSS and commercial variations.

I feel like this is equivalent to the Internet in 1998. We're looking at the Yahoos, the AOLs, and the Pets.com crop of businesses. But things won't really heat up for a while. Still plenty of time to grow into this space.
The other day, when they announced more funding, there was some speculation here about how they would make money, with someone suggesting it's by driving users to cloud GPU platforms (AWS, Azure). This supports that, and it suggests where they will end up, i.e. as a front end for Azure.

https://news.ycombinator.com/item?id=37250647
They should focus more on fine-tuning, I think. Fine-tuning is almost always better than pretraining from scratch, even if the pretraining dataset is very different from the fine-tuning dataset. If I could train a 30B model on a few tens of millions of tokens for around $10 (basically proportional to the current rate), I would definitely use it.
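A quick proportionality check on that "$10" figure (my own arithmetic, assuming cost scales linearly with params × tokens as in the 6·N·D compute rule and ignoring fixed overheads):

```python
# Rough check that "$10 for tens of millions of tokens on a 30B model" scales
# proportionally from the quoted 70B / 7T-token / $18.46M figure.
# Assumes cost scales with params * tokens; all numbers are illustrative.
base_cost = 18_461_354              # quoted estimate: 70B params, 7T tokens
base_params, base_tokens = 70e9, 7e12

def scaled_cost(params, tokens):
    return base_cost * (params / base_params) * (tokens / base_tokens)

print(f"${scaled_cost(30e9, 10e6):.2f}")   # ~$11 for 30B params, 10M tokens
print(f"${scaled_cost(30e9, 30e6):.2f}")   # ~$34 for 30B params, 30M tokens
```

So the $10 ballpark is roughly right for ~10M tokens at the quoted rate, though a few tens of millions of tokens would land closer to $30-40 under the same assumption.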
> Train your LLM at scale on our infrastructure<p>Is it really their infrastructure or are they using a cloud provider and this wraps it up and provides convenience for a price?