Training Stable Diffusion from Scratch Costs <$160k

98 points by moinnadeem over 2 years ago

16 comments

mullingitover over 2 years ago
Interesting to think about where the cost will go in a few years.

I remember my college intro CS class back in 1998, where I heard the story of building the first computer that could perform at 1 TFLOPS[1]. It cost $46 million and took up 1600 square feet. Now a $600 Mac Mini will do double that.

[1] https://en.wikipedia.org/wiki/ASCI_Red
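A rough back-of-the-envelope sketch of the cost-per-TFLOPS comparison in this comment. All figures come from the comment itself; the Mac Mini throughput ("double that", i.e. ~2 TFLOPS) is the commenter's claim, not a benchmark.

```python
# Dollars per TFLOPS, 1996 vs. today, using the comment's figures.
asci_red_cost = 46_000_000      # USD, ASCI Red (~1996)
asci_red_tflops = 1.0
mac_mini_cost = 600             # USD
mac_mini_tflops = 2.0           # "will do double that" (commenter's rough claim)

then = asci_red_cost / asci_red_tflops
now = mac_mini_cost / mac_mini_tflops
print(f"1996: ${then:,.0f} per TFLOPS")
print(f"Today: ${now:,.0f} per TFLOPS")
print(f"Improvement: ~{then / now:,.0f}x")
```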
wokwokwok over 2 years ago
Is this just an ad for a service?

They didn't make anything.

This is just speculative benchmarking.

I am deeply not interested in multiplying the numbers on your pricing sheet by the estimated numbers on the stable diffusion model card.

I have zero interest in your (certainly excellent) Proprietary Special Sauce (TM) that makes spending money on your service a good idea.

This just reads as spam that got past the spam filter.

Did you *actually* train a diffusion model?

Are you going to release the model file?

Where is the *actual code* someone could use to replicate your results?

Given the lack of example outputs, I guess not.
abeppu over 2 years ago
> *256 A100 throughput was extrapolated using the other throughput measurements.

It seems worth noting that the $160k scenario wasn't actually measured.
epicycles33 over 2 years ago
Glad to see this - you can even get reasonable-ish results on lower-res images with ~2 hours of training time on a P100 GPU. See my attempt here: https://www.kaggle.com/code/apapiu/train-latent-diffusion-in-keras-from-scratch
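For readers curious what the core of such a run looks like: below is a minimal, generic epsilon-prediction diffusion training step, sketched in PyTorch. It is not the linked Keras notebook's code; the tiny MLP stands in for a real U-Net, and all shapes and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)   # cumulative noise schedule

# Toy stand-in for a U-Net: input is a noisy latent plus a normalized timestep.
model = nn.Sequential(nn.Linear(64 + 1, 256), nn.ReLU(), nn.Linear(256, 64))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(x0):
    """x0: (batch, 64) clean latents. Runs one denoising-objective update."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))                  # random timestep per sample
    a = alpha_bars[t].unsqueeze(1)                 # (b, 1)
    eps = torch.randn_like(x0)                     # noise the model must predict
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps      # forward diffusion q(x_t | x_0)
    inp = torch.cat([xt, t.float().unsqueeze(1) / T], dim=1)
    loss = nn.functional.mse_loss(model(inp), eps) # predict the added noise
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Usage: loop train_step over batches of (VAE-encoded) image latents.
print(train_step(torch.randn(32, 64)))
```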
gedy over 2 years ago
Still pretty pricey for the average person, but these costs will trend cheaper, which is why I think it's futile to "regulate" AI. Someone somewhere will train models on anything visible to the public, licensed or not. Feels like Pandora's box has been opened and we need to deal with it.
pizza over 2 years ago
5 bucks says within a year there’ll be some innovation that shrinks this by 2 orders of magnitude. Either from much cheaper compute cost (eg OPUs) or much more efficient training. Hell, there ought to be some way to leapfrog these innovations in such a way that the huge model of yesteryear becomes a more powerful optimizer/loss function itself. That’d just about solve the “hands off my unique shapes!” problem of acceptable training data trawling too :)
odyssey7 over 2 years ago
How many tries does it take for an expert to succeed at training a custom Stable Diffusion?
ipsum2 over 2 years ago
Note that this doesn't take into account the numerous iterations required to dial in the correct hyperparameters and model architecture, which could easily increase cost 5-10x.

> 256 A100 throughput was extrapolated using the other throughput measurements

Is it an indictment of their service that they couldn't afford 256 GPUs on their own cloud?
choxi over 2 years ago
Data truly is the new oil. When it’s all done the compute costs and code will be cheap or free. There’s a lot hinging on how we interpret copyright laws or what kind of data rights laws we enact.
rektide over 2 years ago
This task requires a bit more work than I'd want, but I'd also point out that $100k can buy ~9 A100s, which are good for ~7k hours of work a month (through not entirely reputable channels, so there's a chance some might die earlier or might have to be returned). That might not train Stable Diffusion in a fast enough time for you (~50k hours estimated training time), but it's still damned impressive. And you can keep the hardware.

I wonder if AMD is as over-the-top brutal with legal control over where their GPUs can be used as Nvidia is. Maybe with energy costs you might still want to stick with the A100s anyway, but you can afford quite a lot of RX 7900s with $100k (if you can find 'em).
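A quick sanity check of the buy-your-own-hardware arithmetic in this comment. All inputs are the commenter's rough estimates (9 cards, ~7k GPU-hours/month, ~50k A100-hours of training), not measured figures.

```python
# Buy-vs-rent: how long would 9 owned A100s take to accumulate the
# estimated Stable Diffusion training budget of GPU-hours?
num_gpus = 9
hours_per_gpu_per_month = 730          # ~24h * 30.4 days; 9 cards ~= "7k hours a month"
training_gpu_hours = 50_000            # commenter's estimate for Stable Diffusion

monthly_gpu_hours = num_gpus * hours_per_gpu_per_month
months_to_train = training_gpu_hours / monthly_gpu_hours

print(f"~{monthly_gpu_hours:,.0f} GPU-hours/month across {num_gpus} cards")
print(f"~{months_to_train:.1f} months to reach {training_gpu_hours:,} GPU-hours")
```

So owning the hardware means roughly seven to eight months of wall-clock time for one training run, which matches the comment's "not fast enough, but you keep the hardware" framing.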
xnx over 2 years ago
It's interesting to compare the cost for cloud GPUs vs. buying the hardware outright. At ~$10,000 per Nvidia A100 GPU, it seems like this cloud provider would break even on the hardware after about 5 months at these rates. There are certainly other costs involved (racking, power, etc.), but that's not too bad. I'm almost surprised Nvidia doesn't cannibalize its hardware sales by running its own cloud.
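A sketch of the break-even arithmetic behind the "~5 months" claim. The A100 purchase price comes from the comment; the hourly rental rate and utilization are assumptions chosen to illustrate the calculation, not the provider's actual pricing.

```python
# Months for a cloud provider to recoup an A100's purchase price from rentals.
a100_purchase_price = 10_000     # USD, per the comment
hourly_rate = 2.75               # USD per A100-hour (assumed rate)
utilization = 1.0                # fraction of hours the card is rented out (assumed)
hours_per_month = 730

monthly_revenue = hourly_rate * hours_per_month * utilization
breakeven_months = a100_purchase_price / monthly_revenue
print(f"~{breakeven_months:.1f} months to recoup ${a100_purchase_price:,} at ${hourly_rate}/hr")
```

At full utilization and ~$2.75/hr this lands right around five months; lower utilization or lower rates stretch the payback period proportionally.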
coding123 over 2 years ago
There are some large AWS customers that probably burn that much in idle time on a bunch of unused machines per week (probably per day).
mensetmanusman over 2 years ago
This ignores all the runtime costs for LLMs that aren’t operating effectively :)
marcooliv over 2 years ago
Is there any value of this cost at which we can say "this is dangerous", for any reason?
xwdv over 2 years ago
We can do it for way less using spot instances on AWS, though it takes longer.
graphe over 2 years ago
I've downloaded anime models for free. I'm sure they were <$160 without the k. https://github.com/Noah670/stablediffusionAnime