Show HN: San Francisco Compute – 512 H100s at <$2/hr for research and startups

727 points by flaque, almost 2 years ago
Hey folks! We're Alex and Evan, and we're working on putting together a 512 H100 compute cluster for startups and researchers to train large generative models on.

- it runs at the lowest possible margins (<$2.00/hr per H100)
- designed for bursty training runs, so you can take say 128 H100s for a week
- you don’t need to commit to multiple years of compute or pay for a year upfront

Big labs like OpenAI and DeepMind have big clusters that support this kind of bursty allocation for their researchers, but startups so far have had to get very small clusters on very long term contracts, wait months of lead time, and try to keep them busy all the time.

Our goal is to make it about 10-20x cheaper to do an AI startup than it is right now. Stable Diffusion only costs about $100k to train -- in theory every YC company could get up to that scale. It's just that no cloud provider in the world will give you $100k of compute for just a couple weeks, so startups have to raise 20x that much to buy a whole year of compute.

Once the cluster is online, we're going to be pretty much the only option for startups to do big training runs like that on.
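As a rough check on the figures in the post, here is a back-of-the-envelope sketch. It assumes the full $2.00/hr ceiling (the post only says "<$2.00/hr") and that the GPUs are held around the clock; the function names are made up for illustration.

```python
# Back-of-the-envelope cost of a bursty training run.
# ASSUMPTION: $2.00 per H100-hour is the post's stated upper bound, not a quoted price.
PRICE_PER_GPU_HOUR = 2.00
HOURS_PER_WEEK = 24 * 7


def burst_cost(gpus: int, weeks: float, price: float = PRICE_PER_GPU_HOUR) -> float:
    """USD cost of holding `gpus` GPUs continuously for `weeks` weeks."""
    return gpus * weeks * HOURS_PER_WEEK * price


def gpu_hours_for_budget(budget: float, price: float = PRICE_PER_GPU_HOUR) -> float:
    """GPU-hours that `budget` USD buys at `price` USD per GPU-hour."""
    return budget / price


# The "128 H100s for a week" example from the post.
print(burst_cost(128, 1))             # 43008.0

# The "$100k to train" figure, expressed as GPU-hours at the price ceiling.
print(gpu_hours_for_budget(100_000))  # 50000.0
```

Under these assumptions, a week on 128 H100s comes to about $43k, and a $100k budget buys 50,000 GPU-hours, which is roughly four days on all 512 GPUs.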

25 comments

sillysaurusx, almost 2 years ago
I hope you succeed. TPU research cloud (TRC) tried this in 2019. It was how I got my start.

In 2023 you can barely get a single TPU for more than an hour. Back then you could get literally hundreds, with an s.

I believed in TRC. I thought they’d solve it by scaling, and building a whole continent of TPUs. But in the end, TPU time was cut short in favor of internal researchers — some researchers being more equal than others. And how could it be any other way? If I made a proposal today to get these H100s to train GPT to play chess, people would laugh. The world is different now.

Your project has a youthful optimism that I hope you won’t lose as you go. And in fact it might be the way to win in the long run. So whenever someone comes knocking, begging for a tiny slice of your H100s for their harebrained idea, I hope you’ll humor them. It’s the only reason I was able to become anybody.
whack, almost 2 years ago
> Rather than each of K startups individually buying clusters of N gpus, together we buy a cluster with NK gpus... Then we set up a job scheduler to allocate compute

In theory, this sounds almost identical to the business model behind AWS, Azure, and other cloud providers. "Instead of everyone buying a fixed amount of hardware for individual use, we'll buy a massive pool of hardware that people can time-share." Outside of cloud providers having to mark up prices to give themselves a net margin, is there something else they are failing to do, hence creating the need for these projects?
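To make the quoted pooling idea concrete, here is a minimal sketch of K tenants sharing one N·K pool. It is not SFC's scheduler, which the thread does not describe; the `SharedPool` class, the first-come-first-served policy, and the tenant names are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharedPool:
    """Toy first-come-first-served allocator over a shared GPU pool (hypothetical)."""
    total_gpus: int
    allocations: dict[str, int] = field(default_factory=dict)

    @property
    def free_gpus(self) -> int:
        return self.total_gpus - sum(self.allocations.values())

    def request(self, tenant: str, gpus: int) -> bool:
        """Grant `gpus` to `tenant` if that many are free; otherwise refuse."""
        if gpus <= 0 or gpus > self.free_gpus:
            return False
        self.allocations[tenant] = self.allocations.get(tenant, 0) + gpus
        return True

    def release(self, tenant: str) -> None:
        """Return all of `tenant`'s GPUs to the pool."""
        self.allocations.pop(tenant, None)


K, N = 4, 128                          # 4 startups that would each have bought 128 GPUs
pool = SharedPool(total_gpus=K * N)    # one shared pool of 512 instead

print(pool.request("startup_a", 384))  # True: a burst well beyond N = 128
print(pool.request("startup_b", 256))  # False: only 128 GPUs are free right now
pool.release("startup_a")
print(pool.request("startup_b", 256))  # True: capacity came back after the release
```

The point of the sketch is only that a single tenant can burst past its N-GPU share whenever others are idle, which a privately owned N-GPU cluster cannot offer. A real scheduler would also need queueing, time limits, and pricing.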
bnr4u, almost 2 years ago
Having hosted infrastructure in CA at multiple colos, I would advise you to host it elsewhere if you can. The cost of power and other infrastructure is much higher in CA than in AZ or NV.
wodenokoto, almost 2 years ago
> It's just that no cloud provider in the world will give you $100k of compute for just a couple weeks

I've never had to buy very large compute, but I thought that was the whole point of the cloud.
williamstein, almost 2 years ago
How does this compare to https://lambdalabs.com/ ?
whimsicalism, almost 2 years ago
I am super interested in AI on a personal level and have been involved for a number of years.

I have never seen a GPU crunch quite like the one right now. To anyone who is interested in hobbyist ML, I highly highly recommend using vast.ai
dudus, almost 2 years ago
I know AWS/GCP/Azure have overhead and I understand why so many companies choose to go bare metal on their ops. I personally rarely think it's worth the time and effort, but I get that with scale the savings can be substantial.

But for AI training? If the public cloud isn't competitive even for bursty AI training, their margins are much higher than I anticipated.

OP mentions 10-20x cost reduction? Compared to what? AWS?
kaycebasques, almost 2 years ago
Hi, SF lover [1] here. Anything interesting to note about your name? Will your hardware actually be based in SF? Any plans to start meetups or bring customers together for socializing or anything like that?

[1] We have not gone the way of the Xerces blue [2] yet... we still exist!

[2] https://en.wikipedia.org/wiki/Xerces_blue
nilsbunger, almost 2 years ago
I love the idea of community assets. Could it be the start of a GPU co-op?
moneycantbuy, almost 2 years ago
How did you get the money to buy 512 H100s?
itissid, almost 2 years ago
Noob thought: So this would be a blueprint for how mid-tier universities with older large compute cluster ops could do things in 2023 to support large LLM research?

Perhaps it's also a way for freshly applying grad students to evaluate a university looking to do research in LLMs that requires scale...
latchkey, almost 2 years ago
554 5.7.1 <evan@sfcompute.org>: Relay access denied

554 5.7.1 <alex@sfcompute.org>: Relay access denied
sashank_1509, almost 2 years ago
Correct me if I’m wrong, but doesn’t Lambda Labs already provide them at $1.89? What’s the point if you’re not the cheapest when starting out?
mackid, almost 2 years ago
Nat Friedman and Daniel Gross set up a 2,512 H100 cluster [1] for their startups, with a very similar “shared” model. Might be interesting to connect with them.

[1] https://andromedacluster.com/
metadat, almost 2 years ago
Will it be a Slurm cluster, or what kind of scheduler is SFC planning to use?
ucarion, almost 2 years ago
Wishing y'all the best of luck. This would be huge for a lot of folks.
PettingRabbits, almost 2 years ago
What kind of hardware setup are you planning out? Colocation, roll-your-own data center, something in between? Any thoughts on what servers the GPUs will be housed in?
netcraft, almost 2 years ago
Honest question that I don’t know how to answer: are we further along or behind with AI given crypto’s use of GPUs? Have the same cards bought for mining furthered AI, or maybe that demand led to more research into GPUs and what they can do? Or would we be further along if we weren’t wasting these cards on mining?
orGANicWeb, almost 2 years ago
How are you going to sell access and divide the resources?
resonance1994, almost 2 years ago
Just curious, do you guys use renewable energy to power your cluster?
rushingcreek, almost 2 years ago
I love this. We at Phind.com would love to be a part of this.
29athrowaway, almost 2 years ago
During a gold rush, sell shovels.

When was the last time you spoke to a chatbot?
rsync, almost 2 years ago
"Once the cluster is online ..."

Where will the cluster be hosted?

May I suggest that you get your IP transit from he.net?
AndrewKemendo, almost 2 years ago
The billion dollar question is:

Who is funding this?

Cause if it’s VC then it’s going to have the same fate as everything else after 5-7 years.

I hope y’all have as innovative of a business model. You’ll need it if you want to do what you’re doing now for more than a few years.
jeepers6, almost 2 years ago
Please take this question without prejudice.

Is it accurate to say you’re willing to go into ~20,000,000 USD debt to sell discounted compute-as-a-service to researchers/startups, but unwilling to go into debt to sponsor the undergraduate degrees of ~100-500 students at top-tier schools? (40k - 200k USD per degree)

Or, you know, build and fund a small public school/library or two for ~5 years?