
Ask HN: What local machines are people using to train LLMs?

56 points by Exorust over 1 year ago
How are people building local rigs to train LLMs?

4 comments

malux85 over 1 year ago
I don’t train LLMs from scratch, but I have:

3x 4090s, 1x Tesla A100

Lots of fine tuning, attention visualisation, evaluation of embeddings and different embedding generation methods; not just LLMs, though I use them a lot for deep nets of many kinds.

Both for my day job (hedge fund) and my hobby project https://atomictessellator.com

It’s summer here in NZ and I have these in servers mounted in a freestanding server rack beside my desk, and it is very hot in here XD
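As a small sketch of the embedding-evaluation side of that workload (the embedding functions below are placeholders, not anything from the comment), comparing two embedding generation methods by the cosine similarity of their outputs might look like this:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# embed_a and embed_b stand in for two different embedding methods
# (e.g. two models or pooling strategies); here they are random placeholders.
def embed_a(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=384)

def embed_b(text: str) -> np.ndarray:
    rng = np.random.default_rng((abs(hash(text)) + 1) % (2**32))
    return rng.normal(size=384)

for text in ["local LLM training rigs", "fine tuning on consumer GPUs"]:
    print(text, "->", round(cosine_similarity(embed_a(text), embed_b(text)), 3))
```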
rgbrgb over 1 year ago
Some people have been fine-tuning Mistral 7B and phi-2 on their high-end Macs. Unified memory is a hell of a thing. The resulting model here is not spectacular, but as a proof of concept it's pretty exciting what you get in 3.5 hours on a consumer machine.

- Apple M2 Max, 64GB shared RAM
- Apple Metal (GPU), 8 threads
- 1152 iterations (3 epochs), batch size 6, trained over 3 hours 24 minutes

https://www.reddit.com/r/LocalLLaMA/comments/18ujt0n/using_gpus_on_a_mac_m2_max_via_mlx_update_on/
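The run above used Apple's MLX framework; as a rough stand-in rather than that exact setup, a minimal PyTorch loop targeting the same Metal GPU through the MPS backend (placeholder model and data, with the batch size quoted above) could look like:

```python
import torch
import torch.nn as nn

# Apple-silicon GPUs are exposed to PyTorch through the MPS backend.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Tiny placeholder model; a real fine-tune would load a 7B model
# (or a quantized/LoRA variant) and a tokenized dataset instead.
model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 512)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

batch_size = 6           # batch size quoted in the comment
for step in range(100):  # the quoted run did 1152 iterations (3 epochs)
    x = torch.randn(batch_size, 512, device=device)
    y = torch.randn(batch_size, 512, device=device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if step % 20 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```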
buildbot over 1 year ago
A self-built machine with dual 4090s, soon to be 3x. Watercooled for quieter operation.

Did the math on how much using RunPod per day would be, and bought this setup instead.

Using fully sharded data parallel (FSDP) and bfloat16, I can train a 7B-param model very slowly. But that's fine for only going 2000 steps!
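A minimal sketch of that FSDP + bfloat16 setup (placeholder model and data, one process per GPU) might look like:

```python
# Launch with: torchrun --nproc_per_node=2 fsdp_sketch.py   (one process per 4090)
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

# Placeholder model; a real run would build the 7B-parameter transformer here.
model = nn.Sequential(nn.Linear(4096, 4096), nn.GELU(), nn.Linear(4096, 4096)).cuda()

# Shard parameters, gradients, and optimizer state across GPUs, computing in bfloat16.
bf16 = MixedPrecision(param_dtype=torch.bfloat16,
                      reduce_dtype=torch.bfloat16,
                      buffer_dtype=torch.bfloat16)
model = FSDP(model, mixed_precision=bf16)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
for step in range(2000):  # the comment mentions going about 2000 steps
    x = torch.randn(4, 4096, device="cuda")
    loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

dist.destroy_process_group()
```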
bearjaws over 1 year ago
I doubt many people are using local setups for serious work.

Even fine-tuning Mixtral is 4x H100 for 4 days, which is a ~$200k server currently.

To fully train, not just fine-tune, a small model, say Llama 2 7B, you need over 128GiB of VRAM, so still multiple-GPU territory, likely A100s or H100s.

This is all dependent upon the settings you use; increase the batch size and you will see even more memory utilization.

I believe a lot of people see these models running locally and assume training is similar, but it isn't.
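As a back-of-envelope check on that >128GiB figure (a rule-of-thumb estimate, not taken from the comment), the fixed per-parameter memory for full training of a 7B model with Adam in bf16 mixed precision works out to roughly:

```python
# Rough VRAM estimate for full training of a 7B-parameter model with Adam
# and bf16 mixed precision. Activations are excluded; the byte counts below
# are the usual rule of thumb, not exact numbers.
params = 7e9

bytes_per_param = (
    2    # bf16 weights
    + 2  # bf16 gradients
    + 4  # fp32 master copy of weights
    + 4  # Adam first moment (fp32)
    + 4  # Adam second moment (fp32)
)

total_gib = params * bytes_per_param / 2**30
print(f"~{total_gib:.0f} GiB before activations")  # ~104 GiB
```

Activation memory comes on top of this and scales with batch size and sequence length, which is why raising the batch size pushes utilization even higher.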