I don’t train LLMs from scratch, but I have:<p>3x4090s
1x Tesla A100<p>Lots of fine-tuning, attention visualisation, and evaluation of embeddings and different embedding-generation methods; not just LLMs, though I use them a lot, but deep nets of many kinds<p>Both for my day job (hedge fund) and my hobby project <a href="https://atomictessellator.com" rel="nofollow">https://atomictessellator.com</a><p>It’s summer here in NZ and I have these in servers mounted in a freestanding server rack beside my desk, and it is very hot in here XD
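For anyone curious what the attention-visualisation piece of a workflow like that can look like, here is a minimal sketch using Hugging Face transformers; the model name and the layer/head choice are placeholders, not the poster's actual setup:
<pre><code>
# Minimal attention-visualisation sketch (assumes torch, transformers and matplotlib are installed).
import torch
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"  # placeholder model, not the commenter's actual one
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_attentions=True)

text = "Attention visualisation on a toy sentence."
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
attn = out.attentions[-1][0, 0]  # last layer, first head
labels = tok.convert_ids_to_tokens(inputs["input_ids"][0].tolist())

plt.imshow(attn.numpy(), cmap="viridis")
plt.xticks(range(len(labels)), labels, rotation=90)
plt.yticks(range(len(labels)), labels)
plt.colorbar()
plt.tight_layout()
plt.savefig("attention.png")
</code></pre>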
Some people have been fine-tuning Mistral 7B and Phi-2 on their high-end Macs. Unified memory is a hell of a thing. The resulting model here is not spectacular, but as a proof of concept it's pretty exciting what you get in ~3.5 hours on a consumer machine.<p>- Apple M2 Max, 64GB shared RAM<p>- Apple Metal (GPU), 8 threads<p>- 1152 iterations (3 epochs), batch size 6, trained over 3 hours 24 minutes<p><a href="https://www.reddit.com/r/LocalLLaMA/comments/18ujt0n/using_gpus_on_a_mac_m2_max_via_mlx_update_on/" rel="nofollow">https://www.reddit.com/r/LocalLLaMA/comments/18ujt0n/using_g...</a>
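Back-of-the-envelope on those numbers, just to make the scale concrete (pure arithmetic from the figures quoted above; nothing is assumed about the dataset itself):
<pre><code>
# Rough arithmetic from the run described above:
# 1152 iterations, batch size 6, 3 epochs, ~3 h 24 min on an M2 Max.
iterations = 1152
batch_size = 6
epochs = 3
minutes = 3 * 60 + 24

examples_per_epoch = iterations * batch_size / epochs  # ~2304 training examples per epoch
iters_per_minute = iterations / minutes                # ~5.6 iterations per minute
seconds_per_iter = minutes * 60 / iterations           # ~10.6 seconds per batch of 6

print(examples_per_epoch, round(iters_per_minute, 1), round(seconds_per_iter, 1))
</code></pre>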
A self-built machine with dual 4090s, soon to be 3x. Watercooled for quieter operation.<p>Did the math on how much using RunPod per day would cost, and bought this setup instead.<p>Using fully sharded data parallel (FSDP) and bfloat16, I can train a 7B-param model very slowly. But that’s fine for only going 2000 steps!
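For the curious, a minimal sketch of what the FSDP + bfloat16 wrapping can look like in PyTorch. The model choice, hyperparameters, and launch details are my assumptions, not this poster's actual script; you would launch it with torchrun across the GPUs:
<pre><code>
# Skeleton of FSDP + bfloat16 sharded training (assumes a torchrun launch that
# sets LOCAL_RANK etc.; the model name below is a placeholder).
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from transformers import AutoModelForCausalLM

dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder 7B model

bf16 = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)
model = FSDP(model, mixed_precision=bf16, device_id=local_rank)

optim = torch.optim.AdamW(model.parameters(), lr=1e-5)
# ...then an ordinary training loop (forward, loss.backward(), optim.step()),
# capped at ~2000 steps as described above.
</code></pre>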
I doubt many people are using local setups for serious work.<p>Even fine-tuning Mixtral takes 4x H100 for 4 days, which is a ~$200k server currently.<p>To fully train, not just fine-tune, even a small model, say Llama 2 7B, you need over 128 GiB of VRAM, so you're still in multiple-GPU territory, likely A100s or H100s.<p>This all depends on the settings you use; increase the batch size and you will see even more memory utilization.<p>I believe a lot of people see these models running locally and assume training is similar, but it isn't.
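The 128 GiB figure lines up with the usual mixed-precision Adam accounting of roughly 16 bytes per parameter. A quick back-of-the-envelope version (a rule-of-thumb estimate; it ignores activations, which come on top and grow with batch size):
<pre><code>
# Rough memory estimate for fully training a 7B-parameter model with Adam
# in mixed precision (the common ~16 bytes/param rule of thumb).
params = 7e9

weights_bf16 = 2 * params  # bf16 weights used for forward/backward
grads_bf16   = 2 * params  # bf16 gradients
master_fp32  = 4 * params  # fp32 master copy of the weights
adam_moments = 8 * params  # fp32 first + second Adam moments

total_bytes = weights_bf16 + grads_bf16 + master_fp32 + adam_moments
print(total_bytes / 2**30, "GiB before activations")  # ~104 GiB

# Activations scale with batch size and sequence length, which is why
# bumping the batch size pushes memory use up further still.
</code></pre>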