Hello everyone,

I’m a math graduate student in Germany, and I’ve recently become interested in developing local and/or web apps with LLMs. I have a 12-year-old MacBook Pro, so I’m thinking about buying something new.

I have searched the relevant keywords here, and the “universal suggestion” seems to be to use a laptop to access GPUs in the cloud, rather than running training and/or inference on the laptop itself.

Someone mentioned that an [ASUS ROG G16](https://www.amazon.de/Anti-Glare-Display-i9-13980HX-Windows-Keyboard/dp/B0BZTJKZ5L/) or G14/G15 can be a good local setup for running small models. While I can probably afford this, it’s still slightly more expensive than I expected.

Given that a 3060 is around 300 euros, I was wondering whether a cheaper solution would be to build a PC myself. If so, how much do you think it would cost? I’ll probably move to a new place in the fall semester, so I’d like something portable, or at least not too heavy, if possible.

Thank you very much for your time!
Running models on an M1 Mac with 32 GB+ works very well: the CPU and GPU share RAM, so you can run some really significant models with it.

Earlier this year, I also went down the path of looking into building a machine with dual 3090s. Doing it for <$1,000 is fairly challenging once you add the case, motherboard, CPU, RAM, etc.

What I ended up doing was getting a used rackmount server capable of handling dual GPUs, plus two NVIDIA Tesla P40s.

Examples:
<a href="https://www.ebay.com/itm/284514545745?itmmeta=01HRJZX097EGBPF60J0VTJJVXY" rel="nofollow">https://www.ebay.com/itm/284514545745?itmmeta=01HRJZX097EGBP...</a>
<a href="https://www.ebay.com/itm/145655400112?itmmeta=01HRJZXK512Y3N26A8N2YDPX2X" rel="nofollow">https://www.ebay.com/itm/145655400112?itmmeta=01HRJZXK512Y3N...</a><p>The total here was ~$600 and there was essentially no effort building / assembling the machine, except I needed to order some molex power adapters, which were cheap.<p>The server is definitely compact, but it can get LOUD when it's running heavy load, so that might be a consideration.<p>It's probably not the right machine for training models, but it runs inference on GGUF (using ollama) quite well. I have been running Mixtral at zippy token rates and smaller models even faster.