
Which GPU(s) to Get for Deep Learning

223 points, by snow_mac, almost 2 years ago

24 comments

roenxi, almost 2 years ago

Evaluating AMD GPUs by their specs is not going to paint the full picture. Their drivers are a serious problem. I've managed to get ROCm mostly working on my system (ignoring all the notices about what is officially supported, the jammy debs from the official repo seem to work on Debian testing). The range of supported setups is limited, so it is quite easy to end up in a similar situation.

I expect system lockups when doing any sort of model inference. From the experience of the last few years I assume it is driver bugs. Based on their rate of improvement they will probably get there around 2025, but their past performance has been so bad that I wouldn't recommend buying a card for machine learning until they've proven they're taking the situation seriously.

Although, in my opinion, buy AMD anyway if you need a GPU on Linux. Their open source drivers are a lot less hassle as long as you don't need BLAS.
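A cheap sanity check before trusting a setup like this: a minimal sketch, assuming a ROCm build of PyTorch is installed (on those builds the CUDA API is backed by HIP):

```python
import torch

# On ROCm builds of PyTorch the torch.cuda.* API is backed by HIP, so
# torch.version.hip is set instead of torch.version.cuda.
print("HIP version:", getattr(torch.version, "hip", None))
print("Device visible:", torch.cuda.is_available())

if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))
    # A small matmul is a useful smoke test: the driver/runtime mismatches
    # that cause lockups like those described above often surface here first.
    x = torch.randn(1024, 1024, device="cuda")
    print("Matmul OK:", (x @ x).sum().item())
```
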
ItsBob, almost 2 years ago

Just as an FYI / additional data point, I bought a 3090 FE from eBay a few months ago for £605 including delivery.

I've only just started using it for Llama running locally on my computer at home and I have to say... colour me impressed.

It generates output slightly faster than reading speed, so for me it works perfectly well.

The 24GB of VRAM should keep it relevant for a while too, and I can always buy another and NVLink them should the need arise.
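"Slightly faster than reading speed" is easy to put a number on. A minimal timing sketch with Hugging Face transformers (the model ID and prompt are placeholders; a 7B model in fp16 fits comfortably in the 3090's 24GB):

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder: any causal LM you have locally
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place the layers on the 3090
    torch_dtype=torch.float16,  # ~2 bytes/param: a 7B model uses ~14GB of the 24GB
)

inputs = tok("The best GPU for local inference is", return_tensors="pt").to(model.device)
start = time.time()
out = model.generate(**inputs, max_new_tokens=128)
elapsed = time.time() - start

new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
# Casual reading speed is roughly 4-5 words/s, so anything above ~6 tok/s
# already feels "faster than reading speed".
print(f"{new_tokens / elapsed:.1f} tokens/s")
```
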
Tepix, almost 2 years ago

I used Tim's guide to build a dual RTX 3090 PC, paying €2300 in total by buying used components. It can run inference of Llama-65B, 4-bit quantized, at more than 10 tok/s.

Specs: 2x RTX 3090, NVLink bridge, 128GB DDR4-3200 RAM, Ryzen 7 3700X, X570 SLI mainboard, 2TB M.2 NVMe SSD, air-cooled mesh case.

Finding the 3-slot NVLink bridge is hard and it's usually expensive; I think it's not worth it in most cases. I managed to find a cheap used one. Cooling is also a challenge: the cards are 2.7 slots wide and the spacing is usually 3 slots, so there isn't much room. Some people put 3D-printed shrouds on the back of the PC case to suck the air out of the cards with an extra external fan. Also, limiting the power from 350W to around 280W per card doesn't cost much performance. The CPU is not limiting performance at all; as long as you have 4 cores per GPU you're good.
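Setting that power cap is a one-liner per card with nvidia-smi. A small sketch for the dual-3090 build above (assumes root or equivalent permissions; persistence mode keeps the driver, and with it the limit, loaded between jobs):

```python
import subprocess

POWER_LIMIT_W = 280  # down from the 3090's 350W default, as in the comment above

for gpu in (0, 1):  # the two 3090s
    # Enable persistence mode so the setting isn't lost when the driver unloads.
    subprocess.run(["nvidia-smi", "-i", str(gpu), "-pm", "1"], check=True)
    # Apply the power limit in watts.
    subprocess.run(["nvidia-smi", "-i", str(gpu), "-pl", str(POWER_LIMIT_W)], check=True)
```
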
andy_ppp, almost 2 years ago

I hear a lot about CUDA and how bad ROCm is, etc., and I've been trying to understand what exactly CUDA is doing that is so special; isn't the maths for neural networks mostly multiplying large arrays/tensors together? What magic is CUDA doing that is so different for other vendors to implement? Is it just lock-in, the type of operations that are available, some kind of magical performance advantage, or something else?
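The replies below go into detail, but the core of the question can be made concrete. The operation itself is one line; what differs between vendors is the tuned library kernel it dispatches to (cuBLAS and Tensor Core paths on NVIDIA, rocBLAS on ROCm). A rough timing sketch:

```python
import time
import torch

def bench(dtype: torch.dtype, n: int = 4096, iters: int = 20) -> None:
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        a @ b  # the same one-line op; the vendor BLAS kernel does the work
    torch.cuda.synchronize()
    flops = 2 * n**3 * iters
    print(f"{dtype}: {flops / (time.time() - start) / 1e12:.1f} TFLOPS")

# On recent NVIDIA cards fp16 should come out several times faster than fp32,
# because it is routed to Tensor Core kernels. That routing, plus years of
# kernel tuning, is a large part of what "CUDA" buys in practice.
bench(torch.float32)
bench(torch.float16)
```
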
nl, almost 2 years ago

You can tell how NVIDIA dominates the market from the fact that their price/performance "curve" is almost a straight line.

In a competitive market that line has distortions where one player tries to undercut the other.

There are no bargains because there is almost no competitive pressure, and so there is barely any distortion in that line.
politelemon, almost 2 years ago

So Nvidia is going to pretty much corner the market for a long time? I expected this but was still sad to read it. Surely we would benefit from competition; it would probably take a lot of investment from AMD to make that happen, I imagine.

> AMD GPUs are great in terms of pure silicon: Great FP16 performance, great memory bandwidth. However, their lack of Tensor Cores or the equivalent makes their deep learning performance poor compared to NVIDIA GPUs. Packed low-precision math does not cut it. Without this hardware feature, AMD GPUs will never be competitive.

Edit: what about Intel Arc GPUs? Any hope there?
fnands, almost 2 years ago

App based on this post to help you decide what to buy: https://nanx.me/gpu/
frognumber, almost 2 years ago

I think there's one more axis: frequency of use.

For occasional use, the major constraint isn't speed so much as which models fit. I tend to look at $/GB of VRAM as my major spec. Something like a 3060 12GB is an outlier for fitting sensible models while being cheap.

I don't mind waiting a minute instead of 15 seconds for some complex inference if I only do it a few times per day, or having training be slower if it comes up once every few months.
PeterStuer, almost 2 years ago

I'm sticking with Nvidia for now (currently a 3090 bought secondhand off eBay), as it is by far the most tested and supported, but it is great to see AMD (finally) making progress, as some competition in this segment is desperately needed.
savandriy, almost 2 years ago

I bought a Radeon RX 6700 XT (12GB) last year, primarily for playing games.

But after Stable Diffusion came out, I started playing around with it and was pleasantly surprised that the GPU could handle it!

The setup is a little messy, and Linux only.

For someone targeting AI, definitely pick an Nvidia card with 12+ GB of VRAM.
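"A little messy" usually comes down to ROCm not shipping kernels for the RX 6700 XT's gfx1031 target. A commonly cited workaround (an assumption here, not something from the comment) is overriding the reported architecture before the runtime initializes:

```python
import os

# Must be set before torch (and with it the ROCm runtime) is imported.
# gfx1030 is the officially supported RDNA2 target; the RX 6700 XT reports
# gfx1031, which ROCm does not ship kernels for.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

import torch  # noqa: E402  (deliberately imported after setting the env var)

print(torch.cuda.is_available())       # True if the override worked
print(torch.cuda.get_device_name(0))   # should report the RX 6700 XT
```
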
reducesuffering, almost 2 years ago

You'll want lots of memory, so it depends on your price point.

4090 ($1,600) > 3090 ($1,300 new, ~$600 used) > 3060 ($300)

A used 3090 is the best value; lots of models will need the 24GB of VRAM.
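Putting frognumber's $/GB-of-VRAM metric against these prices (prices from the comment above; VRAM sizes are the cards' standard configurations):

```python
# (price_usd, vram_gb): prices from the comment above, VRAM from the spec sheets
cards = {
    "RTX 4090":        (1600, 24),
    "RTX 3090 (new)":  (1300, 24),
    "RTX 3090 (used)": (600,  24),
    "RTX 3060":        (300,  12),
}

for name, (price, vram) in sorted(cards.items(), key=lambda kv: kv[1][0] / kv[1][1]):
    print(f"{name:16s} ${price:>5} / {vram}GB = ${price / vram:>5.1f} per GB")

# The used 3090 and the 3060 both land at $25/GB versus about $67/GB for the
# 4090, which is why the used 3090 keeps being called the best value for
# memory-bound local inference: same $/GB as the 3060, twice the VRAM.
```
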
pizza, almost 2 years ago

Trying to build a scalable home 4090 cluster but running into a lot of confusion... Let's say:

- I have a motherboard + CPU + other components with plenty of PCIe lanes to spare; in total this part draws 250W (including the 25% extra wattage headroom).
- I start off with one RTX 4090, TDP 450W, ~600W with headroom.
- I want to scale up by adding more 4090s over time, as many as my PCIe lanes can support.

1. How do I add more PSUs over time?
2. Recommended initial PSU wattage? Recommended wattage for each additional pair of 4090s?
3. Recommended PSU brands and models for my use case?
4. Is it better to use PCIe Gen 5 spec-rated PSUs? ATX 3.0? 12VHPWR cables rather than the ordinary 8-pin cables? I've also read somewhere that power cables between different brands of PSUs are *not* interchangeable??
5. Whenever I add an additional PSU, do I need to do something special to electrically isolate the PCIe slots?
6. North American outlets are rated for ~15A * 120V, so roughly 1800W. I can just use one outlet per PSU whenever it's under 1800W, right? For simplicity, let's ignore whatever load is on that particular electrical circuit.

Each GPU means another 600W. Let's say I want to add another PSU for every two 4090s. I understand that to sync the bootup of multiple PSUs you need an add2psu adapter.

I understand the motherboard can provide ~75W per PCIe slot; I take it the rest comes from the PSU power cables. I've seen conflicting advice online: apparently miners use electrically isolated PCIe x1 risers with additional power supplies, but I've also seen that it's fine as long as every input power cable for one GPU comes from a single PSU, regardless of whether that's the one powering the motherboard. Either way, x1 risers are an unattractive option because of bandwidth limitations.

Please help!
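The power-budget arithmetic behind questions 1, 2, and 6 can be sketched directly from the numbers in the comment (250W base, ~600W per 4090 with headroom, 1800W per 15A/120V outlet; the 80%-of-rating PSU loading rule is an added assumption):

```python
BASE_W = 250           # motherboard + CPU + misc, incl. 25% headroom (from the comment)
GPU_W = 600            # one 4090 with transient headroom (from the comment)
OUTLET_W = 1800        # 15A * 120V North American circuit (from the comment)
PSU_LOAD_FACTOR = 0.8  # assumption: size PSUs to run at <=80% of their rating

def budget(n_gpus: int) -> None:
    total = BASE_W + n_gpus * GPU_W
    # Smallest PSU rating that keeps the load at or under 80% of capacity.
    psu_rating = total / PSU_LOAD_FACTOR
    outlets = -(-total // OUTLET_W)  # ceiling division: dedicated 15A circuits needed
    print(f"{n_gpus} GPU(s): {total}W draw, "
          f">= {psu_rating:.0f}W of PSU capacity, {outlets} outlet circuit(s)")

for n in range(1, 5):
    budget(n)
# Three GPUs already exceeds a single 1800W outlet (2050W total), which is
# why multi-PSU builds end up spread across separate circuits.
```
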
paul_funyun, almost 2 years ago

One, don't use a case. Look at how miners mounted their hardware on racks and take notes: cheaper, better for temps, and the most efficient use of space.

Two, I recommend ignoring electricity cost and using all you can. If it's cheaper now than it ever will be, use it while it's cheap. If it will go down in the future due to renewables, nuclear, etc., it's good to buy up GPUs while their price is artificially depressed by energy fears.

Three, go for server-type PSUs and breakout boards. Server PSUs can't be beaten on watts per dollar, and they are extremely efficient.

Finally, consider scooping up some X79 and X99 Xeon boards from Chinese sellers. They're cheap as hell, have PCIe lanes out the wazoo, etc. This means you don't have to fool with as many motherboards to run the same number of GPUs. If you go this route, don't get the bottom-of-the-barrel no-name motherboards; Machinist is a decent brand.
andrewstuart, almost 2 years ago

There's clearly demand to buy AI-capable GPUs at the store at a low price.

But Nvidia's monopoly means they cripple their retail cards and push the AI features to data centers.

If there were many manufacturers of AI hardware and software, there would be abundant cheap products at every level.

AMD and Intel don't seem able to compete, and there's no sign that will change.

So AI is going to remain expensive and hard to get for a very long time.
graton, almost 2 years ago

I almost immediately became suspicious of this article's accuracy when it said "Nvidia RTX 40 Ampere series". Ampere was the architecture name for the RTX 30 series; Ada Lovelace is the architecture name for the RTX 40 series.
jcuenod, almost 2 years ago

Any advice for mobile GPUs? I'm interested in getting a laptop (preferably in the portable category). Obviously it's not going to be in 4090 territory; that's a tradeoff I'm willing to make.
adultSwim, almost 2 years ago

Weird to leave out Apple. They seem to be the cheapest option for getting a large amount of GPU memory.
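For context, an addition not in the comment: Apple's unified memory is exposed to PyTorch through the MPS backend, so checking what fits works much like it does on CUDA:

```python
import torch

# On Apple Silicon the GPU shares the machine's unified memory, so a 64GB
# Mac can hold models that would need multiple 24GB discrete cards.
if torch.backends.mps.is_available():
    x = torch.randn(4096, 4096, device="mps")
    print("MPS OK:", (x @ x).mean().item())
else:
    print("MPS backend not available on this machine")
```
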
synergy20, almost 2 years ago

The 4090 now ships in high-end PCs, with 24GB of VRAM; that's what I'm going to buy.

Everyone talks about Nvidia GPUs and AMD's MI250/MI300; where is Intel? Would love to have a third player.
justinclift, almost 2 years ago

The raw performance rating for the RTX 3070 seems very oddly placed in the chart. It's below the RTX 3060 Ti, which doesn't seem to make any sense.
lyapunova, almost 2 years ago

I never tire of this. Tim is a wonderful, no-nonsense person. I love these posts and I love that they stay up to date.
arvinsim, almost 2 years ago

It's a real shame the 4070 Ti doesn't have 16GB.

But I guess it's expected: Nvidia doesn't want to cannibalize the 4080.
kristianp, almost 2 years ago

As a compromise, how is the recently released 4060 Ti with 16GB of RAM? It's about a third of the price of a 4090.
xnx, almost 2 years ago

Do local GPUs make sense? For the same price, couldn't you get a full year's worth of cloud GPU time?
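The break-even is easy to estimate. A sketch with illustrative numbers (the $1,600 4090 price appears upthread; the cloud rate, power draw, and electricity price are hypothetical placeholders to swap for real quotes):

```python
CARD_COST_USD = 1600               # RTX 4090 list price, from the thread above
POWER_COST_PER_HOUR = 0.45 * 0.15  # assumption: ~450W draw at $0.15/kWh
CLOUD_RATE_PER_HOUR = 0.70         # hypothetical on-demand rate for a comparable GPU

# Hours of use at which owning beats renting, ignoring resale value.
break_even_h = CARD_COST_USD / (CLOUD_RATE_PER_HOUR - POWER_COST_PER_HOUR)
print(f"Break-even after ~{break_even_h:,.0f} GPU-hours "
      f"(~{break_even_h / 24:.0f} days of continuous use)")
# At these rates: ~2,530 hours, i.e. about 105 days flat out. The answer
# hinges almost entirely on utilization, which is the point of the
# frequency-of-use axis raised upthread.
```
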
32gbsd, almost 2 years ago

OMG, that's a long read, but very informative.