
Llama.cpp AI Performance with the GeForce RTX 5090 Review

113 points | by kristianp | 3 months ago

13 comments

Tepix · 3 months ago
It used to be that for locally running GenAI, VRAM per dollar was king, so used Nvidia RTX 3090 cards were the undisputed darlings of DIY LLM builds, with 24GB for 600€-800€ or so. Sticking two of these in one PC isn't too difficult despite them drawing 350W each.

Then Apple introduced Macs with 128 GB and more of unified memory at 800GB/s and the ability to load models as large as 70GB (70B FP8) or even larger. The M1 Ultra was unable to take full advantage of the excellent RAM speed, but with the M2 and M3, performance is improving. Just be prepared to spend 5000€ or more for an M3 Ultra. Another alternative would be an EPYC 9005 system with 12x DDR5-6000 RAM for 576GB/s of memory bandwidth, with the LLM (preferably MoE) running on the CPU instead of a GPU.

However, today, with the latest, surprisingly good reasoning models like QwQ-32B using up thousands or tens of thousands of tokens in their replies, performance is getting more important than before, and these systems (Macs and even RTX 3090s) might fall out of favor, because waiting for a finished reply will take several minutes or even tens of minutes. Nvidia Ampere and Apple silicon (AFAIK) are also missing FP4 support in hardware, which doesn't help.

For the same reason, AMD Strix Halo with a mere 273GB/s of RAM bandwidth, and perhaps also Nvidia Project Digits (also speculated to offer similar RAM bandwidth), might just be too slow for reasoning models with more than 50GB or so of active parameters.

On the other hand, if prices for the RTX 5090 remain at 3500€, it will likely remain insignificant for the DIY crowd for that reason alone.

Perhaps AMD will take the crown with a variant of their RDNA4 RX 9070 card with 32GB of VRAM priced at around 1000€? Probably wishful thinking…
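For a rough sense of why memory bandwidth dominates these comparisons: single-user token generation is largely memory-bound, so an optimistic upper bound on decode speed is bandwidth divided by the bytes read per token (roughly the weight footprint for a dense model). The back-of-the-envelope sketch below uses the bandwidth figures quoted above and a hypothetical 70 GB dense model; it ignores compute, KV-cache traffic, and whether the model even fits in a given card's VRAM.

```python
# Memory-bound upper bound on single-stream decode speed:
#   tokens/s <= memory_bandwidth / bytes_read_per_token
# For a dense model, bytes_read_per_token ~= total weight size.
# Illustrative only: ignores compute, KV cache, and VRAM capacity.

MODEL_BYTES = 70e9  # hypothetical ~70 GB model (e.g. 70B params at FP8)

bandwidths_gbs = {
    "Apple M-series Ultra (800 GB/s unified memory)": 800,
    "EPYC 9005 w/ 12x DDR5-6000 (576 GB/s)": 576,
    "AMD Strix Halo (273 GB/s)": 273,
}

for name, gbs in bandwidths_gbs.items():
    tok_s = gbs * 1e9 / MODEL_BYTES   # optimistic tokens per second
    minutes = 10_000 / tok_s / 60     # time for a 10k-token reasoning reply
    print(f"{name}: <= {tok_s:.1f} tok/s, ~{minutes:.0f} min for 10k tokens")
```

At 576 GB/s that works out to roughly 8 tokens/s, i.e. about 20 minutes for a 10,000-token reasoning trace, which is exactly the "tens of minutes" concern above.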
benob · 3 months ago
The benchmark only touches 8B-class models at 8-bit quantization. It would be interesting to see how it fares with models that use more of the card's VRAM, and under varying quantization levels and context lengths.
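A sweep like that can be scripted against llama.cpp's bundled llama-bench tool. A minimal sketch follows, assuming llama-bench is on the PATH and that the GGUF files listed (whose names are placeholders) already exist locally; check the flags against your llama.cpp build, since they change between releases.

```python
import subprocess

# Placeholder GGUF files at different quantization levels; substitute
# whatever models you actually have on disk.
models = [
    "llama-3-8b-q8_0.gguf",
    "llama-3-8b-q4_k_m.gguf",
    "qwen2.5-32b-q4_k_m.gguf",   # larger model to fill more of the 32 GB
]
prompt_lengths = [512, 4096, 16384]  # exercise different context sizes

for model in models:
    for n_prompt in prompt_lengths:
        # llama-bench reports prompt-processing and generation tok/s;
        # -ngl 99 offloads all layers to the GPU.
        subprocess.run(
            ["llama-bench", "-m", model,
             "-p", str(n_prompt), "-n", "128", "-ngl", "99"],
            check=True,
        )
```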
KronisLV · 3 months ago
Sometimes I daydream about a world where GPUs just have the equivalent of LPCAMM, and you could put in as much RAM as you can afford and as much as the hardware supports, much like with motherboards, even if something along the way would bottleneck somewhat. It'd really extend the life of some hardware, yet companies don't want that.

That said, it's cool that you can even get an L4 with 24 GB of VRAM that actually performs okay, yet is *passively cooled* and consumes around 70W. At that point you can throw a bunch of them into a chassis and, if you haven't bankrupted yourself by then, they're pretty good.

I did try them out on Scaleway, and the pricing isn't even that exorbitant; using consumer GPUs for LLM use cases doesn't quite hit the same since.
3np · 3 months ago
Correct me if I'm wrong, but I have the impression that we'd usually expect to see bigger efficiency gains, while these are marginal?

If so, that would confirm the notion that they've hit a ceiling and are pushing against physical limitations.
wewewedxfgdf · 3 months ago
It would be very interesting to see these alongside benchmarks for the Apple M4, AMD Strix Halo, and other AMD cards.
jacekm · 3 months ago
If my budget is only ~$1000, should I buy a used 3090 or a new 5080? (For AI; I don't care about gaming.)
chakintosh · 3 months ago
Curious whether the author checked if his card has any missing ROPs.
cjtrowbridge · 3 months ago
P40s are so slept on. 24GB of VRAM for $150.
benw214 · 3 months ago
A 96GB beast is coming from Nvidia: https://videocardz.com/newz/nvidia-rtx-pro-6000-blackwell-leaked-24064-cores-96gb-g7-memory-and-600w-double-flow-through-cooler
alecco · 3 months ago
Is llama.cpp's CUDA implementation decent? (e.g. does it use CUTLASS properly, or something more low-level?)
3np · 3 months ago
Great to see Mr. Larabel at Phoronix both maintaining consistently legit reporting and still finding time for one-offs like this, in these times of AI slop and other OG writers either quitting or succumbing to the vortex. Hats off!
DrNosferatu · 3 months ago
And the latest AMD cards for reference?

Also, some DeepSeek models would be cool.
littlestymaar · 3 months ago
TL;DR: performance isn't bad, but perf per watt isn't better than the 4080 or 4090, and can even be significantly lower than the 4090 in certain contexts.