
Llama.cpp AI Performance with the GeForce RTX 5090 Review

113 points | by kristianp | 3 months ago

13 comments

Tepix · 3 months ago
It used to be that for locally running GenAI, VRAM per dollar was king, so used Nvidia RTX 3090 cards were the undisputed darlings of DIY LLM builds, with 24GB for 600€-800€ or so. Sticking two of these in one PC isn't too difficult despite them drawing 350W each.

Then Apple introduced Macs with 128 GB and more of unified memory at 800GB/s and the ability to load models as large as 70GB (70B FP8) or even larger. The M1 Ultra was unable to take full advantage of the excellent RAM speed, but with the M2 and M3, performance is improving. Just be prepared to spend 5000€ or more for an M3 Ultra. Another alternative would be an EPYC 9005 system with 12x DDR5-6000 RAM for 576GB/s of memory bandwidth, with the LLM (preferably MoE) running on the CPU instead of a GPU.

However, today, with the latest, surprisingly good reasoning models like QwQ-32B using up thousands or tens of thousands of tokens in their replies, performance is getting more important than before, and these systems (Macs and even RTX 3090s) might fall out of favor, because waiting for a finished reply will take several minutes or even tens of minutes. Nvidia Ampere and Apple silicon (AFAIK) are also missing FP4 support in hardware, which doesn't help.

For the same reason, AMD Strix Halo with a mere 273GB/s of RAM bandwidth, and perhaps also Nvidia Project Digits (also speculated to offer similar RAM bandwidth), might just be too slow for reasoning models with more than 50GB or so of active parameters.

On the other hand, if prices for the RTX 5090 remain at 3500€, it will likely remain insignificant for the DIY crowd for that reason alone.

Perhaps AMD will take the crown with a variant of their RDNA4 RX 9070 card with 32GB of VRAM priced at around 1000€? Probably wishful thinking…
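For a rough sense of why memory bandwidth dominates these comparisons: single-user token generation is largely memory-bound, so an optimistic upper bound on decode speed is bandwidth divided by the bytes read per token (roughly the weight footprint for a dense model). The back-of-the-envelope sketch below uses the bandwidth figures quoted above and a hypothetical 70 GB dense model; it ignores compute, KV-cache traffic, and whether the model even fits in a given card's VRAM.

```python
# Memory-bound upper bound on single-stream decode speed:
#   tokens/s <= memory_bandwidth / bytes_read_per_token
# For a dense model, bytes_read_per_token ~= total weight size.
# Illustrative only: ignores compute, KV cache, and VRAM capacity.

MODEL_BYTES = 70e9  # hypothetical ~70 GB model (e.g. 70B params at FP8)

bandwidths_gbs = {
    "Apple M-series Ultra (800 GB/s unified memory)": 800,
    "EPYC 9005 w/ 12x DDR5-6000 (576 GB/s)": 576,
    "AMD Strix Halo (273 GB/s)": 273,
}

for name, gbs in bandwidths_gbs.items():
    tok_s = gbs * 1e9 / MODEL_BYTES   # optimistic tokens per second
    minutes = 10_000 / tok_s / 60     # time for a 10k-token reasoning reply
    print(f"{name}: <= {tok_s:.1f} tok/s, ~{minutes:.0f} min for 10k tokens")
```

At 576 GB/s that works out to roughly 8 tokens/s, i.e. about 20 minutes for a 10,000-token reasoning trace, which is exactly the "tens of minutes" concern above.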
benob · 3 months ago
The benchmark only touches 8B-class models at 8-bit quantization. It would be interesting to see how it fares with models that use more of the card's VRAM, and under varying quantization levels and context lengths.
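A sweep like that can be scripted against llama.cpp's bundled llama-bench tool. A minimal sketch follows, assuming llama-bench is on the PATH and that the GGUF files listed (whose names are placeholders) already exist locally; check the flags against your llama.cpp build, since they change between releases.

```python
import subprocess

# Placeholder GGUF files at different quantization levels; substitute
# whatever models you actually have on disk.
models = [
    "llama-3-8b-q8_0.gguf",
    "llama-3-8b-q4_k_m.gguf",
    "qwen2.5-32b-q4_k_m.gguf",   # larger model to fill more of the 32 GB
]
prompt_lengths = [512, 4096, 16384]  # exercise different context sizes

for model in models:
    for n_prompt in prompt_lengths:
        # llama-bench reports prompt-processing and generation tok/s;
        # -ngl 99 offloads all layers to the GPU.
        subprocess.run(
            ["llama-bench", "-m", model,
             "-p", str(n_prompt), "-n", "128", "-ngl", "99"],
            check=True,
        )
```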
KronisLV · 3 months ago
Sometimes I daydream about a world where GPUs just have the equivalent of LPCAMM, and you could put in as much RAM as you can afford and as much as the hardware supports, much like with motherboards, even if something along the way would bottleneck somewhat. It'd really extend the life of some hardware, yet companies don't want that.

That said, it's cool that you can even get an L4 with 24 GB of VRAM that actually performs okay, yet is *passively cooled* and consumes around 70W. At that point you can throw a bunch of them into a chassis and, if you haven't bankrupted yourself by then, they're pretty good.

I did try them out on Scaleway, and the pricing isn't even that exorbitant; using consumer GPUs for LLM use cases doesn't quite hit the same since.
3np · 3 months ago
Correct me if I'm wrong, but I have the impression that we'd usually expect to see bigger efficiency gains, while these are marginal?

If so, that would confirm the notion that they've hit a ceiling and are pushing against physical limitations.
wewewedxfgdf · 3 months ago
It would be very interesting to see these alongside benchmarks for the Apple M4, AMD Strix Halo, and other AMD cards.
jacekm · 3 months ago
If my budget is only ~$1000, should I buy a used 3090 or a new 5080? (For AI; I don't care about gaming.)
chakintosh · 3 months ago
Curious whether the author checked if his card has any missing ROPs.
cjtrowbridge · 3 months ago
P40s are so slept on. 24GB of VRAM for $150.
benw214 · 3 months ago
A 96GB beast is coming from Nvidia: https://videocardz.com/newz/nvidia-rtx-pro-6000-blackwell-leaked-24064-cores-96gb-g7-memory-and-600w-double-flow-through-cooler
alecco · 3 months ago
Is llama.cpp's CUDA implementation decent? (e.g. does it use CUTLASS properly, or something more low-level?)
3np · 3 months ago
Great to see Mr. Larabel at Phoronix both maintaining consistently legit reporting and still finding time for one-offs like this, in these times of AI slop and other OG writers either quitting or succumbing to the vortex. Hats off!
DrNosferatu · 3 months ago
And the latest AMD cards for reference?

Also, some DeepSeek models would be cool.
littlestymaar · 3 months ago
TL;DR: performance isn't bad, but perf per watt isn't better than the 4080 or 4090, and can even be significantly lower than the 4090 in certain contexts.