
GPU utilization can be a misleading metric

144 points by roanakb 9 months ago

11 comments

SnowflakeOnIce 9 months ago

> you can get 100% GPU utilization by just reading/writing to memory while doing 0 computations

Indeed! Utilization is a proxy for what you actually want (which is good use of the available hardware). 100% GPU utilization doesn't actually indicate this.

On the other hand, if you *aren't* getting 100% GPU utilization, you aren't making good use of the hardware.
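The distinction this comment draws can be put in numbers with a toy sketch (the figures are illustrative assumptions, not from the thread; ~989 TFLOP/s is roughly an H100-class BF16 peak):

```python
# Toy illustration: a GPU can be "100% utilized" (some kernel is resident
# the whole time) while doing almost no useful math.
def utilization(busy_s: float, wall_s: float) -> float:
    """Fraction of wall time during which *any* kernel was running."""
    return busy_s / wall_s

def mfu(achieved_flops: float, wall_s: float, peak_flops_per_s: float) -> float:
    """Model FLOPs utilization: achieved FLOPs vs. the hardware peak."""
    return achieved_flops / (wall_s * peak_flops_per_s)

# A pure memory-copy kernel that runs for the full 10 s but does 0 FLOPs:
util = utilization(busy_s=10.0, wall_s=10.0)        # 1.0 -> "100% utilized"
flops_util = mfu(achieved_flops=0.0, wall_s=10.0,
                 peak_flops_per_s=989e12)           # 0.0 -> no useful compute
print(util, flops_util)
```

The first metric saturates as soon as anything is running; only the second reflects whether the arithmetic units are actually earning their keep.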
antognini 9 months ago

When trying to understand the performance of your model, it's very helpful to look at a roofline plot [1]. The roofline plot shows floating-point performance as a function of arithmetic intensity for the various ops in your model. The plot has two regimes: a memory-bound regime on the left and a compute-bound regime on the right. This can help identify memory-bound ops that are taking a significant fraction of compute time.

[1]: https://en.wikipedia.org/wiki/Roofline_model
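The roofline model itself is one line of math: attainable throughput is capped either by peak compute or by memory bandwidth times arithmetic intensity. A minimal sketch, using assumed H100-class numbers rather than vendor specs:

```python
# Minimal roofline model: an op's attainable FLOP/s is the lesser of peak
# compute and (memory bandwidth x arithmetic intensity).
def roofline(intensity_flops_per_byte: float,
             peak_flops: float, mem_bw_bytes: float) -> float:
    return min(peak_flops, mem_bw_bytes * intensity_flops_per_byte)

# Illustrative H100-class numbers (assumptions for the sketch):
PEAK = 989e12      # ~peak dense BF16 FLOP/s
BW   = 3.35e12     # ~HBM bandwidth in bytes/s

ridge = PEAK / BW  # "ridge point" where the two regimes meet, ~295 FLOPs/byte
print(roofline(1.0, PEAK, BW))    # low intensity: memory-bound
print(roofline(500.0, PEAK, BW))  # high intensity: compute-bound
```

Ops sitting well left of the ridge point will never hit peak FLOP/s no matter how busy the device looks.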
sundalia 9 months ago

Application-specific metrics are the way to go. For ML training, this is one example: https://cloud.google.com/blog/products/ai-machine-learning/goodput-metric-as-measure-of-ml-productivity
sergiotapia 9 months ago

Running GPU models and maximizing utilization is pretty opaque to me as a layman coming into the scene.

Take this example: https://gist.github.com/sergiotapia/efc9b3f7163ba803a260b481470255c1 - running a fairly simple model that takes only 70ms per image pair, but because I have 300 images it becomes a big time sink.

By using ThreadPoolExecutor, I cut that down to about 16 seconds. I wonder if there is a fairly obvious way to truly utilize my beefy L40S GPU! Is it MPS? I haven't been successful at even running the MPS daemon on my Linux server yet. Very opaque for sure!
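The thread-pool approach described above can be sketched self-contained with the standard library; `process_pair` here is a stand-in for the real ~70 ms model call, which is an assumption on my part:

```python
from concurrent.futures import ThreadPoolExecutor

def process_pair(pair_id: int) -> int:
    # Stand-in for the per-image-pair model inference; real code would
    # run the GPU model here.
    return pair_id * 2

pairs = range(300)

# Threads mainly overlap the host-side overhead (I/O, pre/post-processing,
# kernel-launch latency); a single CUDA context still serializes most
# kernel execution, which is why the GPU itself can remain underused.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(process_pair, pairs))

print(len(results))  # 300
```

For a small model like this, the bigger win is usually batching: stacking many image pairs into one tensor so a single forward pass amortizes the launch and transfer overhead, rather than multiplexing 300 tiny calls.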
DamonsJ 9 months ago

"If we have a CUDA kernel that continuously runs for 10 seconds but only uses 1 SM, on an H100, this would register 100% utilization, but the SM efficiency would be 1 / 132 = 0.7%."

Does this situation really register 100% utilization? BTW, SM occupancy is also a metric you need to care about if you're concerned with kernel efficiency.
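The quoted figure is just a ratio of busy SMs to available SMs; a tiny sketch of the arithmetic (132 SMs is the H100 SXM count cited in the article):

```python
def sm_efficiency(active_sms: int, total_sms: int) -> float:
    """Fraction of the device's streaming multiprocessors doing work."""
    return active_sms / total_sms

# The article's example: 1 busy SM out of the H100's 132.
print(f"{sm_efficiency(1, 132):.2%}")  # ~0.76%, which the article rounds to 0.7%
```

Note this is orthogonal to occupancy: SM efficiency asks how many SMs are busy at all, while occupancy asks how many warps each busy SM is keeping in flight.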
saagarjha 9 months ago

If you have a basic understanding of what your kernels are supposed to do, looking at pipe usage and roofline analysis in Nsight Compute is often helpful, since it will show you how hard you're saturating those.
pavelstoev 9 months ago

I recommend the hidet backend in torch.compile; it implements many advanced model-specific optimizations automatically. https://github.com/hidet-org/hidet
areichenbach 9 months ago

I've recently been trusting GPU watt usage over utilization. Any idea how good that is as a simple proxy (if I'm just looking at nvidia-smi)?
danielvaughn 9 months ago

We ran into a similar problem with CPU utilization at my job. We created an alert for when our systems hit 90% CPU utilization, and ended up with a ton of noise. We realized that for some of our workloads this was normal and expected.
ScoutOrgo 9 months ago

As someone familiar with using nvidia-smi to track utilization, what are some commands people use to track SM efficiency? The end of the article had some references, but no examples of what to use explicitly.
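A couple of commands commonly used for a finer-grained view than the headline utilization number (hedged: exact columns and field IDs depend on your driver and DCGM versions, so check `--help` on your system):

```shell
# nvidia-smi's device monitor: with -s u it prints per-second samples
# including an "sm" column (percent of time SMs had work).
nvidia-smi dmon -s u

# DCGM's profiling metrics go deeper; in recent DCGM releases field
# 1002 is SM activity and 1003 is SM occupancy.
dcgmi dmon -e 1002,1003
```

For per-kernel (rather than whole-device) SM efficiency, Nsight Compute (`ncu`) profiles individual kernels, as mentioned elsewhere in the thread.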
AeZ1E 9 months ago

GPU utilization is not everything, people! MFUs are where it's at. Time to recalibrate those expectations and tap into the true potential of your GPUs. Brace yourselves, the real efficiency is yet to come!
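MFU can be computed from training throughput alone. A sketch using the common ~6 FLOPs-per-parameter-per-token approximation for transformer training (the model size, throughput, and peak figures below are illustrative assumptions):

```python
def transformer_mfu(params: float, tokens_per_s: float,
                    peak_flops_per_s: float) -> float:
    """Model FLOPs utilization from the ~6*N FLOPs/token approximation.

    The 6*N rule of thumb counts forward + backward passes and ignores
    the attention term, which is standard for MFU reporting.
    """
    achieved_flops_per_s = 6.0 * params * tokens_per_s
    return achieved_flops_per_s / peak_flops_per_s

# E.g. a 7B-parameter model training at 4000 tokens/s against ~989 TFLOP/s
# of assumed peak BF16 compute:
print(f"{transformer_mfu(7e9, 4000, 989e12):.1%}")
```

Unlike the driver-reported utilization percentage, this number cannot be inflated by memory-bound or idle-SM kernels: it only moves when real training throughput does.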