Recomputing ML GPU Performance: AMD vs. Nvidia

15 points, by theolivenbaum, over 1 year ago

2 comments

ternaus, over 1 year ago
All the reasons:

[1] The compilers don't produce great instructions;

[2] The drivers crash frequently: ML workloads feel experimental;

[3] Software adoption is getting there, but kernels are less optimized within frameworks, in particular because of the fracture between ROCm and CUDA. When you are a developer and you need to write code twice, one version won't be as good, and it is the one with less adoption;

[4] StackOverflow mindshare is smaller, so debugging problems is harder, as fewer people have encountered them.

---

These were crucial while we had enough supply of Nvidia GPUs, but if the demand described in https://gpus.llm-utils.org/nvidia-h100-gpus-supply-and-demand/ is real (450,000+ H100s), the software bottlenecks will most likely be addressed sometime soon.
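As a rough illustration of point [3]: the ROCm/CUDA fracture is mostly hidden at the framework level, since PyTorch's ROCm build reuses the torch.cuda namespace via HIP, so the same script runs on either vendor and the gap shows up in how well the dispatched vendor kernels are tuned. A minimal sketch, assuming a PyTorch build with either CUDA or ROCm support (the tensor sizes are arbitrary):

```python
import torch

# On ROCm builds of PyTorch, torch.cuda is backed by HIP, so the same
# high-level code targets both vendors; torch.version.hip / torch.version.cuda
# indicate which backend the build was compiled against.
device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    backend = "ROCm/HIP" if torch.version.hip else "CUDA"
    print(f"{torch.cuda.get_device_name(0)} via {backend}")

x = torch.randn(4096, 4096, device=device)
w = torch.randn(4096, 4096, device=device)

# The matmul dispatches to cuBLAS on Nvidia and to rocBLAS/hipBLAS on AMD;
# the Python is identical, but how well those vendor kernels are tuned is
# where the optimization gap described above lives.
y = x @ w
print(y.shape)
```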
brucethemoose2, over 1 year ago
> (For context, Hotz raised $5M to improve RX 7900 XTX support and sell a $15K prebuilt consumer computer that runs 65B-parameter LLMs. A plethora of driver crashes later, he almost gave up on AMD.)

Again, I wish Hotz and TinyGrad the best, especially for training/experimentation on AMD, but I feel like Apache TVM and the various MLIR efforts (like PyTorch MLIR, SHARK, Mojo) are much more promising for ML inference. Even Triton in PyTorch is very promising, with an endorsement from AMD.
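On the Triton point, a minimal sketch of what "Triton in PyTorch" looks like in practice, assuming PyTorch 2.x, where torch.compile's default Inductor backend emits Triton kernels for GPU graphs on both Nvidia (CUDA) and AMD (ROCm) targets; the function and sizes here are hypothetical:

```python
import torch
import torch.nn.functional as F

def fused_gelu_scale(x: torch.Tensor, scale: float) -> torch.Tensor:
    # A small elementwise chain that the compiler can fuse into one GPU kernel.
    return F.gelu(x) * scale

# torch.compile's default Inductor backend code-generates Triton kernels for
# GPU graphs; the same Python source is compiled whether the device is an
# Nvidia (CUDA) or AMD (ROCm) GPU, with no vendor-specific source changes.
compiled = torch.compile(fused_gelu_scale)

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1 << 20, device=device)
y = compiled(x, 0.5)
print(y.shape)
```

Whether the generated Triton kernels are as fast on AMD as on Nvidia is exactly the tuning question the thread is debating; the sketch only shows that the authoring path is shared.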