
MK1 Flywheel Unlocks the Full Potential of AMD Instinct for LLM Inference

123 points, by ejz, over 1 year ago

14 comments

kristianp, over 1 year ago
According to [1, 2], the MI210's memory bandwidth is 1,638 GB/s vs. the RTX A6000's 768.0 GB/s; HBM2e on a 4096-bit bus really does beat GDDR6 on a 384-bit bus. So I would expect the MI210 to do better on bandwidth-heavy workloads.

[1] https://www.techpowerup.com/gpu-specs/radeon-instinct-mi210.c3857
[2] https://www.techpowerup.com/gpu-specs/rtx-a6000.c3686
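
(A quick sanity check of those figures, and why they matter for LLM inference: at batch size 1, decoding is memory-bound, because every generated token streams the full model weights through memory once. The sketch below recomputes both peak-bandwidth numbers from bus width and effective data rate, then derives a rough tokens/s ceiling. The 7B fp16 model size is an assumed example, and the bound ignores KV-cache traffic, batching, and kernel efficiency.)

```python
# Back-of-the-envelope check of the bandwidth figures above, plus a
# simple roofline bound on batch-1 decode speed. Illustrative only.

def peak_bw_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return bus_width_bits * data_rate_gbps / 8

mi210 = peak_bw_gb_s(4096, 3.2)   # HBM2e at 3.2 Gbps effective
a6000 = peak_bw_gb_s(384, 16.0)   # GDDR6 at 16 Gbps effective
print(f"MI210: {mi210:.1f} GB/s, A6000: {a6000:.1f} GB/s")
# -> MI210: 1638.4 GB/s, A6000: 768.0 GB/s

# Batch-1 decode reads all weights once per token, so
# tokens/s <= bandwidth / model size (assumed 7B fp16 model, ~14 GB).
model_bytes = 7e9 * 2
for name, bw in [("MI210", mi210), ("A6000", a6000)]:
    print(f"{name} ceiling: {bw * 1e9 / model_bytes:.0f} tokens/s")
# -> MI210 ceiling: 117 tokens/s, A6000 ceiling: 55 tokens/s
```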
nabakin, over 1 year ago
Considering how well their last post turned out [0] (read the comments), forgive me for not having much confidence in what this company claims. I'll need independent testing.

[0] https://news.ycombinator.com/item?id=37016413
bloopernova, over 1 year ago
It will be fascinating to see if AMD's open-source ML platform beats Nvidia's closed one. (Usually, the more open platform becomes more popular.)
hereme888, over 1 year ago
Isn't this an unfair comparison, given that they test the A6000 Ampere (half the price of the MI210) instead of the Ada generation (still cheaper than the MI210, but with ~2x the CUDA cores of Ampere)?
z3phyr, over 1 year ago
A little off-topic, but I am rather disappointed by software taking the names of cool things. As soon as I looked at the title, my first thought went to an actual flywheel (a mechanical energy-storage system). A flywheel in space powering some unnatural energy beam is a Vader-esque, evil-cool system I frequently imagine.
jadbox, over 1 year ago
Is it open source? What is MK1?
matiasdwek, over 1 year ago
Can I try this on AWS? Do they have any instances with AMD GPUs?
pholos, over 1 year ago
Looks like good performance on AMD and NVIDIA, nice job!
Zetobal, over 1 year ago
Doubt. Especially after the last announcement. Probably optimised for the benchmark again, and the fanboys with green, low-comment accounts are also a bit sus.
nagemsley, over 1 year ago
impressive
kkielhofner, over 1 year ago
I support any progress toward eroding the Nvidia monopoly.

That said, from what I'm seeing here, the free and open-source (less other aspects of the CUDA stack, of course) TensorRT-LLM [0] almost certainly bests this implementation on the Nvidia hardware they reference for comparison. Compare to real (datacenter) Nvidia GPUs that aren't three years old and prepare to get your hair blown back.

I don't have an A6000, but as an example, with the tensorrt_llm backend [1] for Nvidia Triton Inference Server (also free and open source) I get roughly 30 req/s with Mistral 7B on my RTX 4090, with significantly lower latency, and I'm in the early stages of tuning. Comparison benchmarks are tough, especially when published benchmarks like these are fairly scant on the real details.

TensorRT-LLM has only been public for a few months, and if you peruse the docs, PRs, etc., you'll see they have many more optimizations in the works.

In typical Nvidia fashion, TensorRT-LLM runs on *any* Nvidia GPU (from laptop to datacenter) going back to Turing (five-year-old cards), assuming you have the VRAM. It even works on their Jetson line of hardware.

You can download and run this today, free and "open source" for these implementations at least. I'm extremely skeptical of the claim "MK1 Flywheel has the Best Throughput and Latency for LLM Inference on NVIDIA" [2]. You'll note they compare to vLLM, which is an excellent and incredible project, but if you look at vLLM vs. Triton with TensorRT-LLM, the performance improvements are dramatic.

Of course it's the latest and greatest ($$$$$$ and unobtanium), but one look at H100/H200 performance [3] and you can see what happens when the vendor has a robust software ecosystem to help sell their hardware. Pay the Nvidia tax on the front end for the hardware, get it back and then some as a dividend via the software, especially when anything close (assuming this even is close) is another paid product/SaaS/whatever their monetization strategy is.

At the risk of this turning into an Nvidia sales pitch: Triton will do the same thing for absolutely any model via the ONNX, TensorRT, PyTorch, TensorFlow, OpenVINO, etc. backends.

I have an implementation generating embeddings via bge-large-v1.5 that's also the fastest thing out there. Same for Whisper, vision models, whatever you want.

I feel like MK1 must be aware of TensorRT-LLM/Triton, but of course those comparison benchmarks won't help sell their startup.

[0] https://github.com/NVIDIA/TensorRT-LLM
[1] https://github.com/triton-inference-server/tensorrtllm_backend
[2] https://mkone.ai/blog/mk1-flywheel-race-tuned-and-track-ready
[3] https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/Falcon180B-H200.md#llama-70b-on-h200-up-to-67x-a100
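
(Throughput numbers like the 30 req/s above hinge on concurrency, prompt/output lengths, and measurement method, which is part of why published comparisons are hard to interpret. Below is a minimal client-side throughput probe; the endpoint URL and JSON payload are hypothetical placeholders rather than Triton's or MK1's actual API, so adapt both to whatever server you test.)

```python
# Minimal request-throughput probe using only the standard library.
# ENDPOINT and PAYLOAD are assumptions -- swap in the real API of the
# server under test (Triton, vLLM, etc.) before trusting any numbers.
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

ENDPOINT = "http://localhost:8000/generate"  # hypothetical URL
PAYLOAD = json.dumps({"prompt": "Hello", "max_tokens": 128}).encode()
CONCURRENCY = 32   # in-flight requests; raise until req/s plateaus
REQUESTS = 256

def one_request(_: int) -> None:
    req = urllib.request.Request(
        ENDPOINT, data=PAYLOAD,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()  # block until the full completion arrives

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    list(pool.map(one_request, range(REQUESTS)))
elapsed = time.perf_counter() - start
print(f"{REQUESTS / elapsed:.1f} req/s at concurrency {CONCURRENCY}")
```

Sweeping CONCURRENCY upward until req/s plateaus, while also recording tail latency, gives a much fairer picture than any single headline number.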
tinoargentino, over 1 year ago
Nice!!!
wintercharm, over 1 year ago
This is sick!
tucnak, over 1 year ago
Please stop manipulating the HN front page for publicity.