I support any progress to erode the Nvidia monopoly.

That said, from what I'm seeing here, the free and open source (less other aspects of the CUDA stack, of course) TensorRT-LLM[0] almost certainly bests this implementation on the same Nvidia hardware they reference for comparison. Compare against real (datacenter) Nvidia GPUs that aren't three years old and prepare to get your hair blown back.

I don't have an A6000, but as an example: with the tensorrt_llm backend[1] for Nvidia Triton Inference Server (also free and open source) I get roughly 30 req/s with Mistral 7B on my RTX 4090, at significantly lower latency, and I'm still in the early stages of tuning (a minimal client sketch is at the bottom of this comment). Comparison benchmarks are tough, especially when published benchmarks like these are fairly scant on the real details.

TensorRT-LLM has only been public for a few months, and if you peruse the docs, PRs, etc. you'll see they have many more optimizations in the works.

In typical Nvidia fashion, TensorRT-LLM runs on *any* Nvidia GPU (from laptop to datacenter) going back to Turing (five-year-old cards), assuming you have the VRAM. It even works on their Jetson line of hardware.

You can download and run this today, free and "open source" for these implementations at least. I'm extremely skeptical of the claim that "MK1 Flywheel has the Best Throughput and Latency for LLM Inference on NVIDIA"[2]. You'll note they compare to vLLM, which is an excellent and incredible project, but if you put vLLM up against Triton with TensorRT-LLM the performance improvements are dramatic.

The H100/H200 is of course the latest and greatest ($$$$$$ and unobtanium), but one look at its performance[3] shows what happens when the vendor has a robust software ecosystem to help sell their hardware. Pay the Nvidia tax on the frontend for the hardware, get it back and then some as a dividend via the software, especially when anything that comes close (assuming this even does) is another paid product/SaaS/whatever their monetization strategy is.

At the risk of this turning into an Nvidia sales pitch: Triton will do the same thing for absolutely any model via the ONNX, TensorRT, PyTorch, TensorFlow, OpenVINO, etc. backends.

I have an implementation generating embeddings via bge-large-v1.5 that's also the fastest thing out there. Same for Whisper, vision models, whatever you want.

I feel like MK1 must be aware of TensorRT-LLM/Triton, but of course those comparison benchmarks won't help sell their startup.

[0] - https://github.com/NVIDIA/TensorRT-LLM

[1] - https://github.com/triton-inference-server/tensorrtllm_backend

[2] - https://mkone.ai/blog/mk1-flywheel-race-tuned-and-track-ready

[3] - https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/Falcon180B-H200.md#llama-70b-on-h200-up-to-67x-a100
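
Since I brought up the Triton route, here's roughly what hitting such a deployment looks like from the client side. This is a minimal sketch, not a drop-in recipe: the model name ("ensemble") and tensor names ("text_input", "max_tokens", "text_output") follow the tensorrtllm_backend examples, and your own model repository's config.pbtxt is the source of truth for all of them.

    # Minimal sketch of a Triton HTTP client request (pip install tritonclient[http]).
    # Assumes a Triton server on localhost:8000; model/tensor names are placeholders
    # taken from the tensorrtllm_backend examples -- check your config.pbtxt.
    import numpy as np
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Triton string tensors go over the wire as BYTES, built from a numpy object array.
    prompt = np.array([["Explain KV caching in one sentence."]], dtype=object)
    text_input = httpclient.InferInput("text_input", list(prompt.shape), "BYTES")
    text_input.set_data_from_numpy(prompt)

    max_tokens = np.array([[128]], dtype=np.int32)
    max_tokens_in = httpclient.InferInput("max_tokens", list(max_tokens.shape), "INT32")
    max_tokens_in.set_data_from_numpy(max_tokens)

    # Run inference against the ensemble model and pull the generated text back out.
    result = client.infer(
        model_name="ensemble",
        inputs=[text_input, max_tokens_in],
        outputs=[httpclient.InferRequestedOutput("text_output")],
    )
    print(result.as_numpy("text_output"))

Wrap that call in a thread pool firing concurrent requests and you have the crude req/s number I'm quoting above.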