Llama 405B 506 tokens/second on an H200
21 points
by moondistance
7 months ago
3 comments
EgoIncarnate
7 months ago
Not "an H200" — the article itself says: "In the table above, tensor parallelism is compared to pipeline parallelism with each across eight GPUs".
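A quick arithmetic sketch of the point being made (assuming the headline 506 tokens/second is an aggregate figure for the eight-GPU setup the quoted table describes, which the article's wording suggests but the headline obscures):

```python
# Hypothetical back-of-envelope check: the headline rate divided by
# the eight H200s the article's table actually benchmarks.
aggregate_tps = 506           # tokens/second from the post title
num_gpus = 8                  # "each across eight GPUs", per the quote
per_gpu_tps = aggregate_tps / num_gpus
print(per_gpu_tps)            # → 63.25 tokens/second per GPU
```

So the per-GPU figure would be roughly 63 tokens/second, not 506 — which is the commenter's objection to the title.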
7e
7 months ago
And this is why nobody submits MLPerf against NVIDIA.
moondistance
7 months ago
Significant further optimizations. FP8!
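A rough sketch of why FP8 is a significant lever for a 405B-parameter model: weight storage alone roughly halves versus FP16 (1 byte per parameter instead of 2), which directly reduces the memory bandwidth each token's forward pass must consume. The figures below are simple arithmetic, not measurements from the article:

```python
# Weight-memory footprint of a 405B-parameter model at two precisions.
params = 405e9                    # Llama 405B parameter count

fp16_gb = params * 2 / 1e9        # 2 bytes/param at FP16
fp8_gb = params * 1 / 1e9         # 1 byte/param at FP8

print(fp16_gb)  # → 810.0 GB of weights at FP16
print(fp8_gb)   # → 405.0 GB of weights at FP8
```

Since decode throughput on large models is typically bound by how fast weights stream from HBM, halving bytes per weight is a straightforward path to higher tokens/second.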