科技回声 (Tech Echo)

Microsoft and Nvidia have created a 530B parameter language model

7 points by axiom92, over 3 years ago

1 comment

axiom92, over 3 years ago
The model, named Megatron-Turing NLG 530B, is about 3x bigger than GPT-3.

The blog post doesn't provide a lot of numbers, but it looks like it beats the state of the art in a couple of commonsense reasoning benchmarks.

Still, it shows that you can't just keep scaling the models and expect magic.
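The "about 3x bigger" figure in the comment can be sanity-checked against GPT-3's published parameter count of 175B (a number taken from the GPT-3 paper, not stated in this thread):

```python
# Rough size comparison: Megatron-Turing NLG 530B vs. GPT-3.
# 175e9 is GPT-3's published parameter count (an outside assumption,
# not mentioned in the comment itself).
mt_nlg_params = 530e9
gpt3_params = 175e9

ratio = mt_nlg_params / gpt3_params
print(f"MT-NLG is about {ratio:.2f}x the size of GPT-3")  # about 3.03x
```

530 / 175 ≈ 3.03, which matches the commenter's "about 3x" characterization.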