科技回声

A technology news platform built with Next.js, offering global tech news and discussion.


DeepSeek-V2: A Strong, Economical, and Efficient MoE Language Model

14 points, by jasondavies, about 1 year ago

2 comments

unraveller, about 1 year ago
It's claiming to be llama3-70B tier in strength, 3x cheaper, and 3-5x faster than it, due to only having 21B out of 400B+ parameters activated at any one time. And L3-70B normally costs <$1/million tokens.
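The speed claim follows from how MoE routing works: per-token compute scales roughly with the *activated* parameter count, not the total. A back-of-envelope sketch using the comment's own figures (21B activated, compared against a dense 70B baseline; the cost model here is an illustrative simplification, not from the thread):

```python
def relative_cost(active_params_b: float, dense_params_b: float) -> float:
    """Per-token compute of a sparse (MoE) model relative to a dense baseline.

    Assumes per-token FLOPs scale linearly with the number of parameters
    actually activated for that token -- a rough but standard approximation.
    """
    return active_params_b / dense_params_b

# 21B activated (per the comment) vs. a dense llama3-70B-class model:
ratio = relative_cost(21, 70)
print(f"~{1 / ratio:.1f}x fewer FLOPs per token")  # ~3.3x
```

That ~3.3x figure sits inside the commenter's "3-5x faster" range; real-world speedups also depend on memory bandwidth and routing overhead, which this sketch ignores.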
bearjaws, about 1 year ago
Its performance at 21B parameters is very impressive.

I also like using something between 13B and 70B parameters, since it will run on a 32GB MacBook Pro easily.
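As a rough sanity check on the 32GB claim above: the RAM needed just to hold model weights is parameter count times bytes per weight, which depends on quantization. A minimal sketch (illustrative arithmetic, not from the thread; it ignores KV cache and runtime overhead):

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate GiB of RAM for model weights alone (no KV cache/overhead)."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# A 21B-parameter model at common quantization levels:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(21, bits):.1f} GB")
# 16-bit: ~39.1 GB, 8-bit: ~19.6 GB, 4-bit: ~9.8 GB
```

So the full 16-bit weights would not fit in 32GB, but common 8-bit or 4-bit quantizations fit comfortably, which is consistent with the commenter's experience.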