
Qwen2.5-Max: Exploring the intelligence of large-scale MoE model

118 points by rochoa 4 months ago

13 comments

Jackson__ 4 months ago
> Many critical details regarding this scaling process were only disclosed with the recent release of DeepSeek V3

And so they decide to not disclose their own training information just after they told everyone how useful it was to get DeepSeek's? Honestly can't say I care about "nearly as good as o1" when it's a closed API with no additional info.
Comment #42858125 not loaded
kragen 4 months ago
I thought there were three DeepSeek items on the HN front page, but this turned out to be a fourth one, because it's the Qwen team saying they have a secret version of Qwen that's actually better than DeepSeek-V3.

I don't remember the last time 20% of the HN front page was about the same thing. Then again, *nobody* remembers the last time a company's market cap fell by 569 billion dollars like NVIDIA did yesterday.
Comment #42865610 not loaded
Comment #42857483 not loaded
BhavdeepSethi 4 months ago
HuggingFace demo: https://huggingface.co/spaces/Qwen/Qwen2.5-Max-Demo

Source: https://x.com/Alibaba_Qwen/status/1884263157574820053
ecshafer 4 months ago
A Chinese company announcing this on Spring Festival eve is very surprising. The DeepSeek announcement must have lit a fire under them. I am surprised anything is being done right now in these Chinese tech companies.
Comment #42855334 not loaded
Comment #42855223 not loaded
simonw 4 months ago
This appears to be Qwen's new best model, API-only for the moment, which they say is better than DeepSeek V3.
Comment #42855252 not loaded
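Since the model is exposed only through an API, the quickest way to try it is via an OpenAI-compatible client. A minimal sketch, assuming an OpenAI-compatible endpoint on Alibaba Cloud Model Studio; the base URL and model name below are illustrative placeholders and should be checked against the official docs:

    import os
    from openai import OpenAI  # pip install openai

    # Assumed endpoint and model id -- verify against the current Model Studio docs.
    client = OpenAI(
        api_key=os.environ["DASHSCOPE_API_KEY"],
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )

    resp = client.chat.completions.create(
        model="qwen-max-2025-01-25",  # placeholder snapshot name for Qwen2.5-Max
        messages=[{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
    )
    print(resp.choices[0].message.content)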
zone411 4 months ago
I just ran my NYT Connections benchmark on it: 18.6, up from 14.8 for Qwen 2.5 72B. I'll run my other benchmarks later.

https://github.com/lechmazur/nyt-connections/
Havoc 4 months ago
Kinda ambivalent about MoE in the cloud. Where it could really shine, though, is in desktop-class gear. Memory is getting fast enough that we may soon see MoEs that aren't painfully slow for large-ish models.
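The intuition: token generation is usually memory-bandwidth bound, so the ceiling on decode speed scales with the bytes of weights streamed per token, and an MoE only reads its active experts. A rough back-of-the-envelope sketch with illustrative numbers (the bandwidth and active-parameter figures are assumptions, not specs from this announcement):

    # Crude ceiling on decode speed for a memory-bandwidth-bound model:
    #   tokens/sec <= memory_bandwidth / bytes_of_weights_read_per_token
    # An MoE only reads its *active* parameters per token, hence the advantage.

    def max_tokens_per_sec(active_params_billions, bytes_per_param, bandwidth_gb_per_s):
        """Upper bound only -- ignores KV-cache traffic, overlap, and compute limits."""
        bytes_per_token = active_params_billions * 1e9 * bytes_per_param
        return bandwidth_gb_per_s * 1e9 / bytes_per_token

    bw = 273.0  # GB/s -- assumed unified-memory desktop machine, purely illustrative
    print(max_tokens_per_sec(70, 1.0, bw))  # dense 70B at 8-bit: ~3.9 tok/s ceiling
    print(max_tokens_per_sec(37, 1.0, bw))  # ~37B active (DeepSeek-V3-like MoE): ~7.4 tok/s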
alecco 4 months ago
No weights, no proof.
Comment #42855943 not loaded
mohsen1 4 months ago
This is not the reasoning model. If they beat DeepSeek V3 in benchmarks, I think a "reasoning" model would beat o1 Pro.
GaggiX 4 months ago
Now they need to finetune it like R1 and o1 and it will be competitive with SOTA models.
jondwillis 4 months ago
The significance of _all_ of these releases at once is not lost on me. But the reason for it is lost on me. Is there some convention? Is this political? Business strategy?
Comment #42856175 not loaded
Comment #42866534 not loaded
Comment #42860268 not loaded
bigcat12345678 4 months ago
Party goes on
a_wild_dandan 4 months ago
> We evaluate Qwen2.5-Max alongside leading models

> [...] we are unable to access the proprietary models such as GPT-4o and Claude-3.5-Sonnet. Therefore, we evaluate Qwen2.5-Max against DeepSeek V3

"We'll compare our proprietary model to other proprietary models. Except when we don't. Then we'll compare to non-proprietary models."