
Qwen2.5-Max: Exploring the intelligence of large-scale MoE model

118 points by rochoa 4 months ago

13 comments

Jackson__ 4 months ago
> Many critical details regarding this scaling process were only disclosed with the recent release of DeepSeek V3

And so they decide not to disclose their own training information just after they told everyone how useful it was to get DeepSeek's? Honestly can't say I care about "nearly as good as o1" when it's a closed API with no additional info.
kragen 4 months ago
I thought there were three DeepSeek items on the HN front page, but this turned out to be a fourth one, because it's the Qwen team saying they have a secret version of Qwen that's actually better than DeepSeek-V3.

I don't remember the last time 20% of the HN front page was about the same thing. Then again, _nobody_ remembers the last time a company's market cap fell by 569 billion dollars like NVIDIA did yesterday.
BhavdeepSethi 4 months ago
HuggingFace demo: https://huggingface.co/spaces/Qwen/Qwen2.5-Max-Demo

Source: https://x.com/Alibaba_Qwen/status/1884263157574820053
ecshafer 4 months ago
A Chinese company announcing this on Spring Festival eve is very surprising. The DeepSeek announcement must have lit a fire under them. I am surprised anything is getting done right now in these Chinese tech companies.
simonw 4 months ago
This appears to be Qwen's new best model, API-only for the moment, which they say is better than DeepSeek V3.
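For anyone who wants to try the API-only release, here is a minimal sketch of what a call might look like, assuming Alibaba Cloud's OpenAI-compatible mode; the base URL, environment variable, and model name below are assumptions and should be checked against the current Model Studio docs:

```python
# Hypothetical sketch: querying Qwen2.5-Max through an OpenAI-compatible endpoint.
# The endpoint URL, env var, and model identifier are assumptions, not confirmed values.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],  # assumed environment variable name
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # assumed endpoint
)

resp = client.chat.completions.create(
    model="qwen-max-2025-01-25",  # assumed model identifier for Qwen2.5-Max
    messages=[{"role": "user", "content": "Which number is larger, 9.11 or 9.8?"}],
)
print(resp.choices[0].message.content)
```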
zone411 4 months ago
I just ran my NYT Connections benchmark on it: 18.6, up from 14.8 for Qwen 2.5 72B. I'll run my other benchmarks later.

https://github.com/lechmazur/nyt-connections/
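For context on what a Connections-style benchmark checks, here is a minimal, illustrative scoring sketch; it is not the linked repo's actual method, and the example groups are made up:

```python
# Illustrative only: score a Connections-style answer as the fraction of gold
# groups reproduced exactly (order-insensitive). The real benchmark may differ.
def score_puzzle(predicted_groups, gold_groups):
    gold = {frozenset(g) for g in gold_groups}
    pred = {frozenset(g) for g in predicted_groups}
    return len(gold & pred) / len(gold)

gold = [["BASS", "FLOUNDER", "SALMON", "TROUT"],
        ["ANT", "DRILL", "ISLAND", "OPAL"]]
pred = [["BASS", "FLOUNDER", "SALMON", "TROUT"],
        ["ANT", "DRILL", "OPAL", "PEARL"]]
print(score_puzzle(pred, gold))  # 0.5: one of two groups matched
```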
Havoc 4 months ago
Kinda ambivalent about MoE in the cloud. Where it could really shine, though, is in desktop-class gear. Memory is getting fast enough that we might soon see MoEs that aren't painfully slow for large-ish models.
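A rough back-of-the-envelope sketch of why this works: single-user decoding is largely memory-bandwidth bound, and an MoE only streams its active parameters per token rather than the full model. All numbers below are assumptions, not measurements:

```python
# Back-of-the-envelope estimate of decode speed on bandwidth-limited hardware.
# Assumes one full read of the active parameters per generated token.
def decode_tokens_per_sec(active_params_b: float, bytes_per_param: float, bandwidth_gbs: float) -> float:
    """Rough upper bound on tokens/sec for memory-bandwidth-bound decoding."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token

# Hypothetical desktop with ~200 GB/s memory bandwidth, 4-bit weights:
print(decode_tokens_per_sec(37, 0.5, 200))   # ~10.8 tok/s with ~37B active params (MoE)
print(decode_tokens_per_sec(671, 0.5, 200))  # ~0.6 tok/s if all ~671B params were dense
```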
alecco 4 months ago
No weights, no proof.
mohsen1 4 months ago
This is not the reasoning model. If they beat DeepSeek V3 in benchmarks, I think a 'reasoning' model would beat o1 Pro.
GaggiX 4 months ago
Now they need to finetune it like R1 and o1 and it will be competitive with SOTA models.
jondwillis 4 months ago
The significance of _all_ of these releases at once is not lost on me. But the reason for it is lost on me. Is there some convention? Is this political? Business strategy?
bigcat12345678 4 months ago
Party goes on
a_wild_dandan 4 months ago
> We evaluate Qwen2.5-Max alongside leading models

> [...] we are unable to access the proprietary models such as GPT-4o and Claude-3.5-Sonnet. Therefore, we evaluate Qwen2.5-Max against DeepSeek V3

"We'll compare our proprietary model to other proprietary models. Except when we don't. Then we'll compare to non-proprietary models."