
Qwen2.5-Max: Exploring the intelligence of large-scale MoE model

118 points by rochoa 4 months ago

13 comments

Jackson__ 4 months ago
> Many critical details regarding this scaling process were only disclosed with the recent release of DeepSeek V3

And so they decide to not disclose their own training information just after they told everyone how useful it was to get DeepSeek's? Honestly can't say I care about "nearly as good as o1" when it's a closed API with no additional info.
Comment #42858125 not loaded
kragen 4 months ago
I thought there were three DeepSeek items on the HN front page, but this turned out to be a fourth one, because it's the Qwen team saying they have a secret version of Qwen that's actually better than DeepSeek-V3.

I don't remember the last time 20% of the HN front page was about the same thing. Then again, *nobody* remembers the last time a company's market cap fell by 569 billion dollars like NVIDIA did yesterday.
Comment #42865610 not loaded
Comment #42857483 not loaded
BhavdeepSethi 4 months ago
HuggingFace demo: https://huggingface.co/spaces/Qwen/Qwen2.5-Max-Demo

Source: https://x.com/Alibaba_Qwen/status/1884263157574820053
ecshafer 4 months ago
A Chinese company announcing this on Spring Festival eve is very surprising. The DeepSeek announcement must have lit a fire under them. I am surprised anything is being done right now in these Chinese tech companies.
Comment #42855334 not loaded
Comment #42855223 not loaded
simonw 4 months ago
This appears to be Qwen's new best model, API-only for the moment, which they say is better than DeepSeek V3.
Comment #42855252 not loaded
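Since the model is exposed only through an API, the quickest way to try it is via an OpenAI-compatible client. A minimal sketch, assuming an OpenAI-compatible endpoint on Alibaba Cloud Model Studio; the base URL and model name below are illustrative placeholders and should be checked against the official docs:

    import os
    from openai import OpenAI  # pip install openai

    # Assumed endpoint and model id -- verify against the current Model Studio docs.
    client = OpenAI(
        api_key=os.environ["DASHSCOPE_API_KEY"],
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )

    resp = client.chat.completions.create(
        model="qwen-max-2025-01-25",  # placeholder snapshot name for Qwen2.5-Max
        messages=[{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
    )
    print(resp.choices[0].message.content)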
zone411 4 months ago
I just ran my NYT Connections benchmark on it: 18.6, up from 14.8 for Qwen 2.5 72B. I'll run my other benchmarks later.

https://github.com/lechmazur/nyt-connections/
Havoc 4 months ago
Kinda ambivalent about MoE in the cloud. Where it could really shine, though, is in desktop-class gear. Memory is getting fast enough that we may soon see MoEs that aren't painfully slow for large-ish models.
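The intuition: token generation is usually memory-bandwidth bound, so the ceiling on decode speed scales with the bytes of weights streamed per token, and an MoE only reads its active experts. A rough back-of-the-envelope sketch with illustrative numbers (the bandwidth and active-parameter figures are assumptions, not specs from this announcement):

    # Crude ceiling on decode speed for a memory-bandwidth-bound model:
    #   tokens/sec <= memory_bandwidth / bytes_of_weights_read_per_token
    # An MoE only reads its *active* parameters per token, hence the advantage.

    def max_tokens_per_sec(active_params_billions, bytes_per_param, bandwidth_gb_per_s):
        """Upper bound only -- ignores KV-cache traffic, overlap, and compute limits."""
        bytes_per_token = active_params_billions * 1e9 * bytes_per_param
        return bandwidth_gb_per_s * 1e9 / bytes_per_token

    bw = 273.0  # GB/s -- assumed unified-memory desktop machine, purely illustrative
    print(max_tokens_per_sec(70, 1.0, bw))  # dense 70B at 8-bit: ~3.9 tok/s ceiling
    print(max_tokens_per_sec(37, 1.0, bw))  # ~37B active (DeepSeek-V3-like MoE): ~7.4 tok/s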
alecco 4 months ago
No weights, no proof.
Comment #42855943 not loaded
mohsen1 4 months ago
This is not the reasoning model. If they beat DeepSeek V3 in benchmarks, I think a "reasoning" model would beat o1 Pro.
GaggiX 4 months ago
Now they need to finetune it like R1 and o1 and it will be competitive with SOTA models.
jondwillis 4 months ago
The significance of _all_ of these releases at once is not lost on me. But the reason for it is lost on me. Is there some convention? Is this political? Business strategy?
Comment #42856175 not loaded
Comment #42866534 not loaded
Comment #42860268 not loaded
bigcat12345678 4 months ago
Party goes on
a_wild_dandan 4 months ago
> We evaluate Qwen2.5-Max alongside leading models

> [...] we are unable to access the proprietary models such as GPT-4o and Claude-3.5-Sonnet. Therefore, we evaluate Qwen2.5-Max against DeepSeek V3

"We'll compare our proprietary model to other proprietary models. Except when we don't. Then we'll compare to non-proprietary models."