科技回声 (Tech Echo)


A tech news platform built with Next.js, offering technology news and discussion from around the world.


© 2025 科技回声 (Tech Echo). All rights reserved.

Qwen2-Math

128 points by limoce, 10 months ago

11 comments

vessenes, 10 months ago
Sample solution for Balkan MO 2023 seems... questionable?
The problem involves players removing stones in turn and asks which player wins with perfect play: the listed answer definitely doesn't cover all possible types of strategies.
The answer it gives may be right; in fact I bet it is correct (the second player), but does the Qwen team present the solution as correct, including the logic? And is the solution's logic correct?
rahimnathwani, 10 months ago
FYI, the model is almost 150GB: https://huggingface.co/Qwen/Qwen2-Math-72B-Instruct/tree/main
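The "almost 150GB" figure is roughly what a back-of-envelope estimate gives. The sketch below assumes about 72 billion parameters (as the "72B" in the model name suggests) stored as 16-bit values, 2 bytes each; neither assumption is stated in the thread, and `checkpoint_size_bytes` is a hypothetical helper.

```python
# Back-of-envelope estimate of a dense checkpoint's size on disk.
# Assumptions (not from the thread): ~72e9 parameters, 2 bytes per parameter (bf16/fp16).

def checkpoint_size_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate checkpoint size, ignoring file metadata and sharding overhead."""
    return num_params * bytes_per_param

size = checkpoint_size_bytes(72e9)
# Report both decimal gigabytes and binary gibibytes.
print(f"{size / 1e9:.0f} GB (decimal), {size / 2**30:.0f} GiB")
```

The exact parameter count and per-file overhead shift the result slightly, so treat it only as a sanity check on the order of magnitude.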
tempfile, 10 months ago
The first solution (IMO 2002) is completely wrong. It shows that 1, 2, or 3 cubes are not sufficient and provides an obstacle that *doesn't rule out* 4 cubes, but it does not prove that there actually are 4 cubes that sum to the given number. That part is much harder (and I don't know the true answer).
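Both halves of this argument can be checked mechanically. The sketch assumes the "given number" is 2002^2002 (the number another comment in this thread reduces mod 9). The four-cube identity it uses, 2002 = 10^3 + 10^3 + 1^3 + 1^3 combined with 2002 = 3·667 + 1, is an addition for illustration and comes from neither the thread nor the model's output.

```python
# Check the sum-of-cubes argument for N = 2002^2002 (assumed to be the target number).
from itertools import combinations_with_replacement

N = 2002 ** 2002

# Every integer cube is congruent to 0, 1 or 8 modulo 9.
cube_residues = {pow(r, 3, 9) for r in range(9)}

# Lower bound: N ≡ 4 (mod 9), but no sum of three cube residues is 4 mod 9.
# Because 0 is a cube residue, sums of three also cover sums of one or two cubes.
target = pow(2002, 2002, 9)
three_cube_sums = {sum(c) % 9 for c in combinations_with_replacement(cube_residues, 3)}
print(target, sorted(three_cube_sums))

# Upper bound: with k = 2002^667, N = 2002 * k^3 = (10k)^3 + (10k)^3 + k^3 + k^3.
k = 2002 ** 667
print(N == (10 * k) ** 3 + (10 * k) ** 3 + k ** 3 + k ** 3)
```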
ipnon, 10 months ago
These solutions aren't perfect, but imagine how many more people can become mathematicians now that the price of an elite, IMO-medal-winning tutor can be quantified as Hugging Face hosting costs!
ziofill, 10 months ago
I see that they do some decontamination of the datasets, in the hope that the models won't just recite answers from the training data. But in the recent interview with Subbarao Kambhampati on MLST (https://www.youtube.com/watch?v=y1WnHpedi2A), they explain that models fail as soon as one slightly rephrases the test problems (indicating that they are indeed mostly reciting). I expect this to be the case with this model too.
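The thread does not say how the decontamination is done. One common generic technique is n-gram overlap filtering: drop any training document that shares a long run of words with a test problem. The sketch below shows only that generic idea. The function names, whitespace tokenization, and default n = 13 are assumptions, and it should not be read as the Qwen team's actual pipeline.

```python
# Generic n-gram overlap decontamination (hypothetical helpers, not Qwen's pipeline).

def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of word n-grams of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def decontaminate(train_docs: list[str], test_docs: list[str], n: int = 13) -> list[str]:
    """Keep only training documents that share no n-gram with any test document."""
    test_grams: set[tuple[str, ...]] = set()
    for doc in test_docs:
        test_grams |= ngrams(doc, n)
    return [doc for doc in train_docs if not (ngrams(doc, n) & test_grams)]

# Tiny example with n = 3 so that the overlap is easy to see.
train = ["what is two plus two", "the sky is blue today"]
test = ["tell me what is two plus three"]
print(decontaminate(train, test, n=3))  # ['the sky is blue today']
```

A filter like this only catches verbatim word runs, so a lightly rephrased copy of a test problem that shares no n-word sequence passes straight through. That connects to the comment's point about rephrased problems.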
beyondCritics, 10 months ago
It is obvious that all of these problems are still way too hard for the model, although sometimes it has ideas. It flawlessly demonstrates how to simplify (2002^2002) mod 9. I recall that there was once a scandalous university exam for future math teachers in Germany that asked for tasks like that, and all of them failed it. With Qwen-2 at hand, this might not have happened.
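The simplification can be done by hand in two steps. The following is a standard derivation, not a transcript of the model's output: reduce the base with the digit-sum rule, then use 4^3 = 64 ≡ 1 (mod 9) together with 2002 = 3·667 + 1.

```latex
\begin{aligned}
2002 &\equiv 2 + 0 + 0 + 2 = 4 \pmod{9}, \qquad 4^3 = 64 \equiv 1 \pmod{9},\\
2002^{2002} &\equiv 4^{2002} = 4^{3 \cdot 667 + 1} = \left(4^3\right)^{667} \cdot 4 \equiv 1^{667} \cdot 4 = 4 \pmod{9}.
\end{aligned}
```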
next_xibalba, 10 months ago
Kind of surprised this was released in English first given it was produced by a Chinese group (Alibaba Cloud). I wonder why that is.
qrian, 10 months ago
The solution for IMO 2022 is barely a 1/7 solution (worth about 1 point out of 7). It just says ‘might not satisfy the inequality for all y’ without a proof. That was the point of the question.
azinman2, 10 months ago
> This model mainly supports English. We will release bilingual (English and Chinese) math models soon
The irony.
allanren, 10 months ago
Qwen2 has been quite good, but it still can't correctly tell which of 9.9 and 9.11 is larger.
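As decimal numbers, 9.9 is larger than 9.11. One plausible source of the confusion is that strings like these also appear as version or section numbers, where 9.11 comes after 9.9. This is an assumption; the thread does not diagnose the failure. The sketch contrasts the two orderings, and the helper names are hypothetical.

```python
# Contrast decimal ordering with dotted-version ordering for "9.9" vs "9.11".
from decimal import Decimal

def as_decimal(s: str) -> Decimal:
    """Interpret the string as a decimal number."""
    return Decimal(s)

def as_version(s: str) -> tuple[int, ...]:
    """Interpret the string as a dotted version number, e.g. (9, 11)."""
    return tuple(int(part) for part in s.split("."))

print(max("9.9", "9.11", key=as_decimal))  # 9.9  (9.90 > 9.11)
print(max("9.9", "9.11", key=as_version))  # 9.11 (minor part 11 > 9)
```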
Leary, 10 months ago
Where could I try this?