Qwen2-Math

128 points by limoce, 10 months ago

11 comments

vessenes, 10 months ago
The sample solution for Balkan MO 2023 seems... questionable?

The problem involves players removing stones in turn and asks which of them wins with perfect play. The listed answer definitely doesn't cover all possible types of strategies.

The answer it gives may be right; in fact I bet it is correct (the second player). But does the Qwen team offer the solution as correct, including the logic? And is the solution logic correct?
rahimnathwani, 10 months ago
FYI the model is almost 150GB: https://huggingface.co/Qwen/Qwen2-Math-72B-Instruct/tree/main
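A rough sanity check on that figure, as a sketch only. It assumes the nominal 72 billion parameters implied by the "72B" name and 16-bit weights (2 bytes per parameter); neither number is stated in the comment, and the real checkpoint may differ slightly.

```python
# Back-of-the-envelope checkpoint size. Both inputs are assumptions,
# not figures taken from the model repository.
PARAMS = 72e9          # nominal parameter count implied by the "72B" name
BYTES_PER_PARAM = 2    # assumes 16-bit weights (for example bfloat16)


def checkpoint_size_gb(params: float, bytes_per_param: int) -> float:
    """Return the raw weight size in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


size_gb = checkpoint_size_gb(PARAMS, BYTES_PER_PARAM)
print(f"{size_gb:.0f} GB")
```

Under those assumptions the estimate lands in the same range as the "almost 150GB" quoted above.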
tempfile, 10 months ago
The first solution (IMO 2002) is completely wrong. It shows that 1, 2, or 3 cubes are not sufficient, and provides an obstacle that *doesn't rule out* 4 cubes, but it does not prove that there actually are 4 cubes that sum to the given number. This is much harder (and I don't know the true answer).
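The two halves of that argument can be checked numerically. This is a sketch, not the model's solution or an official one: it assumes the "given number" is 2002^2002 (the number mentioned elsewhere in this thread), and the decomposition 2002 = 10^3 + 10^3 + 1^3 + 1^3 is supplied here for illustration. All variable names are hypothetical.

```python
from itertools import combinations_with_replacement

# Residues of cubes modulo 9: only 0, 1 and 8 occur.
cube_residues = sorted({(n ** 3) % 9 for n in range(9)})


def reachable(k: int) -> set[int]:
    """Residues mod 9 that a sum of exactly k cubes can take."""
    return {sum(c) % 9 for c in combinations_with_replacement(cube_residues, k)}


target = pow(2002, 2002, 9)

# The obstacle: the target residue is unreachable with 1, 2 or 3 cubes.
blocked = [k for k in (1, 2, 3) if target not in reachable(k)]

# The missing half: an explicit sum of four positive cubes.
# 2002^2002 = 2002 * (2002^667)^3 and 2002 = 10^3 + 10^3 + 1^3 + 1^3.
t = 2002 ** 667
cubes = [10 * t, 10 * t, t, t]
is_sum_of_four = sum(c ** 3 for c in cubes) == 2002 ** 2002

print(cube_residues, blocked, is_sum_of_four)
```

The modular step only gives the lower bound; the explicit four cubes are what a complete proof would also need to exhibit.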
ipnon, 10 months ago
These solutions aren't perfect, but imagine how many more people can become mathematicians now that the price of an elite, IMO-medal-winning tutor can be quantified as Hugging Face hosting costs!
ziofill, 10 months ago
I see that they do some decontamination of the datasets, in the hope that the models won't just recite answers from the training data. But in the recent interview with Subbarao Kambhampati on MLST (https://www.youtube.com/watch?v=y1WnHpedi2A) they explain that models fail as soon as one slightly rephrases the test problems (indicating that they are indeed mostly reciting). I expect this to be the case with this model too.
beyondCritics, 10 months ago
It is obvious that all of these problems are still way too hard, although sometimes it has ideas. It flawlessly demonstrates how to simplify (2002^2002) mod 9. I recall that there was once a scandalous university exam for future math teachers in Germany, which asked for tasks like that, and all the candidates failed. With Qwen-2 at hand this might not have happened.
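For reference, the simplification mentioned above can be sketched in a few lines. This follows the usual textbook route (reduce the base, find the cycle length, reduce the exponent), which is not necessarily the route the model took; the variable names are illustrative only.

```python
base, exponent, modulus = 2002, 2002, 9

# Step 1: reduce the base. 2002 has digit sum 2+0+0+2 = 4, so 2002 = 4 (mod 9).
reduced_base = base % modulus

# Step 2: find the multiplicative order of the reduced base modulo 9,
# i.e. the smallest k >= 1 with reduced_base^k = 1 (mod 9).
order = next(k for k in range(1, modulus) if pow(reduced_base, k, modulus) == 1)

# Step 3: reduce the exponent modulo that order and evaluate the small power.
reduced_exponent = exponent % order
by_hand = pow(reduced_base, reduced_exponent, modulus)

# Cross-check with Python's built-in three-argument modular exponentiation.
direct = pow(base, exponent, modulus)

print(reduced_base, order, reduced_exponent, by_hand, direct)
```

Both routes agree, which is the kind of check an exam candidate could do by hand in a couple of lines.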
next_xibalba, 10 months ago
Kind of surprised this was released in English first given it was produced by a Chinese group (Alibaba Cloud). I wonder why that is.
qrian, 10 months ago
The solution for IMO 2022 is barely a 1/7 solution. It just says 'might not satisfy the inequality for all y' without a proof. Proving that was the point of the question.
azinman2, 10 months ago
> This model mainly supports English. We will release bilingual (English and Chinese) math models soon

The irony.
allanren, 10 months ago
Qwen2 has been quite good, but still can't compare 9.9 and 9.11
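To make the 9.9 versus 9.11 comparison concrete, here is a small sketch showing that the two strings order differently under two readings. The comment does not say which reading, if any, the model applies; the version-style reading and the helper `version_key` are assumptions added here for illustration.

```python
from decimal import Decimal

a, b = "9.9", "9.11"

# Reading 1: ordinary decimal numbers, where 9.9 (= 9.90) exceeds 9.11.
numeric_larger = max(a, b, key=Decimal)


def version_key(s: str) -> tuple[int, ...]:
    """Split a dotted string into integer parts, as version numbers are compared."""
    return tuple(int(part) for part in s.split("."))


# Reading 2: version-style dotted integers, where (9, 11) exceeds (9, 9).
version_larger = max(a, b, key=version_key)

print(numeric_larger, version_larger)
```

As a math question only the first reading is correct.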
Leary, 10 months ago
Where could I try this?