科技回声

11 comments

vessenes, 10 months ago

Sample solution for Balkan MO 2023 seems... questionable? The problem involves players removing stones in turn and asks which player wins with perfect play; the listed answer definitely doesn't cover all possible types of strategies. The answer it gives may be right; in fact I bet it is correct (the second player). But does the Qwen team present the solution, including its logic, as correct? And is the solution's logic actually correct?

rahimnathwani, 10 months ago

FYI the model is almost 150GB: https://huggingface.co/Qwen/Qwen2-Math-72B-Instruct/tree/main
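
As a rough sanity check on that figure, here is a back-of-the-envelope sketch. It assumes a round 72 billion parameters stored at 2 bytes each (bf16 or fp16); neither number is stated in the thread, and the helper name `checkpoint_size_gb` is made up for illustration.

```python
# Rough on-disk size of a dense model checkpoint.
# Assumptions (not from the thread): exactly 72e9 parameters,
# each stored in 2 bytes (bf16/fp16), and 1 GB = 1e9 bytes.
def checkpoint_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the approximate weight size in gigabytes."""
    return num_params * bytes_per_param / 1e9


size_gb = checkpoint_size_gb(72e9)
print(f"approx. {size_gb:.0f} GB of weights")
```

Under those assumptions the weights alone come to roughly 144 GB, which is consistent with "almost 150GB" once tokenizer and config files are included.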

tempfile, 10 months ago

First solution (IMO 2002) is completely wrong. It shows that 1, 2, or 3 cubes are not sufficient, and provides an obstruction that doesn't rule out 4 cubes, but it does not prove that there actually are 4 cubes that sum to the given number. That part is much harder (and I don't know the true answer).

ipnon, 10 months ago

These solutions aren't perfect, but imagine how many more people can become mathematicians now that the price of an elite, IMO-medal-winning tutor can be quantified as Hugging Face hosting costs!

ziofill, 10 months ago

I see that they do some decontamination of the datasets, in the hope that the models won't just recite answers from the training data. But in the recent interview with Subbarao Kambhampati on MLST (https://www.youtube.com/watch?v=y1WnHpedi2A) they explain that models fail as soon as one slightly rephrases the test problems (indicating that they are indeed mostly reciting). I expect this to be the case with this model too.

beyondCritics, 10 months ago

It is obvious that all of these problems are still way too hard, although the model sometimes has ideas. It flawlessly demonstrates how to simplify (2002^2002) mod 9. I recall that there was once a scandalous university exam for future math teachers in Germany, which asked for tasks like that, but everyone failed the test. With Qwen-2 at hand this might not have happened.
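
For readers who want to see that reduction spelled out, here is a minimal sketch of the usual by-hand approach: reduce the base, find the cycle length of its powers, then reduce the exponent. This is not the model's output; the variable names are illustrative, and the cycle search assumes the reduced base is coprime to the modulus, which holds here.

```python
base, exponent, modulus = 2002, 2002, 9

# Step 1: reduce the base. 2002 = 9 * 222 + 4, so 2002 ≡ 4 (mod 9).
residue = base % modulus

# Step 2: find the multiplicative order of the residue, i.e. the smallest
# k > 0 with residue**k ≡ 1 (mod modulus). This loop only terminates when
# gcd(residue, modulus) == 1, which is true for 4 and 9.
order = 1
power = residue
while power != 1:
    power = (power * residue) % modulus
    order += 1

# Step 3: reduce the exponent modulo the order, then finish with a small power.
reduced_exponent = exponent % order
by_hand = pow(residue, reduced_exponent, modulus)

# Cross-check against Python's built-in three-argument pow.
assert by_hand == pow(base, exponent, modulus)
print(residue, order, reduced_exponent, by_hand)
```

The powers of 4 modulo 9 cycle as 4, 7, 1, so the order is 3; since 2002 leaves remainder 1 when divided by 3, the result is 4^1 ≡ 4 (mod 9).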

next_xibalba, 10 months ago

Kind of surprised this was released in English first given it was produced by a Chinese group (Alibaba Cloud). I wonder why that is.

qrian, 10 months ago

The solution for IMO 2022 is barely a 1/7 solution. It just says ‘ might not satisfy the inequality for all y’ without a proof. That was the point of the question.

azinman2, 10 months ago

> This model mainly supports English. We will release bilingual (English and Chinese) math models soon

The irony.

allanren, 10 months ago

Qwen2 has been quite good, but still can't compare 9.9 and 9.11
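
One plausible reason this comparison is tricky (an assumption here, not something the thread establishes) is that "9.9" and "9.11" order differently depending on whether they are read as decimal numbers or as dotted version strings. A small sketch, with hypothetical helper names:

```python
from decimal import Decimal


def larger_as_decimal(a: str, b: str) -> str:
    """Return whichever string is larger when read as a decimal number."""
    return a if Decimal(a) > Decimal(b) else b


def larger_as_version(a: str, b: str) -> str:
    """Return whichever string is larger when read as a dotted version."""
    def key(s: str) -> tuple:
        # "9.11" -> (9, 11); tuples compare component by component.
        return tuple(int(part) for part in s.split("."))
    return a if key(a) > key(b) else b


print(larger_as_decimal("9.9", "9.11"))   # numeric reading
print(larger_as_version("9.9", "9.11"))   # version-number reading
```

As numbers, 9.9 is larger (9.90 versus 9.11); as versions, 9.11 comes after 9.9 because 11 is greater than 9.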

Leary, 10 months ago

Where could I try this?
