
AI solves International Math Olympiad problems at silver medal level

1370 points by ocfnash 10 months ago

84 comments

Smaug123 10 months ago
So I am extremely hyped about this, but it's not clear to me how much heavy lifting this sentence is doing:

> First, the problems were manually translated into formal mathematical language for our systems to understand.

The non-geometry problems which were solved were all of the form "Determine all X such that…", and the resulting theorem statements are all of the form "We show that the set of all X is {foo}". The downloadable solutions from https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/imo-2024-solutions/index.html don't make it clear whether the set {foo} was decided by a human during this translation step, or whether the computer found it. I *want* to believe that the computer found it, but I can't find anything to confirm. Anyone know?

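To make the worry concrete, here is a toy Lean sketch (my own illustration, not DeepMind's actual formalization) of the "Determine all X" shape. Note that the answer set is baked into the statement itself, so whoever writes this line has already committed to the answer:

```lean
import Mathlib

-- Toy "Determine all X such that …" statement (illustrative only):
-- the answer set {-2, 2} appears inside the theorem, so the prover is
-- only asked to verify an answer someone already wrote down.
theorem determine_all_solutions : {x : ℤ | x ^ 2 = 4} = ({-2, 2} : Set ℤ) := by
  sorry -- an automated prover would be asked to fill this in
```
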
necovek 10 months ago
This is certainly impressive, but whenever the IMO is brought up, a caveat should be noted: medals are awarded to 50% of the participants (high school students), with a 1:2:3 ratio between gold, silver, and bronze. That puts all gold and silver medalists among the top 25% of the participants.

That means that "AI solves IMO problems better than 75% of the students", which is probably even more impressive.

But "minutes for one problem and up to 3 days for each remaining problem" means that this is unfortunately not a true representation either. If these students were given up to 15 days (5 problems at "up to 3 days each") instead of 9h, there would probably be more of them who match or beat this score too.

It really sounds like the AI solved only a single problem in the 9h students get, so it certainly would not be even close to the medals. What's the need to taint the impressive result with an apples-to-oranges comparison?

Why not be more objective and report that it took longer but was able to solve X% of problems (or scored X out of N points)?

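For concreteness, the medal arithmetic above checks out (a quick sketch using only the ratios quoted in the comment):

```python
# Medal shares implied by "50% get medals, split 1:2:3 gold:silver:bronze".
medal_share = 0.5
gold, silver, bronze = (r / 6 * medal_share for r in (1, 2, 3))
print(f"gold: {gold:.1%}, silver: {silver:.1%}, bronze: {bronze:.1%}")
print(f"gold + silver: {gold + silver:.1%} of all participants")
# gold: 8.3%, silver: 16.7% -> gold + silver: 25.0%, i.e. the top quarter
```
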
golol 10 months ago
This is the real deal. AlphaGeometry solved a very limited set of problems with a lot of brute-force search. This is a much broader method that I believe will have a great impact on the way we do mathematics. They are really implementing a self-feeding pipeline from natural-language mathematics to formalized mathematics, where they can train both formalization and proving. In principle this pipeline can also learn basic theory building, like creating auxiliary definitions and lemmas. I really think this is the holy grail of proof assistance and will allow us to formalize most mathematics that we create very naturally. Humans will work post-rigorously and let the machine assist with filling in the details.

Ericson2314 10 months ago
The lede is a bit buried: they're using Lean!

This is important for more than math problems. Making ML models wrestle with proof systems is a good way to avoid bullshit in general.

Hopefully more humans will write types in Lean and similar systems as a much better way of writing prompts.

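A minimal sketch of what "types as prompts" could look like (my example, not from the article): the human states only the specification and leaves a hole for the machine to fill.

```lean
-- The human writes the statement; a prover (machine or human) is asked
-- to replace `sorry` with a proof the Lean kernel accepts.
theorem add_comm_spec (a b : Nat) : a + b = b + a := by
  sorry
```
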
michael_nielsen 10 months ago
A good brief overview here from Tim Gowers (a Fields Medallist, who participated in the effort), explaining and contextualizing some of the main caveats: https://x.com/wtgowers/status/1816509803407040909

signa11 10 months ago
> ... but whenever IMO is brought up, a caveat should be put out: medals are awarded to 50% of the participants (high school students), with 1:2:3 ratio between gold, silver and bronze. That puts all gold and silver medalists among the top 25% of the participants.

Yes, that is true, but getting onto the country-specific team is itself an arduous journey, involving brutal winnowing every step of the way, e.g. the regional math olympiad, then the national math olympiad, etc.

This is then followed by further training specifically meant for this elite bunch, and maybe further eliminations, etc.

Suffice it to say that qualifying for a country's team is, IMHO, a *big* deal. Getting a gold/silver from amongst them is just plain *awesome*!

fancyfredbot 10 months ago
I'm seriously jealous of the people getting paid to work on this. It sounds like great fun, and it must be incredibly satisfying to move the state of the art forward like that.

cynicalpeace 10 months ago
Machines have been better than humans at chess for decades.

Yet no one cares. Everyone's busy watching Magnus Carlsen.

We are human. This means we care about what other humans do. We only care about machines insofar as they serve us.

This principle is broadly extensible to work and art. Humans will always have a place in these realms as long as humans are around.

thrance 10 months ago
Theorem proving is a single-player game with an insanely big search space; I always thought it would be solved long before AGI.

IMHO, the largest contributors to AlphaProof were the people behind Lean and Mathlib, who took the daunting task of formalizing the *entirety* of mathematics upon themselves.

This lack of formalization in math papers was what killed earlier attempts at automation, because AI researchers had to wrestle with the human element of figuring out each author's own notations, implicit knowledge, skipped proof steps...

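To put a rough number on "insanely big" (illustrative figures, not from the article): with on the order of a hundred applicable tactics per proof state and proofs a few dozen steps deep, naive enumeration is hopeless.

```latex
% Branching factor b, proof depth d: naive search visits ~ b^d states.
% With b = 100 and d = 20 (plausible for an IMO-length Lean proof):
\[
  b^{d} = 100^{20} = 10^{40} \text{ proof states,}
\]
% which is why learned pruning matters far more than raw compute.
```
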
zone411 10 months ago
The best discussion is here: https://leanprover.zulipchat.com/#narrow/stream/219941-Machine-Learning-for-Theorem-Proving

Jun8 10 months ago
Tangentially: I found it fascinating to follow along with the solution to Problem 6: https://youtu.be/7h3gJfWnDoc ("aquaesulian" is a nod to the ancient name of Bath). There's no advanced math, and each step is quite simple; I'd guess it's at a middling 8th-grade level.

Note that the 6th question is generally the hardest ("final boss"), and many top performers couldn't solve it.

I don't know what Lean is or how to see the AI's proofs, but an AI system that could explain such a question on par with the YouTuber above would be fantastic!

nopinsight 10 months ago
Once Gemini, the LLM, integrates with AlphaProof and AlphaGeometry 2, it might be able to *reliably* perform logical reasoning. If that's the case, software development might be revolutionized.

"... We'll be bringing all the goodness of AlphaProof and AlphaGeometry 2 to our mainstream #Gemini models very soon. Watch this space!" -- Demis Hassabis, CEO of Google DeepMind. https://x.com/demishassabis/status/1816499055880437909

adverbly 10 months ago
> First, the problems were manually translated into formal mathematical language for our systems to understand. In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others.

Three days is interesting... Not technically silver-medal performance, I guess, but let's be real: I'd be okay waiting a month for the cure to cancer.

riku_iki 10 months ago
Example of a proof from the AlphaProof system: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/imo-2024-solutions/P6/index.html

SJC_Hacker 10 months ago
The kicker with some of those math competition problems is that they reduce to finding all natural numbers for which some statement is true. These are almost always small numbers, less than 100 in most circumstances.

Which means these problems are trivial to solve if you have a computer: you can simply check all possibilities. And that is precisely the reason why calculators aren't allowed.

But exhaustive searches are not feasible by hand in the time span the problems are supposed to be solved in, roughly 30 minutes per problem. You are not supposed to use brute force, but to recognize a key insight which simplifies the problem. And I believe even if you did do an exhaustive search, simply giving the answer is not enough for full points. You would have to give adequate justification.

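A sketch of the shortcut being described. The predicate here ("n divides 2^n + 2") is a classic competition-style statement chosen for illustration, not one of the 2024 IMO problems:

```python
# Exhaustive search over the small range where competition answers live.
def p(n: int) -> bool:
    return (2 ** n + 2) % n == 0

answers = [n for n in range(1, 101) if p(n)]
print(answers)  # [1, 2, 6, 66] -- found instantly by machine, but a
                # contestant must still prove by hand that no other n works
```
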
robinhouston 10 months ago
Some more context is provided by Tim Gowers on Twitter [1].

Since I think you need an account to read threads now, here's a transcript:

Google DeepMind have produced a program that in a certain sense has achieved a silver-medal performance at this year's International Mathematical Olympiad.

It did this by solving four of the six problems completely, which got it 28 points out of a possible total of 42. I'm not quite sure, but I think that put it ahead of all but around 60 competitors.

However, that statement needs a bit of qualifying.

The main qualification is that the program needed a lot longer than the human competitors -- for some of the problems over 60 hours -- and of course much faster processing speed than the poor old human brain.

If the human competitors had been allowed that sort of time per problem they would undoubtedly have scored higher.

Nevertheless, (i) this is well beyond what automatic theorem provers could do before, and (ii) these times are likely to come down as efficiency gains are made.

Another qualification is that the problems were manually translated into the proof assistant Lean, and only then did the program get to work. But the essential mathematics was done by the program: just the autoformalization part was done by humans.

As with AlphaGo, the program learnt to do what it did by teaching itself. But for that it needed a big collection of problems to work on. They achieved that in an interesting way: they took a huge database of IMO-type problems and got a large language model to formalize them.

However, LLMs are not able to autoformalize reliably, so they got them to autoformalize each problem many times. Some of the formalizations were correct, but even the incorrect ones were useful as training data, as often they were easier problems.

It's not clear what the implications of this are for mathematical research. Since the method used was very general, there would seem to be no obvious obstacle to adapting it to other mathematical domains, apart perhaps from insufficient data.

So we might be close to having a program that would enable mathematicians to get answers to a wide range of questions, provided those questions weren't *too* difficult -- the kind of thing one can do in a couple of hours.

That would be massively useful as a research tool, even if it wasn't itself capable of solving open problems.

Are we close to the point where mathematicians are redundant? It's hard to say. I would guess that we're still a breakthrough or two short of that.

It will be interesting to see how the time the program takes scales as the difficulty of the problems it solves increases. If it scales with a similar ratio to that of a human mathematician, then we might have to get worried.

But if the function human time taken --> computer time taken grows a lot faster than linearly, then more AI work will be needed.

The fact that the program takes as long as it does suggests that it hasn't "solved mathematics".

However, what it does is way beyond what a pure brute-force search would be capable of, so there is clearly something interesting going on when it operates. We'll all have to watch this space.

1. https://x.com/wtgowers/status/1816509803407040909?s=46

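A rough sketch of the data loop Gowers describes, with stub functions simulating the real, non-public pipeline (all names here are hypothetical):

```python
import random

def llm_autoformalize(problem: str) -> str:
    # Stub standing in for sampling an LLM; a real system queries a model.
    return random.choice([f"faithful({problem})", f"variant({problem})", "ill-typed"])

def typechecks(statement: str) -> bool:
    # Stub standing in for the Lean elaborator accepting the statement.
    return statement != "ill-typed"

# Each informal problem is formalized many times; even formalizations that
# fail to match the original are kept, since they are often easier problems
# that are still useful for training a prover.
informal_problems = ["imo-style problem A", "imo-style problem B"]
training_set = [
    s
    for problem in informal_problems
    for _ in range(100)  # many autoformalization attempts per problem
    if typechecks(s := llm_autoformalize(problem))
]
print(len(training_set))
```
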
amarant 10 months ago
This is quite cool! I've found logical reasoning to be one of the biggest weak points of LLMs, so it's nice to see that an alternative approach works better! I've tried to enlist GPT to help me play an Android game called 4=10, where you solve simple math problems, and GPT was hilariously terrible at it. It would both break the rules I described and make math mistakes, such as claiming 6*5-5+8=10.

I wonder if this new model could be integrated with an LLM somehow? I get the feeling that combining those two powers would result in a fairly capable programmer.

Also, perhaps an LLM could do the translation step that is currently manual?

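For what it's worth, the game itself falls to blind search. Here is a small sketch under my reconstruction of the rules (combine four digits with +, -, *, / and parentheses to reach exactly 10):

```python
def solvable(nums, target=10.0, eps=1e-9):
    """True if some +, -, *, / parenthesization of nums reaches target."""
    if len(nums) == 1:
        return abs(nums[0] - target) < eps
    # Pick any ordered pair, combine it, and recurse on the smaller list;
    # this covers every possible parenthesization.
    for i in range(len(nums)):
        for j in range(len(nums)):
            if i == j:
                continue
            rest = [nums[k] for k in range(len(nums)) if k not in (i, j)]
            a, b = nums[i], nums[j]
            candidates = [a + b, a - b, a * b]
            if abs(b) > eps:
                candidates.append(a / b)  # skip division by zero
            if any(solvable(rest + [r], target, eps) for r in candidates):
                return True
    return False

print(solvable([6.0, 5.0, 5.0, 8.0]))  # True: 6 * 5 / (8 - 5) = 10
```
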
petters 10 months ago
The problems were first converted into a formal language, so they were only partly solved by the AI.

nitrobeast 10 months ago
Reading into the details, the system is more impressive than the title suggests. 100% of the algebra and geometry problems were solved. The remaining problems are of combinatorial types, which ironically more closely resemble software engineering work.

gallerdude 10 months ago
Sometimes I wonder if, in 100 years, it's going to be surprising to people that computers had a use before AI...

StefanBatory 10 months ago
Wow, that's absolutely impressive to hear!

It also makes me think that in 5-10 years almost all tasks involving computer scientists or mathematicians will be done by AI. Perhaps people going into trades had a point.

_heimdall 10 months ago
I'm still unclear whether the system used here is actually reasoning through the process of solving the problem, or brute-forcing solutions with reasoning coming in only during the mathematical proof of each potential solution.

Is it clear whether the algorithm is actually learning from *why* previously attempted solutions failed to prove out, or is it statistically generating potential answers, similar to an LLM, and then trying to apply reasoning to prove out the potential solution?

HL33tibCe7 10 months ago
This is kind of an ideal use case for AI, because we can say with absolute certainty whether their solution is correct, completely eliminating the problem of hallucination.

majikaja 10 months ago
It would be nice if the page included detailed descriptions of the proofs it came up with, more information about the capabilities of the system, and insights into the training process...

If the data is synthetic and covers a limited class of problems, I would imagine what it's doing mostly reduces to some basic search-pattern heuristics, which would be of more value to understand than just being told it can solve a few problems in three days.

seydor 10 months ago
We need to up the ante: getting human-like performance on any task is not impressive in itself; what matters is superhuman, orders of magnitude above. These comparisons with humans in order to create impressive-sounding titles are disguising the fact that we are still in the stone age of intelligence.

zhiQ 10 months ago
Coincidentally, I just posted about how well LLMs handle adding long strings of numbers: https://userfriendly.substack.com/p/discover-how-mistral-large-2-claude

dan_mctree 10 months ago
I'm curious if we'll see a world where computers could solve math problems so easily that we'll be overwhelmed by all the results and stop caring. The role of humans might change to asking the computer interesting questions that we care about.

0xd1r 10 months ago
> As part of our IMO work, we also experimented with a natural language reasoning system, built upon Gemini and our latest research to enable advanced problem-solving skills. This system doesn't require the problems to be translated into a formal language and could be combined with other AI systems. We also tested this approach on this year's IMO problems and the results showed great promise.

I wonder what "great promise" entails, because it's hard to imagine Gemini and other transformer-based models solving these problems with reasonable accuracy, as there is no elimination of hallucination. At least in the generally available products.

skywhopper 10 months ago
Except it didn't. The problem statements were hand-encoded into a formal language by human experts, and even then only one problem was actually solved within the time limit. So claiming the work was "silver medal" quality is outright fraudulent.

1024core 10 months ago
> The system was allowed unlimited time; for some problems it took up to three days. The students were allotted only 4.5 hours per exam.

I know speed is just a matter of engineering, but it looks like we still have a ways to go. Hold the gong...

stonethrowaway 10 months ago
It's like bringing a rocket launcher to a fist fight, but I'd like to use these math language models to find gaps in logic when people are making online arguments. It would be an excellent way to verify who has done their homework.

PaulHoule 10 months ago
See https://en.wikipedia.org/wiki/Automated_Mathematician for an early system that seems similar in some ways.

arnabgho 10 months ago
https://x.com/GoogleDeepMind/status/1816498082860667086

nybsjytm 10 months ago
To what extent is the training and structure of AlphaProof tailored specifically to IMO-type problems, which typically have short solutions using combinations of a small handful of specific techniques?

(It's not my main point, but it's always worth remembering - even aside from any AI context - that many top mathematicians can't do IMO-type problems, and many top IMO medalists turn out to be unable to solve actual problems in research mathematics. IMO problems are generally regarded as somewhat niche.)

osti 10 months ago
So they weren't able to solve the combinatorics problem. I'm not super well versed in competition math, but combinatorics problems have always seemed the most interesting to me.

lo_fye 10 months ago
Remember when people thought computers would never be able to beat a human Grand Master at chess? Ohhh, pre-2000 life, how I miss thee.

quirino 10 months ago
I honestly expected the IOI (International Olympiad in Informatics) to be "beaten" much earlier than the IMO. There's AlphaCode, of course, but as of the latest update I don't think it was quite at "silver medal" level. And available LLMs are probably not even at "honourable mention" level.

I wonder if some class of problems will emerge that human competitors are able to solve but that are particularly tricky for machines, and which characteristics these problems will have (e.g. they'll require some sort of intuition or visualization that is not easily formalized).

Given how much of a dent LLMs are already making in beginner competitions (AtCoder recently banned using them in ABC rounds [1]), I can't help but think that soon these competitions will be very different.

[1] https://info.atcoder.jp/entry/llm-abc-rules-en

ckcheng 10 months ago
There doesn't seem to be much information on how they attempted and failed to solve the combinatorial-type problems.

Anyone know any details?

pnjunction 10 months ago
Brilliant and so encouraging!

> because of limitations in reasoning skills and training data

One would assume that mathematical literature and training data would be abundant. Is there a simple example that could help in appreciating the Gemini bridge layer mentioned in the blog, which produces the input for RL in Lean?

jerb 10 months ago
Is the score of 28 comparable to the score of 29 here? https://www.kaggle.com/competitions/ai-mathematical-olympiad-prize/leaderboard

djaouen 10 months ago
Is it really such a smart thing to train a non-human "entity" to beat humans at math?

brap 10 months ago
Are all of these specialized models available for use? Like, do they have an API?

I wonder because, on one hand, they seem very impressive and groundbreaking; on the other, it's hard to imagine why more than a handful of researchers would use them.

11101010001100 10 months ago
Can anyone comment on how different the AI-generated proofs are when compared to those of humans? Recent chess engines have had some 'different' ideas.

imranhou 10 months ago
If the system took 3 days to solve a problem, how different is this approach from a brute-force attempt at the problem with educated guesses? That's not reasoning in my mind.

gowld 10 months ago
Why is it so hard to make an AI that can translate an informally specified math problem (and geometry isn't even so informal) into a formal representation?

sssummer 10 months ago
Why can frontier models achieve a silver medal in the Math Olympiad but also fail to answer "which number is bigger, 9.11 or 9.9"?

quantum_state 10 months ago
It's as impressive as, if not more impressive than, AI beating a chess master. But are we, or should we be, really impressed?

rowanG077 10 months ago
Is this just Google blowing up their own asses, or is this actually usable with some sane license?

c0l0 10 months ago
That's great, but does that particular model also know if/when/that it *does not* know?

atum47 10 months ago
Oh, the title was changed to "International Math Olympiad". I was reading IMO as "in my opinion", haha.

fovc 10 months ago
6 months ago I predicted Algebra would be next after geometry. Nice to see that was right. I thought number theory would come before combinatorics, but this seems to have solved one of those. Excited to dig into how it was done.

https://news.ycombinator.com/item?id=39037512

lumb63 10 months ago
Can someone explain why proving and math problem solving is not a far easier problem for computers? Why does it require any "artificial intelligence" at all?

For example, suppose a computer is asked to prove that the sum of two even numbers is an even number. It could pull up its list of "things it knows about even numbers", namely that an even number modulo 2 is 0. Assuming the first number is "a" and the second is "b", it then knows a=2x and b=2y for some x and y. It then knows via the distributive property that the sum is 2(x+y), which satisfies the definition of an even number.

What am I missing that makes this problem so much harder than applying a finite and known set of axioms and manipulations?

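The parent's example really is that easy once formalized; here it is in Lean (a sketch assuming Mathlib is available for the `ring` tactic). The hard part is not checking such steps but searching for which of the many applicable manipulations to chain together; on competition problems that search space explodes combinatorially.

```lean
import Mathlib

-- Exactly the argument above: a = 2x, b = 2y, so a + b = 2(x + y).
theorem even_add_even (a b : ℤ) (ha : ∃ x, a = 2 * x) (hb : ∃ y, b = 2 * y) :
    ∃ z, a + b = 2 * z := by
  obtain ⟨x, rfl⟩ := ha
  obtain ⟨y, rfl⟩ := hb
  exact ⟨x + y, by ring⟩  -- distributivity: 2*x + 2*y = 2*(x + y)
```
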
mathinaly 10 months ago
How do they know their formalization of the informal problems into formal ones was correct?

__0x01 10 months ago
Please could someone explain, very simply, what the training data was composed of?

m3kw9 10 months ago
Is it one of those "slowly, slowly, then suddenly" things? I hope so.

mupuff1234 10 months ago
Can it / did it solve problems that weren't solved yet?

amelius 10 months ago
How long until this tech is integrated into compilers?

dmitrygr 10 months ago
> First, the problems were manually translated into formal mathematical language

That is more than half the work of solving them. The headline should read "AI solves the simple part of each IMO problem at silver medal level".

refulgentis 10 months ago
Goalposts at the moon, FUD at "but what if it's obviously fake?".

Real, exact quotes from the top comments at 1 PM EST:

"I want to believe that the computer found it, but I can't find anything to confirm."

"Curing cancer will require new ideas"

"Maybe they used 10% of all of GCP [Google compute]"

gerdesj 10 months ago
Why on earth did the "beastie" need the questions translated?

So it failed at the first step (comprehension), and hence I think we can request a better effort next time.

badrunaway 10 months ago
This will, in a few months, change everything forever. Exponential growth incoming soon from DeepMind's systems.

szundi 10 months ago
Like it understands any of it.

thoiwer23423 10 months ago
And yet it thinks 3.11 is greater than 3.9.

(Probably confused by version numbers.)

mik09 10 months ago
How long before it solves the last two problems?

lolinder 10 months ago
This is a fun result for AI, but a *very* disingenuous way to market it.

IMO contestants aren't allowed to bring in paper tables, much less a whole theorem prover. They're given two 4.5-hour sessions (9 hours total) to solve all the problems with nothing but pencils, rulers, and compasses [0].

This model, meanwhile, was wired up to a theorem prover and took three solid days to solve the problems. The article is extremely light on details, but I'm assuming that most of that time was guess-and-check: feed the theorem prover a possible answer, get feedback, adjust accordingly.

If the IMO contestants were given a theorem prover and three days (even counting breaks for sleeping and eating!), how would AlphaProof have ranked?

Don't get me wrong, this is a fun project and an exciting result, but their comparison to silver medalists at the IMO is just feeding into the excessive hype around AI, not accurately representing its current state relative to humanity.

[0] 5.1 and 5.4 in the regulations: https://www.imo-official.org/documents/RegulationsIMO.pdf

gyudin 10 months ago
Haha, what a dumb tincan (c) somebody on Twitter right now :D

hendler 10 months ago
See also https://leandojo.org/

machiaweliczny 10 months ago
Good, now use DiffusER on algebra somehow, please.

pmcf 10 months ago
I read this as "In My Opinion" and really thought this was about AI dealing with opinionated people. Nope. HN is still safe. For now…

Sparkyte 10 months ago
In other news today: calculator solves math problem.

cs702 10 months ago
In 1997, machines defeated a World Chess Champion for the first time, using brute-force "dumb search." Critics noted that while "dumb search" worked for chess, it might not necessarily be a general strategy applicable to other cognitive tasks.[a]

In 2016, machines defeated a World Go Champion for the first time, using a clever form of "dumb search" that leverages compute, DNNs, reinforcement learning (RL), and self-play. Critics noted that while this fancy form of "dumb search" worked for Go, it might not necessarily be a general strategy applicable to other cognitive tasks.[a]

In 2024, machines solved insanely hard math problems at the Silver Medal level in an International Math Olympiad for the first time, using a clever form of "dumb search" that leverages compute, DNNs, RL, and a formal language. Perhaps "dumb search" over cleverly pruned spaces isn't as dumb as the critics would like it to be?

---

[a] http://www.incompleteideas.net/IncIdeas/BitterLesson.html

hulitu 10 months ago
> AI solves International Math Olympiad problems at silver medal level

> In the official competition, students submit answers in two sessions of 4.5 hours each. Our systems solved one problem within minutes and took up to three days to solve the others.

Why not compare with students who are given 3 days to submit an answer? /s

Davidzheng 10 months ago
HOLY SHIT. It's amazing.

data_maan 10 months ago
It's bullshit. AlphaGeometry can't even solve the Pythagorean theorem.

They're not open-sourcing anything.

This is a dead end on which no further research can be built.

It violates pretty much every principle of incremental improvement on which science is based. It's here just for hype, and the 300+ comments prove it.

NoblePublius 10 months ago
A lot of words to say second place.

iamronaldo 10 months ago
No benchmarks of any kind?

myspeed 10 months ago
This means we may need to remove or replace the Olympiad. It has no practical significance. Winners never contributed to any major scientific breakthroughs.

xyst 10 months ago
Billions of dollars spent building this, GW of energy used to train it. And the best it could do is "silver"?

Got to be kidding me. We are fucked.

AyyEye 10 months ago
Parlor tricks. Wake me up when AI can reliably identify which number is circled at the level of my two-year-old.

dinobones 10 months ago
I see DeepMind is still playing around with RL + search algorithms, except now it looks like they're using an LLM to generate state candidates.

I don't really find that impressive. With enough compute you could just do n-of-10,000 LLM generations to "brute force" a difficult problem, and you'll get there eventually.

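The n-of-10,000 strategy described here amounts to generate-and-verify; a minimal sketch with stubs (hypothetical names, simulating the model and the checker):

```python
import random

def sample_llm_proof(problem: str) -> str:
    # Stub for a stochastic LLM call; a real system would query a model.
    return random.choice(["bogus step", "by omega", "circular argument"])

def lean_accepts(candidate: str) -> bool:
    # Stub for running the Lean kernel; the verifier supplies ground truth.
    return candidate == "by omega"

def best_of_n(problem: str, n: int = 10_000) -> str | None:
    """Sample candidates until the checker accepts one, or give up."""
    for _ in range(n):
        candidate = sample_llm_proof(problem)
        if lean_accepts(candidate):
            return candidate
    return None

print(best_of_n("toy problem"))  # almost surely prints "by omega"
```
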
piombisallow 10 months ago
IMO problems aren't fundamentally different from chess or other games, in that the answer is already known.

rich_sasha 10 months ago
I'm actually not that surprised. Maths Olympiads, IME, have always been 80% preparation, 20% skill, if not more heavily tilted toward preparation. It was all about solving as many problems as possible ahead of the papers, and having a good short-term memory. Since Olympiads are for kids, the amount of actual fundamental mathematical theorems required is actually not that great.

Sounds perfect for a GPT model, with lots of input training data (problem books and solutions).

gowld 10 months ago
This shows a major gap in AI.

The proofs of these problems aren't interesting. They were already known before the AI started work.

What's interesting is *how* the AI found the proof. The only answer we have is "slurped data into a neural network, matched patterns, and did some brute search".

What were the ideas it brainstormed? What were the dead-end paths? What were the "activations" where the problem seemed similar to a certain piece of input, which led to a guess of a step in the solution?

balls187 10 months ago
What was the total energy consumption required to achieve this result (both training and running)?

And how much CO2 was released into Earth's atmosphere?
