I was recently in Palo Alto and bumped into a newly founded startup (I don't remember the name, unfortunately) who had set themselves the grand vision of exactly this: winning a gold medal at the International Mathematical Olympiad using AI. Their plan was to build mostly on LLMs as a start and iterate as they went. In their barebones office space, they had a poster counting down the number of weeks until the event: it was 36 at the time.<p>It was interesting to wonder how far they could get with this kind of approach. I thought they were aiming for the moon, but I also respected the boldness and determination. They had the funding to operate for at least a year and were very focused on getting there.<p>Seems like this prize will spur hundreds (or thousands) of teams competing in exactly this space. Perhaps it will have a similar effect to the $1M Netflix Prize (awarded in 2009) for recommendation algorithms!
How does this relate to the "IMO Grand Challenge" <a href="https://imo-grand-challenge.github.io/" rel="nofollow noreferrer">https://imo-grand-challenge.github.io/</a> ? Is this a new name / formalization, with prize money attached, or is it entirely independent? (E.g. I see Kevin Buzzard and Leonardo de Moura listed on both that page and at <a href="https://aimoprize.com/supporters" rel="nofollow noreferrer">https://aimoprize.com/supporters</a>)
It would be cool to have a Patreon-like system for math proofs. But to reward solvers appropriately and at scale, the award conditions and evaluation would have to be very formalized and specific.<p>This seems to be one potential, actually useful application of blockchains which support general purpose computing - if you can port a proof verifier onto them, you give anyone the ability to commit to (and claim) proof bounties.<p>Now, precisely formalizing specific conjectures and ensuring the proof system is expressive enough but doesn't allow for the introduction of any new assumptions is another problem...
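The commit-and-claim mechanism described above can be sketched in ordinary code. Everything here is hypothetical (the class, the verifier interface, and the reward flow are illustrative, not any real chain or contract API); a real deployment would run on-chain with the proof verifier as the contract's acceptance condition:

```python
# Hypothetical sketch of a proof bounty: funds are escrowed against a
# formal statement, and whoever first submits a proof that the (assumed)
# verifier accepts can claim the reward.

class ProofBounty:
    def __init__(self, statement: str, reward: int, verifier):
        self.statement = statement
        self.reward = reward
        # Assumed interface: verifier(statement, proof) -> bool
        self.verifier = verifier
        self.claimed = False

    def claim(self, proof: str) -> int:
        """Pay out the reward for the first accepted proof; 0 otherwise."""
        if self.claimed or not self.verifier(self.statement, proof):
            return 0
        self.claimed = True
        return self.reward
```

The hard part, as noted, is not this plumbing but guaranteeing the verifier is sound and that the formal statement really captures the intended conjecture.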
Can automated theorem provers solve mathematical olympiad problems in a reasonable time given enough compute?<p>LLMs are quite good at generating semantically correct language. I remember reading a paper about extending the planning capabilities of GPT-4 by using a Planning Domain Definition Language [0]. By that same logic could an LLM not translate the olympiad problem into a form suitable for a theorem prover?<p>[0] <a href="https://arxiv.org/pdf/2305.11014.pdf" rel="nofollow noreferrer">https://arxiv.org/pdf/2305.11014.pdf</a>
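To make the idea concrete, here is a toy example of what such a translation target could look like in Lean 4 with Mathlib (the statement and names are illustrative, not taken from the linked paper): the LLM's job would be to produce a formal statement like this from natural-language problem text, and the prover's job would be to find the proof term.

```lean
-- Illustrative only: an olympiad-flavored inequality stated in Lean 4 / Mathlib.
theorem toy_ineq (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 :=
  add_nonneg (sq_nonneg a) (sq_nonneg b)
```

Autoformalization of real olympiad problems is much harder than this toy case, since the informal statements often leave quantifiers and side conditions implicit.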
As the parent of a young adult currently halfway through their maths undergrad, this kind of fills me with foreboding.<p>I know that proof assistants etc. have existed for quite a while now, but what with this and the murmurs about OAI's Q* model, I do wonder what will happen to maths as a human endeavour - and as an enabling skill for jobs that can financially support people like my child.
Very cool of XTX. They are a solid company with very smart people.<p>But wouldn't a model capable of doing this be currently worth hundreds of millions? A billion?
I'm asking this question out of ignorance: if you were able to do this, why would you make it public for $10MM instead of keeping it private and exploiting it, say, in algorithmic trading models?
Winning a gold medal here would really be a leap forward for AI creativity and deep insight into problem solving.<p>AI is already contributing in substantial ways to research.<p>What’s tantalizing is that this next level would take a huge step toward accelerating the pace of advancement in many fields.<p>It will be a milestone in moving past AI being a mere tool in scientific progress to something much greater.
I'd still call this fancy autocomplete. I imagine the jump from this, to "come up with an interesting new branch of math to explore", is a long way off.
Before gold at the math Olympiad, it would be nice if GPT-4 could merely evaluate an expression like this one to an exact value:<p>3 + 1 / (3 + 1 / (3 + 1/3))
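For reference, this nested fraction has an exact rational value, which ordinary exact arithmetic (here Python's `fractions` module) computes directly:

```python
# Evaluate the continued fraction exactly with rational arithmetic.
from fractions import Fraction

x = Fraction(3) + 1 / (Fraction(3) + 1 / (Fraction(3) + Fraction(1, 3)))
print(x)  # 109/33
```

Working inside out: 3 + 1/3 = 10/3, then 3 + 3/10 = 33/10, then 3 + 10/33 = 109/33.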
That’s interesting, I think. I’m not sure what the true domain of problems is in that arena. I’ve found great success using LLMs for teaching abstract algebra, but have noticed that when I switch to analysis or topology things get more wonky.<p>My advice, and I have zero understanding of or concern for why the above happens, is to lean into algebra and train less on analysis. You’ll get there faster, it seems.
Maybe the recently announced 1T-parameter scientific AI will be trained on mathematical papers as well and pick up reasoning on the fly:
<a href="https://news.ycombinator.com/item?id=38391923">https://news.ycombinator.com/item?id=38391923</a><p>Or at least maybe it will be a good foundation model to be fine-tuned for reasoning?
It feels like $10M is a fraction of what it would cost to train such a model. Even if we assume an eventual winner would be willing to release the model and forgo potential profits, does this prize really motivate development if it doesn't even cover costs?
Seems better to go by AIME-style scoring (explicit 3-digit number answer) otherwise there's the question of whether an AI really solved the problem or just bullshitted enough relevant keywords to get partial credit.
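The appeal of AIME-style scoring is that checking an answer is mechanical. A minimal sketch of such a checker (the helper name is hypothetical; AIME answers are integers from 000 to 999, conventionally zero-padded to three digits):

```python
import re

def is_aime_answer(s: str) -> bool:
    """Check that a submitted answer is an integer in 000-999 (AIME convention).

    Accepts both zero-padded ("042") and unpadded ("42") forms.
    """
    return re.fullmatch(r"\d{1,3}", s.strip()) is not None
```

With a checker like this there is no partial credit to argue about: the answer either matches or it doesn't.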
Great, a worthy competition in which being first and correctness both matter, and may the best AI model win. Hopefully DeepMind doesn’t enter, otherwise they would smoke everyone else other than OpenAI.<p>Speaking of which, I’ll give you a 1% chance of winning with this paper from OpenAI: [0] with the MATH dataset: [1]<p>EDIT: Why the downvotes? I'm trying to help you here and give you a starting point to win the competition. What's wrong with helping others?<p>[0] <a href="https://arxiv.org/abs/2305.20050" rel="nofollow noreferrer">https://arxiv.org/abs/2305.20050</a><p>[1] <a href="https://github.com/hendrycks/math">https://github.com/hendrycks/math</a>
Just combine GPT-4 and Wolfram Alpha?<p>See <a href="https://arxiv.org/abs/2308.05713" rel="nofollow noreferrer">https://arxiv.org/abs/2308.05713</a>
Not sure there is much point to this. Currently no model is even at the level of primary-school math. It will take decades until one can win a gold medal in maths; it won't happen overnight, it will happen slowly. Will they still give out the prize 20 years from now, when it won't seem like such a breakthrough?
Wow, that website has an extremely irritating background animation. Any time your mouse is inside a circle, the circle lights up, stealing your mental focus from the text.<p>Reminds me of the really dumb new Glassdoor design, where it flickers in the background as if it's loading something (but it's not) while you're trying to fill out the sign up form.<p>What is going on with web development these days? Is it some kind of Javascript animation version of the IOCCC?
We will solve math with math and this project will succeed as foretold by prophets of techno-optimism like Marc Andreessen, Elon Musk, and Bill Gates. All we need is the right architecture, aka a mathematical formula for giving all of math a smooth manifold structure. How this works for discrete structures like the integers is left as an exercise for the reader and for future AI, which will figure out how to improve itself and deal with discrete and non-smooth mathematical problems.<p>I can win this challenge, by the way, for $80B. I already know what architecture is required to solve math with math, but I need the money to buy the GPUs. You might think such a recursive application of math is logically circular, but it is not, and all I need is $80B to prove it (pun intended).<p><i>Claude did not like this comment at all, ironically</i>: I do not have enough context to fully evaluate those claims or determine if that approach would work. Solving all of mathematics is an extraordinarily ambitious goal that would require fundamental theoretical advances we do not yet possess. While future AI systems may someday make significant progress on longstanding mathematical problems, making definitive claims about solutions requires rigorous mathematical proof and analysis beyond optimistic speculation. I'd encourage focusing discussion on specific mathematical questions or areas of research rather than making broad, unsupported assertions about solving all of mathematics.