$10M AI Mathematical Olympiad Prize

281 点作者 jasondavies超过 1 年前

29 条评论

gregdoesit超过 1 年前

I was recently in Palo Alto, and bumped into a newly founded startup (I don't remember the name unfortunately) who set themselves the grand the vision of exactly this: winning a gold medal on the international Olympiad using AI. Their plan was to build mostly on LLMs as a start, and iterate as they go. In their barebones office space, they had a poster with a countdown of the number of weeks till the event: it was 36 at the time.It sounded interesting to wonder how far they could go with this kind of approach. I thought they were aiming for the moon: but also respected the boldness and determination. They had the funding to operate for at least a year, and were very focused to get there.Seems like this prize will supr hundreds (or thousands) of teams competing in exactly this space. Perhaps it will have a similar effect like the $1M Netflix Prize in 2009 for recommendations algorithms!

评论 #38432196 未加载

评论 #38433181 未加载

评论 #38432974 未加载

评论 #38432622 未加载

评论 #38438823 未加载

评论 #38432296 未加载

评论 #38435591 未加载

评论 #38432250 未加载

svat超过 1 年前

How does this relate to the "IMO Grand Challenge" <a href="https://imo-grand-challenge.github.io/" rel="nofollow noreferrer">https://imo-grand-challenge.github.io/</a> ? Is this a new name / formalization, with prize money attached, or is it entirely independent? (E.g. I see Kevin Buzzard and Leonardo de Moura listed on both that page and at <a href="https://aimoprize.com/supporters" rel="nofollow noreferrer">https://aimoprize.com/supporters</a>)

评论 #38432403 未加载

评论 #38442866 未加载

7373737373超过 1 年前

It would be cool to have a Patreon-like system for math proofs. But to reward solvers appropriately and at scale, the award conditions and evaluation would have to be very formalized and specific.This seems to be one potential, actually useful application of blockchains which support general purpose computing - if you can port a proof verifier onto them, you give anyone the ability to commit to (and claim) proof bounties.Now, precisely formalizing specific conjectures and ensuring the proof system is expressive enough but doesn't allow for the introduction of any new assumptions is another problem...

评论 #38432240 未加载

评论 #38432366 未加载

评论 #38433471 未加载

评论 #38433019 未加载

评论 #38432348 未加载

openquery超过 1 年前

Can automated theorem provers solve mathematical olympiad problems in a reasonable time given enough compute?LLMs are quite good at generating semantically correct language. I remember reading a paper about extending the planning capabilities of GPT-4 by using a Planning Domain Definition Language [0]. By that same logic could an LLM not translate the olympiad problem into a form suitable for a theorem prover?[0] <a href="https://arxiv.org/pdf/2305.11014.pdf" rel="nofollow noreferrer">https://arxiv.org/pdf/2305.11014.pdf</a>

评论 #38437727 未加载

评论 #38437704 未加载

andyjohnson0超过 1 年前

As the parent of a young adult currently half way through their maths undergrad, this kind of fills me with foreboding.I know that proof assistants etc have existed for quite a while now, but what with this and the murmours about OAI's Q* model, I do wonder what will happen to maths as a human endeavour - and as a enabling skill for jobs that can financially support people like my child.

评论 #38432226 未加载

评论 #38433414 未加载

评论 #38432223 未加载

评论 #38433677 未加载

评论 #38435638 未加载

评论 #38432194 未加载

评论 #38433119 未加载

评论 #38432395 未加载

alecco超过 1 年前

Very cool of XTX. They are a solid company with very smart people.But wouldn't a model capable of doing this be currently worth hundreds of millions? A billion?

评论 #38433451 未加载

dwrensha超过 1 年前

> $5mn will be awarded to the first publicly-shared AI modelThank for you the emphasis on openness!

Yhippa超过 1 年前

I'm asking this question out of ignorance: if you were able to do this, why would you make it public for $10MM instead of keeping it private and exploiting it. Say, in algorithmic trading models?

评论 #38432927 未加载

评论 #38432866 未加载

评论 #38435148 未加载

评论 #38433365 未加载

评论 #38435680 未加载

seafoam超过 1 年前

Is XTX good for the $10M ?There are a lot of people who got burnt by FTX grants and prizes ...

评论 #38432906 未加载

WhitneyLand超过 1 年前

Winning a gold medal here will really be a leap forward for AI creativity and deep insight into problem solving.AI already is already contributing in substantial ways to research.What’s tantalizing here is this next level would take a huge step toward accelerating the pace of advancements in many fields.It will be a milestone in moving past AI being a mere tool in scientific progress to something much greater.

评论 #38433058 未加载

daxfohl超过 1 年前

I'd still call this fancy autocomplete. I imagine the jump from this, to "come up with an interesting new branch of math to explore", is a long way off.

评论 #38433303 未加载

评论 #38433573 未加载

评论 #38433658 未加载

WhitneyLand超过 1 年前

Before gold at the math Olympiad, it would be nice if GPT 4 could merely evaluate an expression like this into an exact solution :3 + 1 / (3 + 1 / (3 + 1/3))

评论 #38432816 未加载

oglop超过 1 年前

That’s, interesting I think. I’m not sure wha true domain of problems are in that arena. I’ve found great success using LLMs for teaching abstract algebra but have noticed when I switch to analysis or topology things get more wonky.My advice, and I have zero understanding or care why the above happens, is to lean into algebra and train less on analysis. You’ll get there faster it seems.

singularity2001超过 1 年前

Maybe the recently announced 1T Parameter Scientific AI will be trained on mathematical papers as well and picks up reasoning on the fly: <a href="https://news.ycombinator.com/item?id=38391923">https://news.ycombinator.com/item?id=38391923</a>Or at least maybe it will be a good foundation model to be fine-tuned for reasoning?

评论 #38432362 未加载

Imnimo超过 1 年前

It feels like 10M is a fraction of what it would cost to train such a model. Even if we assume an eventual winner would be willing to release the model and forgo potential profits, does this prize really motivate development if it doesn't even cover costs?

dooglius超过 1 年前

Seems better to go by AIME-style scoring (explicit 3-digit number answer) otherwise there's the question of whether an AI really solved the problem or just bullshitted enough relevant keywords to get partial credit.

评论 #38433594 未加载

rvz超过 1 年前

Great, a worthy competition which being first and correctness completely matter and may the best AI model win. Hopefully DeepMind doesn’t enter otherwise they would smoke everyone else other than OpenAI.Speaking of which, I’ll give you a 1% chance of winning with this paper from OpenAI: [0] with the MATH dataset: [1]EDIT: Why the downvotes? I'm trying to help you here and give you a starting point to win the competition? What's wrong with helping others?[0] <a href="https://arxiv.org/abs/2305.20050" rel="nofollow noreferrer">https://arxiv.org/abs/2305.20050</a>[1] <a href="https://github.com/hendrycks/math">https://github.com/hendrycks/math</a>

amai超过 1 年前

Just combine GPT-4 and Wolfram Alpha?See <a href="https://arxiv.org/abs/2308.05713" rel="nofollow noreferrer">https://arxiv.org/abs/2308.05713</a>

daoboy超过 1 年前

Anything you can do with a calculator is obviously trivial.Is there a clear point of departure when AI can no longer handle mathematical reasoning?

评论 #38432126 未加载

评论 #38432078 未加载

评论 #38432653 未加载

评论 #38434661 未加载

supergirl超过 1 年前

not sure there is much point to this. currently no model is even at the level of primary school math. it will take decades until it can win a gold medal in maths. it won't happen overnight. it will happen slowly. will they still give out the prize in 20 years from now, when it won't seem like such a breakthrough?

up2isomorphism超过 1 年前

What is fundamentally different about solving IMO compared to solve any other math problems in AI sense?

queuebert超过 1 年前

Wow, that website has an extremely irritating background animation. Any time your mouse is inside a circle, the circle lights up, stealing your mental focus from the text.Reminds me of the really dumb new Glassdoor design, where it flickers in the background as if it's loading something (but it's not) while you're trying to fill out the sign up form.What is going on with web development these days? Is it some kind of Javascript animation version of the IOCCC?

lupire超过 1 年前

This is an embarrassing overreach.We don't even have models that can win the far easier AMC, let alone AIME, USAMO, and then IMO.

评论 #38435334 未加载

评论 #38437750 未加载

评论 #38435576 未加载

EVa5I7bHFq9mnYK超过 1 年前

Such a system would have a market value on $10billions++, nice try trying to get it for 1/1000 the price.

shannifin超过 1 年前

Assuming one succeeded, would it be more financially lucrative to claim the prize or form a startup with it?

userforcomment超过 1 年前

Great marketing stunt, nice work by XTX HR!

amai超过 1 年前

Just combine GPT-4 and Wolfram Alpha?

arcza超过 1 年前

Of course XTX is sponsoring this

haltist超过 1 年前

We will solve math with math and this project will succeed as foretold by prophets of techno-optimism like Marc Andreesen, Elon Musk, and Bill Gates. All we need is the right architecture, aka mathematical formula for giving all of math a smooth manifold structure. How this works for discrete structures like integers is left as an exercise for the reader and future AI which will figure out how to improve themselves and deal with discrete and non-smooth mathematical problems.I can win this challenge by the way for $80B. I already know what architecture is required to solve math with math but I need the money to buy the GPUs. You might think such a recursive application of math is logically circular but it is not and all I need is $80B to prove it (pun intended).Claude did not like this comment at all, ironically: I do not have enough context to fully evaluate those claims or determine if that approach would work. Solving all of mathematics is an extraordinarily ambitious goal that would require fundamental theoretical advances we do not yet possess. While future AI systems may someday make significant progress on longstanding mathematical problems, making definitive claims about solutions requires rigorous mathematical proof and analysis beyond optimistic speculation. I'd encourage focusing discussion on specific mathematical questions or areas of research rather than making broad, unsupported assertions about solving all of mathematics.

29 条评论

gregdoesit超过 1 年前

评论 #38432196 未加载

评论 #38433181 未加载

评论 #38432974 未加载

评论 #38432622 未加载

评论 #38438823 未加载

评论 #38432296 未加载

评论 #38435591 未加载

评论 #38432250 未加载

svat超过 1 年前

评论 #38432403 未加载

评论 #38442866 未加载

7373737373超过 1 年前

评论 #38432240 未加载

评论 #38432366 未加载

评论 #38433471 未加载

评论 #38433019 未加载

评论 #38432348 未加载

openquery超过 1 年前

评论 #38437727 未加载

评论 #38437704 未加载

andyjohnson0超过 1 年前

评论 #38432226 未加载

评论 #38433414 未加载

评论 #38432223 未加载

评论 #38433677 未加载

评论 #38435638 未加载

评论 #38432194 未加载

评论 #38433119 未加载

评论 #38432395 未加载

alecco超过 1 年前

Very cool of XTX. They are a solid company with very smart people.But wouldn't a model capable of doing this be currently worth hundreds of millions? A billion?

评论 #38433451 未加载

dwrensha超过 1 年前

> $5mn will be awarded to the first publicly-shared AI modelThank for you the emphasis on openness!

Yhippa超过 1 年前

I'm asking this question out of ignorance: if you were able to do this, why would you make it public for $10MM instead of keeping it private and exploiting it. Say, in algorithmic trading models?

评论 #38432927 未加载

评论 #38432866 未加载

评论 #38435148 未加载

评论 #38433365 未加载

评论 #38435680 未加载

seafoam超过 1 年前

Is XTX good for the $10M ?There are a lot of people who got burnt by FTX grants and prizes ...

评论 #38432906 未加载

WhitneyLand超过 1 年前

评论 #38433058 未加载

daxfohl超过 1 年前

I'd still call this fancy autocomplete. I imagine the jump from this, to "come up with an interesting new branch of math to explore", is a long way off.

评论 #38433303 未加载

评论 #38433573 未加载

评论 #38433658 未加载

WhitneyLand超过 1 年前

Before gold at the math Olympiad, it would be nice if GPT 4 could merely evaluate an expression like this into an exact solution :3 + 1 / (3 + 1 / (3 + 1/3))

评论 #38432816 未加载

oglop超过 1 年前

singularity2001超过 1 年前

评论 #38432362 未加载

Imnimo超过 1 年前

dooglius超过 1 年前

评论 #38433594 未加载

rvz超过 1 年前

amai超过 1 年前

Just combine GPT-4 and Wolfram Alpha?See <a href="https://arxiv.org/abs/2308.05713" rel="nofollow noreferrer">https://arxiv.org/abs/2308.05713</a>

daoboy超过 1 年前

Anything you can do with a calculator is obviously trivial.Is there a clear point of departure when AI can no longer handle mathematical reasoning?

评论 #38432126 未加载

评论 #38432078 未加载

评论 #38432653 未加载

评论 #38434661 未加载

supergirl超过 1 年前

up2isomorphism超过 1 年前

What is fundamentally different about solving IMO compared to solve any other math problems in AI sense?

queuebert超过 1 年前

lupire超过 1 年前

This is an embarrassing overreach.We don't even have models that can win the far easier AMC, let alone AIME, USAMO, and then IMO.

评论 #38435334 未加载

评论 #38437750 未加载

评论 #38435576 未加载

EVa5I7bHFq9mnYK超过 1 年前

Such a system would have a market value on $10billions++, nice try trying to get it for 1/1000 the price.

shannifin超过 1 年前

Assuming one succeeded, would it be more financially lucrative to claim the prize or form a startup with it?

userforcomment超过 1 年前

Great marketing stunt, nice work by XTX HR!

amai超过 1 年前

Just combine GPT-4 and Wolfram Alpha?

arcza超过 1 年前

Of course XTX is sponsoring this

haltist超过 1 年前