TechEcho


© 2025 TechEcho. All rights reserved.

GPT-4o hit 54% accuracy on CodeContests with AlphaCodium, up from 48% for GPT-4T

25 points by GavCo about 1 year ago

1 comment

GavCo about 1 year ago
It's interesting that with direct prompting there's only a 1-point difference between GPT-4o and GPT-4 Turbo, but with the AlphaCodium flow it becomes a substantial 6-point difference.

AlphaCodium works by decomposing a competitive programming problem into simple steps, with an automated flow that uses the LLM for each step. It's iterative: compilation errors and test results are fed back into the model so it can fix its mistakes.

IMO this is a much more useful benchmark than a typical eval because it reflects how LLMs are actually used in the real world. It also seems to surface subtle differences in reasoning ability that zero-shot evals don't capture.
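The iterative generate, test, and fix loop described above can be sketched roughly as follows. This is not AlphaCodium's actual code: the prompt wording, the names `run_tests`, `iterative_solve`, and `ScriptedLLM`, and the convention that candidates define `solve(inp: str) -> str` are all hypothetical. The real flow has more stages than the two shown here (a plan, then code). `ScriptedLLM` is a canned stand-in for a real model, so the loop can run without an API.

```python
from typing import Callable, List, Optional, Tuple

TestCase = Tuple[str, str]  # (input string, expected output string)


def run_tests(source: str, tests: List[TestCase]) -> List[str]:
    """Compile a candidate, run its solve() on each test, and return feedback lines.

    An empty list means every test passed. NOTE: exec() runs the candidate
    in-process with no sandboxing; a real system would isolate it.
    """
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError as e:
        return [f"compilation error: {e.msg} (line {e.lineno})"]
    namespace: dict = {}
    try:
        exec(code, namespace)
    except Exception as e:
        return [f"error while loading candidate: {e!r}"]
    solve = namespace.get("solve")
    if not callable(solve):
        return ["candidate does not define solve()"]
    feedback = []
    for inp, expected in tests:
        try:
            got = solve(inp)
        except Exception as e:
            feedback.append(f"input {inp!r}: raised {e!r}")
            continue
        if got != expected:
            feedback.append(f"input {inp!r}: expected {expected!r}, got {got!r}")
    return feedback


def iterative_solve(
    problem: str,
    tests: List[TestCase],
    llm: Callable[[str], str],
    max_iters: int = 5,
) -> Tuple[Optional[str], int]:
    """Plan, generate code, then feed compile errors and test failures back for fixes."""
    # Step 1: ask the model to break the problem into simple steps (hypothetical prompt).
    plan = llm(f"Break this problem into simple steps:\n{problem}")
    # Step 2: ask for code that follows the plan.
    source = llm(
        f"Problem:\n{problem}\nPlan:\n{plan}\n"
        "Write a Python function solve(inp: str) -> str."
    )
    # Step 3: test; on failure, send the feedback back and ask for a fix.
    for attempt in range(1, max_iters + 1):
        feedback = run_tests(source, tests)
        if not feedback:
            return source, attempt
        if attempt == max_iters:
            break  # no point requesting a fix we will never test
        source = llm(
            "Fix this code.\nCode:\n" + source + "\nFeedback:\n" + "\n".join(feedback)
        )
    return None, max_iters


class ScriptedLLM:
    """Stand-in for a real model: returns canned replies in order and logs prompts."""

    def __init__(self, replies: List[str]) -> None:
        self.replies = list(replies)
        self.prompts: List[str] = []

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.replies.pop(0)


fake = ScriptedLLM([
    "1. Parse two integers. 2. Return their sum.",
    "def solve(inp):\n    a, b = map(int, inp.split()\n    return str(a + b)",   # syntax error
    "def solve(inp):\n    a, b = map(int, inp.split())\n    return str(a - b)",  # wrong logic
    "def solve(inp):\n    a, b = map(int, inp.split())\n    return str(a + b)",  # correct
])
solution, attempts = iterative_solve(
    "Add two integers.", [("1 2", "3"), ("5 7", "12")], fake
)
print(attempts)  # 3: a compile error, then failing tests, then a pass
```

The point of the sketch is the feedback channel: the first fix prompt carries the compiler's complaint and the second carries concrete failing cases, which is the kind of signal a zero-shot eval never gives the model.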