GPT-4o hit 54% accuracy on CodeContests with AlphaCodium, up from 48% for GPT-4T

25 points by GavCo about 1 year ago

1 comment

GavCo about 1 year ago
It's interesting that with direct prompting there's only a 1 point difference between GPT-4o and GPT-4 turbo, but with the AlphaCodium flow it becomes a substantial 6 point difference.

AlphaCodium works by decomposing a competitive programming problem into simple steps and has an automated flow that uses the LLM for each step. It's iterative, so compilation errors and test results are fed back into the model and the model can fix mistakes.

IMO this is a much more useful benchmark than a typical eval because it reflects how LLMs are actually used in the real world. It seems like it also surfaces subtle differences in reasoning abilities that zero-shot evals don't capture.
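To make the "errors and test results are fed back into the model" part concrete, here is a rough Python sketch of a generate, test, repair loop. It is not AlphaCodium's actual code: `run_candidate`, `iterative_repair`, the `solve(text)` convention, and `fake_model` (a stand-in for a real LLM call) are all names invented for illustration, and a real flow would run candidates in a sandbox rather than with `exec`.

```python
from typing import Callable, List, Optional, Tuple

Test = Tuple[str, str]  # (input text, expected output text)


def run_candidate(source: str, tests: List[Test]) -> List[str]:
    """Load candidate code that should define solve(text) and return failure messages."""
    namespace: dict = {}
    try:
        # Assumption for this sketch only: in-process exec. Untrusted model
        # output should be executed in an isolated sandbox instead.
        exec(source, namespace)
    except Exception as exc:  # includes SyntaxError ("compilation" failures)
        return [f"compile error: {exc!r}"]

    solve = namespace.get("solve")
    if not callable(solve):
        return ["no solve() function defined"]

    failures = []
    for given, expected in tests:
        try:
            actual = str(solve(given))
        except Exception as exc:
            failures.append(f"input {given!r}: raised {exc!r}")
            continue
        if actual.strip() != expected.strip():
            failures.append(f"input {given!r}: expected {expected!r}, got {actual!r}")
    return failures


def iterative_repair(
    problem: str,
    tests: List[Test],
    generate: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask the model for code, run the tests, and feed failures back until they pass."""
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        source = generate(problem, feedback)
        feedback = run_candidate(source, tests)
        if not feedback:
            return source, round_number
    return None, max_rounds


def fake_model(problem: str, feedback: List[str]) -> str:
    """Hypothetical stand-in for an LLM: buggy first attempt, fixed once it sees feedback."""
    if not feedback:
        # Bug: concatenates the two tokens as strings instead of adding integers.
        return "def solve(text):\n    a, b = text.split()\n    return a + b\n"
    return "def solve(text):\n    a, b = map(int, text.split())\n    return str(a + b)\n"


if __name__ == "__main__":
    tests = [("1 2", "3"), ("10 5", "15")]
    source, rounds = iterative_repair("Add two integers.", tests, fake_model)
    print(f"passed after {rounds} round(s)")
```

With this stub the first attempt fails the tests and the second passes, so the loop returns on round two. A direct-prompting eval corresponds to `max_rounds=1`, which is one way to see why the two setups can rank models differently.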