科技回声


A technology news platform built with Next.js, providing global tech news and discussion.


© 2025 科技回声. All rights reserved.

Ask HN: Are there any objective measurements for AI model coding performance?

2 points, by charliebwrites, 2 months ago
Not sure if this is even possible, but is there any site or benchmark for testing which AI model is best for the task of coding?

Like Claude 3.5 vs GPT-4o vs Gemini 2, etc.

What exists beyond our opinions to more objectively measure the quality of code output from these models?

2 comments

ta0608714652, 2 months ago
The two that I know of are SWE-bench and CodeElo. SWE-bench is oriented towards "real world" performance (resolution of GitHub issues), and CodeElo is oriented towards competitive programming (CodeForces).

https://www.swebench.com/

https://codeelo-bench.github.io/
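For context on how these two benchmarks score models, here is a minimal sketch (not either benchmark's actual harness; all names and data below are hypothetical): SWE-bench counts a task as resolved when the model-generated patch makes the repository's test suite pass, while CodeElo assigns models an Elo-style rating from contest results.

```python
# Illustrative sketches of the two scoring ideas behind these benchmarks.
# This is not the real SWE-bench or CodeElo code; data is made up.

def resolved_rate(results):
    """SWE-bench-style score: fraction of tasks where the repo's test
    suite passes after applying the model-generated patch."""
    return sum(r["tests_passed"] for r in results) / len(results)

def elo_update(rating, opponent, won, k=32):
    """CodeElo-style rating: standard Elo update after one head-to-head
    result (won is 1.0 for a win, 0.5 for a draw, 0.0 for a loss)."""
    expected = 1 / (1 + 10 ** ((opponent - rating) / 400))
    return rating + k * (won - expected)

# Hypothetical per-task outcomes for one model:
outcomes = [
    {"task": "repo-a#101", "tests_passed": True},
    {"task": "repo-b#57",  "tests_passed": False},
    {"task": "repo-c#12",  "tests_passed": True},
    {"task": "repo-d#9",   "tests_passed": True},
]
print(f"resolved: {resolved_rate(outcomes):.0%}")  # resolved: 75%
print(elo_update(1500, 1500, won=1.0))             # 1516.0
```

The pass-rate metric rewards functional correctness on realistic repositories, while the Elo rating ranks models relative to each other (and to human competitors) on the same contest problems.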
gregjor, 2 months ago
As far as I know we don't have any way to objectively measure or compare "quality" or "coding performance" or "best" when looking at code produced by human programmers.

You may find this useful:

https://www.gitclear.com/coding_on_copilot_data_shows_ais_downward_pressure_on_code_quality

Or this analysis if you don't want to sign up to download that white paper:

https://arc.dev/talent-blog/impact-of-ai-on-code/