TechEcho
How Do AI Software Engineers Like Devin Compare to Humans?

7 points by htormey about 1 year ago

2 comments

danenania about 1 year ago
I launched a comparable tool recently[1]. I've actually specifically *not* been calling it an "AI Software Engineer" as I don't think that's the right framing for the capabilities of current models.

My focus has been on giving the developer as much fine-grained control of the LLM-based agent as possible in order to tighten the feedback loop and work around bad output (which is inevitable, unfortunately).

In self-driving parlance, I think of it as L3. The agent can work autonomously, but the best results are achieved by the developer keeping their hands on the wheel and making corrections when needed. IMHO that is currently the sweet spot for real productivity.

[1] https://github.com/plandex-ai/plandex
htormey about 1 year ago
AI software engineers like Devin and SWE-agent are frequently compared to human software engineers. However, SWE-bench, the benchmark on which this comparison is based, only covers Python tasks, most of which involve single-file changes of 15 lines or fewer, and it relies solely on unit tests to evaluate correctness. My aim is to give you a framework to assess whether AI's progress against this benchmark is relevant to your organization's work.
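
As a rough illustration of that kind of assessment, here is a minimal sketch that checks how much of a team's change history resembles the benchmark profile described above (Python only, a single file, 15 changed lines or fewer, verified by unit tests alone). The `Change` type, the function names, and the sample history are all hypothetical and made up for illustration; none of this is part of SWE-bench or any real tool.

```python
from __future__ import annotations

from dataclasses import dataclass

# Threshold taken from the description in the comment above,
# not from SWE-bench's own tooling (assumption for illustration).
MAX_CHANGED_LINES = 15


@dataclass
class Change:
    """Hypothetical summary of one merged change in your own repository."""
    files: dict[str, int]              # file path -> number of changed lines
    verified_by_unit_tests_only: bool  # True if unit tests alone judged it correct


def resembles_benchmark_task(change: Change) -> bool:
    """True if the change fits the profile: one Python file, small, unit-tested."""
    if len(change.files) != 1:
        return False
    path, changed_lines = next(iter(change.files.items()))
    return (
        path.endswith(".py")
        and changed_lines <= MAX_CHANGED_LINES
        and change.verified_by_unit_tests_only
    )


def benchmark_like_share(changes: list[Change]) -> float:
    """Fraction of changes that fit the benchmark profile (0.0 for an empty list)."""
    if not changes:
        return 0.0
    matching = sum(resembles_benchmark_task(c) for c in changes)
    return matching / len(changes)


# Made-up sample history, purely for demonstration.
history = [
    Change({"app/utils.py": 8}, True),                   # fits the profile
    Change({"app/a.py": 4, "app/b.py": 30}, True),       # multi-file change
    Change({"web/index.ts": 5}, True),                   # not Python
    Change({"app/models.py": 12}, False),                # needed more than unit tests
]
share = benchmark_like_share(history)
print(f"{share:.0%} of changes resemble the benchmark profile")
```

If only a small share of your own changes fit this profile, benchmark scores probably say little about how such tools would perform on your organization's typical work.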