What we learned in 6 months of working on an AI Developer

63 points by magden, about 1 year ago

8 comments

Lerc, about 1 year ago
Even though I don't think GPT-4 is up to the task, it does seem like now is the right time to be working on these things. Pretty soon GPT-4 will not be the best in the field. The next generation will perform much better.

Possibly the most frustrating thing I find about GPT-4 is how close it gets with its wrong answers. It's easy to dismiss a lesser answer when it responds with a laughably out-of-band idea. GPT-4 often shows that it has a general idea of what you want but misses a small but critical aspect, which results in a solution to something else that is similar but not what you wanted.

I have had mixed results getting it to iterate on its own mistakes. It will too often try to change the world to match its answer, rather than fixing the answer. The best approach I have found to stop this is getting it to create unit tests. I imagine there is a lot of training data for it to understand the intention behind fixing a failing test. It's a very specific problem for it to look at, and generally changing the test is not considered the correct solution.

amelius, about 1 year ago
Until I see an AI sysadmin that can help with basic configure/make problems, I don't have high hopes for an AI developer.

65, about 1 year ago
Maybe AI developers can make landing pages and basic APIs. But, taking front end as an example, I just don't see how an AI can reproduce exact design specifications and interactivity to the point where it wouldn't just be faster to write the code yourself or search for some human-verified snippet that does what you want.

And programmers who do know how to actually write efficient code without AI seem like they'd be even more in demand than those who rely on AI. Skill + knowledge + ability to use existing resources (e.g. StackOverflow, packages, templates), as we do now, are much more predictable and faster than trying to wrangle AI to do exactly what the designer or PM wants.

When the dishwasher was invented, everyone thought the human dish washer would be obsolete. And yet, restaurants still employ dish washers because they are much more efficient and thorough than a dishwashing machine.

ctoth, about 1 year ago
One of the things they seem to have figured out is the requirement to at least model a sort of actor-critic architecture with their agents. It helps quite a bit.

They seem to badmouth Aider a tad (not cool) but I do wonder how a full-stack of this + Aider might work? There needs to also be some sort of good test generator involved.

All that said, any time someone actually demonstrates progress on the automated Software Engineer problem and it makes it to HN, I am deeply reminded of the old quote:

"It is difficult to get a man to understand something, when his salary depends on his not understanding it."

Just read through this comments section and check out the pure copium. Yes, ChatGPT can do basic sysadmin tasks with ./configure and make.

Yes, it does make sense to work on this now, assuming LLMs will get better, because LLMs have continued to get better on any metric you can imagine.

Finally, yes, AI devs will make landing pages and basic APIs. I didn't realize we were all hardcore world-class 0.01% programmers? I have certainly written a landing page and basic API before; in fact I do that sort of thing a lot more than I write uber1337 hax0r code. You probably do too!

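For readers unfamiliar with the "actor-critic" pattern mentioned above, here is a minimal sketch of one way such a loop could be wired up. It is not taken from the article or from any tool discussed in the thread: `actor_critic_loop`, `toy_actor`, and `toy_critic` are hypothetical names, and the toy functions stand in for what would presumably be LLM calls.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces, assumed for illustration only.
Actor = Callable[[str, Optional[str]], str]       # (task, feedback) -> draft
Critic = Callable[[str, str], Tuple[bool, str]]   # (task, draft) -> (accepted, feedback)


def actor_critic_loop(task: str, actor: Actor, critic: Critic,
                      max_rounds: int = 3) -> Optional[str]:
    """Ask the actor for drafts until the critic accepts one or rounds run out."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        draft = actor(task, feedback)
        accepted, feedback = critic(task, draft)
        if accepted:
            return draft
    return None  # no draft was accepted within max_rounds


# Toy stand-ins: the actor only gets it right after receiving feedback.
def toy_actor(task: str, feedback: Optional[str]) -> str:
    if feedback:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def toy_critic(task: str, draft: str) -> Tuple[bool, str]:
    ok = "a + b" in draft
    return ok, "" if ok else "add should return the sum"


result = actor_critic_loop("write add(a, b)", toy_actor, toy_critic)
print(result)
```

The only design point the sketch tries to show is the separation of roles: one component proposes, a second one judges and returns feedback, and the loop is bounded so a stubborn failure ends in `None` rather than an endless retry.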
stevage, about 1 year ago
The focus on upfront specs feels a bit off. Since it's apparently cheap to generate running code, as a user, I'd much rather be able to just iterate really fast and use output to refine my requirements rather than having to laboriously state them all up front. Agile rather than waterfall, if you will.

gumby, about 1 year ago
CMake was invented to guarantee that at least some humans would have software jobs.
somewhereoutth, about 1 year ago
> Our approach is to focus on building the application layer instead of working on getting LLMs to output better results. The reasoning is that LLMs will get better,...

So more jam tomorrow then. Building the framework around the magic is the easy bit.

wokwokwok, about 1 year ago
Hm.

It's easy to look at https://github.com/Pythagora-io/gpt-pilot-db-analysis-tool/blob/main/routes/developmentSteps.js and go… so, this new tool means you took two days to write this?

*long stare*

Why did you bother?

…but, this both hits the nail on the head and misses the point at the same time.

On the one hand, this is foundational tech, prototyping a new way of doing things. It's not going to be faster than doing it yourself at first. It won't run locally at first.

On the other hand, we already know that GPT-4-level models can do trivial tasks.

Over and over and over, people claim coding tools can massively improve productivity, and then try to demo that by building a trivial system.

…but building a trivial system is *not the problem* that needs solving.

The problem that needs solving is building *large complex systems* with dynamically adjusting requirements.

The examples and blog post seem to miss this even as an *idea*.

While I applaud, in general, efforts to explore this space, tackling the easy problems seems like it doesn't significantly advance the state of play.

Here are some concrete things that would be more valuable, but are significantly technically harder:

- Use tests. Make it write tests. Make humans write tests. Do not accept generated code that fails the tests.

- Focus on refactoring; it's a known issue that models struggle to refactor code. Breaking your existing code base into tiny files isn't the answer.

- Focus on documenting the behaviour of existing code and incrementally migrating to new behaviour.

- Bad developers write new code instead of reading the existing code and using existing functionality and utilities. AI generators are notoriously rubbish at this, and will almost always generate a function rather than use an existing one.

Refining and understanding existing code is *significantly more valuable* than generating code "from scratch"; so much so that I would argue that without the ability to refine existing code, such tools will forever remain in the "scaffold generator" category of "useful but ultimately no better than the current status quo".

The tool as shown is, I believe, broadly speaking interesting, but the approach described in the blog (upfront decisions about everything) is a dead end.