AutoDev: Automated AI-driven development by Microsoft

163 points by saran945, about 1 year ago

24 comments

PodgieTar, about 1 year ago

I guess I'd be interested to see how this performs against the same benchmark Devin was using. It's hard to deny that this is impressive. But I think there are two interesting parts to it.

Claude 3 Opus already scored around 85-86% on these benchmarks, without an "AutoDev" style agentic approach.

And all the same problems with HumanEval remain: the limitations in terms of what style of problems are chosen, and real-world relevance.

I hate writing these styles of comments because I'm acutely aware that a part of me is just worried. Worried about the speed of progress and worried about a changing landscape.

But I still wonder how much of this stuff is going to be transferable to a real-life software context.

osigurdson, about 1 year ago

This sounds like 164 LeetCode-like questions. The correlation to actual capability is very tenuous in my opinion.

https://klu.ai/glossary/humaneval-benchmark

I'd love it if I could 10X / 100X my productivity with AI, but as a heavy ChatGPT user, it's more like a 30% improvement. Awesome, obviously, but looking forward to improvement.
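
For readers who have not looked at the benchmark, here is a rough sketch of what a "LeetCode-like" HumanEval-style task looks like: a function signature and docstring serve as the prompt, and a completion counts as a pass only if it satisfies unit tests. Everything below is an assumption made up for illustration (the `longest_run` task, the test cases, and the `check` and `pass_rate` helpers); it is not copied from the dataset or from any official harness.

```python
from typing import Callable, List

# Hypothetical prompt in the HumanEval style: signature plus docstring.
# The model is asked to write the function body.
PROMPT = '''
def longest_run(text: str) -> int:
    """Return the length of the longest run of identical consecutive characters.
    >>> longest_run("aabbbc")
    3
    """
'''


def longest_run(text: str) -> int:
    """Candidate completion, standing in for model output."""
    best = 0
    current = 0
    previous = None
    for ch in text:
        # Extend the current run or start a new one.
        current = current + 1 if ch == previous else 1
        best = max(best, current)
        previous = ch
    return best


def check(candidate: Callable[[str], int]) -> bool:
    """Stand-in for the unit tests: True only if every case passes."""
    cases = [("", 0), ("aabbbc", 3), ("abc", 1), ("zzzz", 4)]
    try:
        return all(candidate(text) == expected for text, expected in cases)
    except Exception:
        # A crashing completion counts as a failure.
        return False


def pass_rate(results: List[bool]) -> float:
    """Fraction of tasks solved on the first attempt."""
    return sum(results) / len(results) if results else 0.0
```

A score like the 85-86% quoted above would then be `pass_rate` over all 164 tasks, which is why the result says little about work that does not fit in a single self-contained function.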

somewhereoutth, about 1 year ago

I can see this being used for code much like ChatGPT is used for prose - to generate large amounts of meaningless output of low quality and dubious utility, serving to fulfill pointless metrics at best, and enabling bad actors at worst.

As with prose, good human-crafted content will always be highly valued and rewarded.

phodal, about 1 year ago

I am the author of another version of AutoDev, available at https://github.com/unit-mesh/auto-dev , which was developed one year ago for IntelliJ IDEs.

My concept closely resembles Microsoft's AutoDev, but I built it on the IntelliJ IDEA platform. For instance, it automatically runs tests when they are created, among other functionality, and it can also build on the AST, dependency information, or other context.

Two weeks ago, I introduced the AutoDev DevIns language (whose original name was DevIn, see https://github.com/unit-mesh/auto-dev/issues/101 , another naming-issue story), which bears similarities to Microsoft's AutoDev. For example:

```java
/write:src/main/java/com/example/Controller.java#L1-L12
public class Controller {
    public void method() {
        System.out.println("Hello, World!");
    }
}
```

As an open-source developer who has created a nearly identical tool, I simply hope that Microsoft considers renaming their product.

dudeinhawaii, about 1 year ago

My hopes (and fears) about AI don't seem to match the reality, which is that, until we reach AGI, humans are the sole source of creativity and novelty.

I think about it like this: would ChatGPT invent Google Search if we had ChatGPT in 2000? Probably not. LLMs seem to exist in this realm of "as smart as an average human with a really big encyclopedia". They're confidently wrong, invent fantasy to defend bad reasoning, and struggle to envision anything outside of their dataset (known things).

Ask an AI to construct an entirely new solution to a novel unsolved problem. What always occurs is the AI outputs a generic solution from its dataset that is either half-baked, made-up, or derivative.

I'm not even dissing AI, I love AI, but we have yet to see AI apply novel solutions to novel problems.

On our current path, AI is not going to dream up the next Uber without Lyft first existing. It won't dream up a new fusion reactor design or an entirely new way to generate cheap energy.

But maybe this is perfect! At the moment we have this sweet spot - AI without agency, without awareness, and without superintelligence. This is the kind of AI I want as a household robot or AI driver. This is one I can empathize with but also know that it isn't doing anything at all if I'm not engaging it with a prompt.

AGI would mean an ability to have novel solutions and in turn would be far less stable for society. Where's the line between a mind that has novel thoughts and one that has intrusive thoughts? Maybe your AGI coder isn't content with no pay and working 24/7. That'll be fun.

croes, about 1 year ago

So who is liable if the AI makes severe mistakes?

Not that MS has been held accountable for all the security problems they've had recently.

cyberwolf, about 1 year ago

Focus:

- AutoDev: Automates existing development processes and workflows, acting as a productivity booster for developers [2].
- Devin: Targets a more independent problem-solving role, potentially including designing software architecture and core functionalities [1].

Collaboration:

- AutoDev: Designed to integrate with current development teams, with AI agents supporting human developers [2].
- Devin: May function more autonomously, potentially needing less direct human oversight than AutoDev [1].

Imagine this analogy:

- AutoDev: Like a skilled construction assistant, AutoDev automates tasks and streamlines the building process.
- Devin: Like a talented architect, Devin can design the blueprint and foundation of the software.

Here's the exciting part: these AI tools can potentially complement each other:

- Dream Team: Devin as the architect and AutoDev as the builder could create a highly efficient development process [1].
- Complementary Skills: Devin's problem-solving capabilities could be combined with AutoDev's project management expertise for a well-rounded approach [1].

soma7393, about 1 year ago

So Microsoft's AutoDev is *not the unit-mesh one??* Or is it? I'm so confused.

_therealtoogy, about 1 year ago

Maybe ignorant, but if AI can get to the point of fully automating SWEs, hardly any white-collar, knowledge-based job is safe.

saran945, about 1 year ago

If LLMs/agents contribute to software development, how does the role of software engineers evolve? Should SEs focus on:

- System design
- Integrations
- Project management, etc.

Or will the job disappear in 10 years?

dmcgill50, about 1 year ago

Until a business owner can prompt and get what they want, the industry is still alive. It’ll be more like "Who Moved My Cheese?" than "there isn’t any cheese at all."

croes, about 1 year ago

Imagine we had this technology 20 years ago and we switched to fully automated web design. I bet all web design would still be deeply nested tables.

I doubt that something like React, Vue, or Svelte would exist.

baroninthetrees, about 1 year ago

I just finished a 30-page system prompt to do the same thing. I did not use Docker, though. I can’t wait to see what they have done in detail. I’m sure like 20% of the people here have tried this too, right?

imacomputer, about 1 year ago

I'm currently looking for a new job in development... should I be looking for an exit instead?

devnull3, about 1 year ago

I don't think AI will replace SW jobs. It will definitely change the way engineers work.

Software development involves:

1. Understanding the requirement
2. Solving the problem with given constraints (and thereby innovating)
3. Talking to stakeholders
4. Code (& unit tests)
5. Review #4
6. Troubleshoot in testing
7. Troubleshoot in prod, both perf & subtle issues (this is hard)
8. Take the input from #6 & #7 and use it as feedback for #2
9. Answering questions from users/support, which involves suggesting workarounds and not just factual answers. Suggesting a workaround is itself a mini problem-solving exercise at the intersection of domain knowledge, knowing the code at hand, understanding the customer's situation, etc.

Coding is hardly taxing or time-consuming when there is clarity about what needs to be done and how it needs to be done.

Point #4 itself has sub-dimensions like performance, maintainability, testability, security, etc., and it involves a lot of subjective calls. Sometimes you have to deal with undocumented behavior of an API, which is tribal knowledge.

Troubleshooting in prod (esp. subtle issues) requires deep knowledge of the code at hand. This itself is a challenge when you are dealing with generated code and something you have not written yourself. Think about a human dealing with an existing large codebase when joining a team.

I understand all of the above is a spectrum and there are jobs in SWE which do not require so much rigor.

Key ability for a breakthrough, after which the rest will fall into place: code generated by AI is consistently put into production without human intervention for a sufficiently complex problem, considering all good attributes (like backward compatibility, performance, etc.).

ThalesX, about 1 year ago

I'm a software and product guy; what I don't get is how we're going to replace us lowly engineers and not a substantial percentage of a company's human stack.

Close to engineering, what about the SCRUM masters? The testers? What about the product owners? What about devops? Further from us, what about the people signing off on our vacation? Or the ones signing off on our daily budgets when traveling, or hell, even the ones we interview with?

In my closest group of friends (we're all seniors in our domains, and very honest with each other), I find that only the construction worker's job should be safe. And compared to myself and the devops guy, most others have what they themselves describe as trivial and bullshit jobs. Join a meeting, do some paper pushing, some document signing, a little coffee, and the day is over by 1PM.

Am I seeing all these AIs replacing programming because I'm on a board where maybe a lot of us are programmers? Is it the same for other roles? Wouldn't it make more sense to have the interview process automated by LLMs if they're capable of building great software, before we replace those hired?

I'm very confused by all the hype when matched with my experience of using LLMs daily for the past years.

avereveard, about 1 year ago

IT industry: "We cannot use formal methods to create software out of programmable constraints; it would use too much computational power."

Also IT industry: "It takes 1T flop to compute each token of this program, and the result is so unstable that to converge it we need layer upon layer of controls over each token group, also obtained by asking the same 1T parameter model."

franticgecko3, about 1 year ago

Why are technologists trying so hard to make themselves redundant?

This is like the Luddites themselves creating milling machines, eager for the foreman to show them the door.

What gives?

yesdocs, about 1 year ago

Time to unionize

smokel, about 1 year ago

Great achievement, but what a horrible future we are facing.

Instead of progressing towards more powerful programming interfaces with less cause for misinterpretation, we are going to automate the silly process of writing redundant unit tests to check if the behavior that we wanted was encoded properly.

Why not skip this nonsense and have the code generated from the behavior in the first place?

Mayzie, about 1 year ago

Are we at a fifth-generation programming language yet?

https://en.wikipedia.org/wiki/Fifth-generation_programming_language

awill88, about 1 year ago

It’s been a good run everyone, good luck out there

jaylittle, about 1 year ago

It's really depressing to see that big tech is essentially universally pushing Snake Oil. AI is a lie. LLMs are a legit tech that has some purpose, but LLMs will never evolve into anything remotely resembling the actual definition of AI.

They are flowery language generators and they will never be able to reason, understand, debate and criticize. They know nothing and therefore embody nothing. No matter how much computer power you waste on them, the end result will always be bullshit. Nothing more. Nothing less.

To Big Tech I say: Prove me wrong.

phillipcarter, about 1 year ago

Oh this is neat, it's based on Visual Studio. Curious how they're accounting for "whoopsie I touched this button and the IDE crashed" kinds of problems that you encounter with larger codebases.
