Although the demos are impressive, they seem short and limited in scope, which makes me wonder how well this will work outside of these planned cases. Can it do software architecture at all? Is it still essentially just regurgitating solutions? How often will the solution be only 90% correct, which is 100% not good enough?<p>Even so, I realize the demos are still broad in scope and the results are incredible. Imagine seeing this even 2 years ago. It would seem like magic; you wouldn't be able to believe it. Today it feels inevitable and entirely believable. There will be even better versions of this soon.
Clearly an extremely impressive demo, and congrats on the launch. I do wonder how often the bugs Devin encounters will be solvable with the simple fixes that were demonstrated. For instance, I noticed in the first demo that Devin hits a KeyError and decides to resolve it by wrapping the code in a try/except. While this will get the code to run, I immediately imagined cases where it's not actually an ideal solution (maybe it's a KeyError because the blog post Devin read is incorrect or out of date, and Devin should actually be referencing a different key altogether, or a different API). Can Devin "back up" at this point and implement a fix further back in its "decision tree" (e.g. use a different API endpoint), or can it only come up with fixes for the specific problem it's encountering at this moment (catch the KeyError and return None)?
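To make that concrete, here's a hypothetical sketch of the two repair strategies (both key names are invented):

    # Symptom-level fix: swallow the error where it surfaces.
    def get_follower_count(profile):
        try:
            return profile["followers"]   # key name taken from an outdated blog post
        except KeyError:
            return None                   # the code runs, but the data is silently gone

    # Root-cause fix: step back in the decision tree and read the field
    # the current API actually returns.
    def get_follower_count_v2(profile):
        return profile["follower_count"]  # hypothetical up-to-date key

The first version makes the traceback go away; only the second one makes the program correct.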
As a developer but also a product person, I keep trying to use AI to code for me. I keep failing: because of context length, because of shit output from the model, because of the lack of any kind of architecture, etc. etc. I'm probably dumb as hell, because I just can't get it to do anything remotely useful beyond helping me with leetcode.<p>Just yesterday I tried to feed it a simple HTML page to extract a selector. I tried it with GPT-4 Turbo, I tried it with Claude, I tried it with Groq, I tried it with a local Llama 2 model with a 128k context window. None of them worked. This is a task that, while annoying, I do in about 10 seconds.<p>Sure, I'm open to the possibility that in the next 2-3 days, up to a couple of years, I'll no longer do manual coding. But honestly, I'm starting to grow a bit irritated with the hype.<p>Just give me a product that works as advertised and I'll throw money your way, because I have a lot more ideas than I have code throughput!
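For comparison, the deterministic version of that task is maybe fifteen lines of BeautifulSoup -- a rough sketch (matching on visible text is just one strategy):

    from bs4 import BeautifulSoup

    def css_selector_for(html, text):
        """Build a rough CSS selector path to the first element containing `text`."""
        soup = BeautifulSoup(html, "html.parser")
        node = soup.find(string=lambda s: s and text in s)
        if node is None:
            return None
        parts = []
        for parent in node.parents:
            if parent.name in (None, "[document]"):
                continue
            if parent.get("id"):                      # an id anchors the whole selector
                parts.append(f"#{parent['id']}")
                break
            classes = ".".join(parent.get("class", []))
            parts.append(f"{parent.name}.{classes}" if classes else parent.name)
        return " > ".join(reversed(parts))

None of the models managed even this level of output reliably.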
While impressive, the demo on Upwork didn’t even come close to fulfilling the job requirements. The job asked for instructions on how to set it up on an EC2 machine. It didn’t ask to run the model or to do anything else that was depicted.<p>It makes me question the truthfulness of the other claims.
>> With our advances in long-term reasoning and planning, Devin can plan and execute complex engineering tasks requiring thousands of decisions.<p>They'd better have really advanced reasoning and planning capabilities way beyond everything that anyone else knows how to do with LLMs. There's a growing body of literature that leaves no doubt that LLMs can't reason and can't plan.<p>For a quick summary of some such results see:<p><a href="https://arxiv.org/pdf/2403.04121.pdf" rel="nofollow">https://arxiv.org/pdf/2403.04121.pdf</a>
Scott Wu! I met Scott at a competitive programming event a few years back.<p>He is one of a very small group of people (going back to 1989) to get a perfect raw score at the IOI, the olympiad for competitive programming.<p><a href="https://stats.ioinformatics.org/people/2686" rel="nofollow">https://stats.ioinformatics.org/people/2686</a><p>Glad to see that he's putting his (unbelievable) talents to use. To give you a sense: at the event where I met him, he solved 6 problems equivalent to Leetcode medium-to-hard problems in under 15 minutes (total), including reading the problems, implementing input parsing, debugging, and submitting the solutions.
I must say, I'm not HUGELY impressed with a website that lets me, unauthenticated, upload files of an arbitrary size. Just posted a 500mb dmg file to their server.<p>If anyone is practicing for their B1 Dutch exam, feel free to use this link to get the practice paper.<p><a href="https://usacognition--serve-s3-files.modal.run/attachments/460be415-1283-4963-9a52-931ad509afa4/2020%20Lezen%20I%20openbaar%20examen%20tekstboekje%20(digitaal).pdf" rel="nofollow">https://usacognition--serve-s3-files.modal.run/attachments/4...</a>
Don't get it. If we have this amazing AI, why don't we make good use of it? 90% of my job as a senior software engineer is not to write code; it's to:<p>- deobfuscate complex requirements into well-divided chunks<p>- find gaps or holes in requirements so that I have to write the minimal amount of code<p>- understand codebases so that the implementation fits in nicely<p>I don't need an "AI software engineer"; I need an "AI people person who gives me well-defined tasks". Now sure, if you combine those two kinds of AIs, I could perhaps become irrelevant.
After devin "figures out" 10 issues, what does the code look like? Those are the easy ones, and if you haven't fixed them cleanly, the next 10 will be more difficult to solve, for human and for robot. Now do this for several years. Can devin create its own bug reports and issues? It had better be able to!<p>I'm curious what a large, mature codebase with complex internals and legacy code looks like after you sic devin on it. Not pretty, I suspect. In fact, I think it will become so difficult to fix that nobody -- neither human nor devin -- will be able to clean up the mess. By sheer volume, a broken ball of unfixable spaghetti.<p>I would be immensely pissed off if someone did this to an open source project of mine, or even to a closed-source codebase I'm working on. Not only would it not be useful, it would be moving <i>backwards</i>: creating an icky vomit mess that we will probably have to spend years cleaning up while bug reports and complaints from customers mount and competitors iterate faster.<p>Does that sound like something you want to deal with in your software business?
If you need AI to help you program an algorithm, then you shouldn't be using it, because you can't tell if the AI's solution is correct.<p>If you can tell whether a solution is correct or not --- well, then you don't need to have AI write it for you.<p>I think AI programming can only work when the industry begins to treat "almost working" systems backed by human customer service as acceptable.
From their Twitter:<p>> When evaluated on the SWE-Bench benchmark, which asks an AI to resolve GitHub issues found in real-world open-source projects, Devin correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted and 4.80% assisted.<p>While this is progress, it's far from being useful as a software engineer.
People who try to draw historical analogies to AI replacing humans say things like:<p>"cars replaced horse drawn carriages. But we managed to adapt to that, the carriage drivers got new jobs."<p>My dudes. We are the HORSES being replaced in this scenario.
So why are they hiring? <a href="https://jobs.ashbyhq.com/cognition">https://jobs.ashbyhq.com/cognition</a> Can't they just use "Devin"?
I have a few years of experience in backend development, and I have realized that LLMs are incredible productivity boosters for generating code, but only if you know the underlying libraries/frameworks/languages very well. You can then prompt one with very specific instructions and it can go do that. It helps with the typing, but that's pretty much all. I still have to know everything, and it can definitely not do everything on autopilot. I would be surprised if this product can do any real work.
Let's get realistic here - I just beat GPT-4 at tic-tac-toe, since it failed to block my 2/3-complete winning line...<p>Sure, one day we'll have AGI, and one day AGI will replace many jobs that can be done in front of a computer.<p>In the meantime, SOTA AI appears to be an airline chatbot that gets the company sued for lying to the customer. This is just basic question answering, and it can't even get that right. Would you trust it to write the autopilot code to fly the airplane? Maybe to write a tiny bit of it - just code up one function, perhaps?<p>I sure as hell wouldn't, and even when it can be trusted to write one function that meets requirements and has no bugs, it's still going to be a LONG time before it can replace the job of the developers who were given the task of "write us an autopilot".
Interesting: The last demo on the blog took 2.5h to complete:
<a href="https://www.cognition-labs.com/blog" rel="nofollow">https://www.cognition-labs.com/blog</a>
<a href="https://www.youtube.com/watch?v=UTS2Hz96HYQ" rel="nofollow">https://www.youtube.com/watch?v=UTS2Hz96HYQ</a>
"Devin's Upwork Side Hustle"<p>I wonder how much time of this was consumed by manually directing Devin into the right direction, manually fixing and undoing the mess Devin produced and watching Devin burn through $$$. As others said, being completely non-transparent about this burns a bit of trust, but I'd really like to know where we are right now. Since Devin is currently "invite only demos", a more realistic peek into the state of the art can be seen here: <a href="https://docs.sweep.dev/blogs/gpt-4-modification">https://docs.sweep.dev/blogs/gpt-4-modification</a><p>My gut feeling (and limited experience): gpt-4 and other models are not quite there yet, but whoever prepares for the next generation of models <i>now</i> will eventually win big times. Or be replaced by simpler approaches.
I've worked on very complex systems - the Disney streaming platform (before it was Disney), live video streaming systems, banking transaction systems, your run-of-the-mill CRUD software with Kafka clusters piping mind-numbing amounts of data, Netflix, and a few other large engineering-heavy companies.<p>No engineering company worth its salt is going to build a world-class technology business purely with generative AI in its current state. The risk in doing so currently is total and utter failure. I have a very hard time believing we're anywhere near that capability. Maybe your mom-and-pop startup could hire a prompt engineer to build a website and a simple tool, but we have yet to see those exercises surface into the mainstream; it's purely speculative.<p>I say, rest easy, programmers. Your careers will be enriched more than axed, with generative AI as a support tool, for many years to come.<p>Also, if anyone who works in this field has a strong opposing belief, consider that OpenAI engineers would be programming themselves out of a job, which is obviously not the case.
Looks interesting, but claiming "first" seems pretty off; there have been others, like <i>Sweep</i>, featured here before.<p><a href="https://news.ycombinator.com/item?id=36987454">https://news.ycombinator.com/item?id=36987454</a><p><i>Sweep is an open-source AI-powered junior developer</i><p><a href="https://sweep.dev/">https://sweep.dev/</a>
There is no way this is going to make it so that "engineers can focus on more interesting problems and engineering teams can strive for more ambitious goals."<p>Instead it will mean that bosses can fire 75-90% of the (very expensive) engineers, with the ones who remain left to prompt the AI and clean up any mistakes/misunderstandings.<p>I guess this is the future. We've coded ourselves out of a job. People are smiling and celebrating all this - personally, I find it kinda sad that we've basically put an end to software engineering as a career and put loads of people out of work. It is not just SWEs - it is impacting a lot of careers... I hope these researchers can sleep well at night, because they're dooming huge swathes of people to unemployment.<p>Are we about to enter a software engineering winter? People will find new careers, and no kids will learn to code since AI can do it all. We'll end up with a load of AI researchers being "the new SWEs", but relying on AI to implement everything? Maybe that will work, and we'll have a virtuous circle of AIs making AI improvements, and we'll never need engineers again? Or maybe we'll hit a wall and progress in comp sci will essentially stop?
As someone who works in this space (<a href="https://pythagora.ai">https://pythagora.ai</a>), I welcome new entrants to this niche.<p>Currently, mainstream AI usage in coding is at the level of assistants and glorified autocomplete. Which is great (I use GitHub Copilot daily), but for us working in the space it's obvious that the impact will be much larger. Besides us (Pythagora), there's also Sweep (mentioned by others in the comments) and GPT Engineer who are tackling the same problem, I believe each with a slightly different angle.<p>Our thesis is that human in the loop is key. In coding, you can think of LLMs as a very eager junior developer who can easily read StackOverflow but doesn't really think twice before jumping to implementation. With guidance (a LOT in terms of internal prompts, and some by human) it can achieve spectacular results.
Is Devin a new LLM? Perhaps one equipped with code and deploy plug-ins? The comparisons against other LLMs would suggest so.<p>The real-world eval benchmark puts Claude 2 way ahead of GPT-4, which doesn't sound right.
We’re still at the “rhyming not reasoning” phase of LLMs. The question of whether we move past rhyming and onto reasoning is a good one, and I’m not sure what I think about it. But I am pretty sure that coding is a lot more like reasoning than it is like rhyming, at least for de novo problems above a certain level of complexity (intellectual challenge) and complication (moving parts).<p>I remain open minded about what’s next and at the rate things are changing, I wouldn’t rule anything out a priori for now.
I'd really like it if Cognition Labs would put the resulting code from the demo into an open-source repository so we could examine it directly.<p>When I was using ChatGPT to help guide me through some coding tasks, I found it could create somewhat useful code, but where it fell down was that it would put things into loose variables that would be better organized into a class. It is this structuring of a complete system that is important for any real software engineering, rather than just writing code.
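A contrived sketch of the pattern I mean (all names invented):

    # What the model kept producing: state scattered across loose variables.
    base_url = "https://api.example.com"
    retries = 3
    timeout = 10.0

    # What real engineering wants: the same state and behavior in one place.
    from dataclasses import dataclass

    @dataclass
    class ApiClient:
        base_url: str = "https://api.example.com"
        retries: int = 3
        timeout: float = 10.0

        def fetch(self, path: str):
            """One obvious home for retry/timeout logic instead of ad-hoc globals."""
            ...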
I believe that hosting DEVIN will cost much more GPU time than hosting a regular LLM. Inspecting the videos on Cognition Labs' official website, I noticed that DEVIN can take more than one hour to do one step, which is more than an hour of GPU usage. When using GPT-4, we usually get output within 30 seconds, which is less than a minute of GPU usage.<p>In addition, when using GPT-4, I use it only when I have new thoughts, so the GPU occupancy rate is low. I probably use less than 5 hours of GPU time each month. DEVIN is sort of like an intern working for you, so you would probably make it work at least 40 hrs/week.<p>This difference in GPU usage would probably make DEVIN 10 times more expensive to offer profitably, that is, if they are using a subscription business model like GPT-4's.<p>I don't think there is any other viable business model for DEVIN -- it surely cannot replace or even reduce the number of human programmers, due to LLMs' unreliable nature and the necessity of code verification.
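The back-of-envelope version of that comparison, where every number is an assumption:

    # Rough GPU-hour arithmetic -- all inputs are guesses, not measurements.
    chat_gpu_hours = 300 * 1 / 60     # ~300 queries/month at ~1 GPU-minute each: ~5 h
    agent_gpu_hours = 40 * 4 * 0.5    # 40 h/week for 4 weeks at a 50% duty cycle: 80 h
    print(agent_gpu_hours / chat_gpu_hours)  # ~16x -- an order of magnitude either way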
Surprised how calm and underwhelmed the comments are.<p>Sure, it is no senior architect, but the trajectory is insane. It wasn’t that long ago that LLMs barely managed coherent poems. Now one is troubleshooting code problems on its own?<p>Sure, it’s just a GPT-4 wrapper, but that implies the same can be done with GPT-5 and GPT-6, etc.<p>Project it forward and this actually becomes non-trivial.
I really don't like these announcements with invitation lists.<p>Just let me try the goddamn product.<p>By the time you let me in, I don't care anymore, or another competitor has already caught my attention.<p>Neon, the Postgres-as-a-service, put me on such a long waitlist that by the time they invited me in, I was already on a completely different solution (and was happy with it).
Also see Devin the NAI for an old school alternative: <a href="https://docs.google.com/document/d/1byJgu1G_M58QVWmpZeEDthyAB8Bq752RgL_gF7hEJQc/edit?usp=drivesdk" rel="nofollow">https://docs.google.com/document/d/1byJgu1G_M58QVWmpZeEDthyA...</a>
We still don't have agents that can do simple things like "find a funny photo of my dog on my phone and post it as a story on Instagram" with 100% reliability. I would wait for that to happen first before thinking there can be an autonomous software engineer.
Hey! Stop taking our jobs!<p>Side note: I'm kind of offended that something called 'Devin' is going to take my job. If you're going to replace me at least let me keep my dignity by naming it something cool like 'Sora'
I recommend looking at SWE-bench to get an idea of what breakthroughs this product accomplishes: <a href="https://www.swebench.com/" rel="nofollow">https://www.swebench.com/</a>. They claim to have tested SOTA models like GPT-4 and Claude 2 (I would like to see it tested on Claude 3 Opus), and their score is 13.86%, as opposed to 4.80% for Claude 2. This benchmark is about solving real-world GitHub issues. So for those claiming that they tried models in the past and it didn't work for their use case: maybe this one will be better?
Bearish. These types of tools/agents-chaining will be irrelevant due to lackluster capability until AGI is achieved. At which point, the basis for creating these types of tools/agents will be defunct.
I've been working on something similar; here's one of their same tests, where the AI learns how to make a hidden-text image.<p><a href="https://www.youtube.com/watch?v=dHlv7Jl3SFI" rel="nofollow">https://www.youtube.com/watch?v=dHlv7Jl3SFI</a><p>The real problem is coherence (logic and consistency over time), which is what these wrappers try to address. I believe AI could probably be trained to be a lot more coherent out of the box, working with minimal wrapping; that is the AI I worry about.
This is awesome for bootstrapping some ideas. The question is: can it work with (large) existing code bases, or modify its own code? I guess a good test would be: can it reproduce Devin? ;)
When you have software failing in prod because it was built by shoddy "AI" and people who copy/paste because they don't know any better, and you need a fix, give me a ring.<p>I have tried using GPT-4 and Gemini extensively, and the amount of bullshit generated makes them unreliable if you don't already know the domain. These tools lack the critical stuff (being context-aware) and just make up libraries and APIs. Yet you can't be sure when they're bullshitting or not, making it an exercise in frustration for anything that's not trivial.<p>Save your money and buy an O'Reilly subscription.
I'm totally adding "rescue and recovery of projects botched by AI" to my list of services. One thing is certain, it's not going to be cheap.
I mean, this might just be existential cope, but my first thought when looking at the Upwork demo posted on Twitter (<a href="https://x.com/cognition_labs/status/1767548768734294113?s=20" rel="nofollow">https://x.com/cognition_labs/status/1767548768734294113?s=20</a>) was that it seemed a little suspicious.<p>Namely, the client's request was unusually specific for Upwork. It was an almost perfect example of a job to give an AI agent for testing purposes.
I have in my codebase several really long Django views files (3k lines!). They were written poorly, with many nested if statements for parsing and error handling.<p>One by one, I can use VSCode's GitHub Copilot to rewrite each function the way I want it.<p>What I want to do is iterate through all the functions in the files and handle each one automatically.<p>I know we are getting there, but does anybody know how that can be done right now?
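The closest thing I can picture is a sketch like this, using the stdlib ast module plus the OpenAI client (the model name and prompt are placeholders, and I'd review every rewrite by hand before committing):

    import ast
    from openai import OpenAI

    client = OpenAI()
    source = open("views.py").read()

    # Walk only top-level functions; views in these files are module-level.
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            func_src = ast.get_source_segment(source, node)
            resp = client.chat.completions.create(
                model="gpt-4-turbo",
                messages=[
                    {"role": "system", "content": (
                        "Rewrite this Django view: flatten the nested ifs and "
                        "extract parsing/error handling into helpers. Return only code."
                    )},
                    {"role": "user", "content": func_src},
                ],
            )
            print(resp.choices[0].message.content)  # inspect the diff, don't auto-apply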
Technological unemployment and doomerism aside, I think there's a big difference here: in the past, you needed lots of capital to invest in those labour-saving devices. A farm labourer couldn't buy a tractor; a dockworker couldn't buy a crane.<p>But a software engineer absolutely can buy access to AI services.<p>I have no idea how this will end up, but it'll be different from before.
Until you can point out an issue via video ("see? Here it flickers a bit, and here it needs centering"), or talk to the "SWE agent" and say we need this feature taken out for now, and later ask it to put the feature back in and have it remember that the code was implemented at GitHub commit id xxyyzz, you really can't call this a software engineer.
AI replacing one of the last well-paid jobs on the planet is a good thing. Large-scale societal changes are triggered when a critical number of haves turn into have-nots. I would recommend that junior engineers study Nechayev and Bakunin instead of the latest React flavor. Those will have a better ROI in the coming years.
The lack of attention this development is getting on here is astounding. Other developers I am talking to about it brush it off, then change topic.<p>So many comments about how insufficient the tool is.<p>Our heads are really in the sand, I'm afraid.
Is it built with pre-existing LLMs, or did they create one from the ground up? With $21 million of Series A funding, an LLM more powerful than GPT-4 seems impossible. What am I missing?
This is where inference speed starts to matter. An H100 might be cheaper per inference than Groq, but cutting the wait time from 1 minute to 10 seconds could be a big deal.
Wow, this is incredible news! Congratulations to the team behind Devin, the first AI engineer! This is a monumental leap forward in technology and innovation. I'm absolutely thrilled to see how Devin will revolutionize the field of engineering.<p>As someone passionate about the potential of AI in tech, I can't wait to see what amazing feats Devin will accomplish. And who knows, maybe one day, companies like Munesoft Technologies will reach similar heights with their own AI-driven advancements. Here's to a future filled with endless possibilities! #DevinAI
For something that you can download and try right now, and that actually works for daily coding tasks, you can try my desktop app 16x Prompt.<p><a href="https://prompt.16x.engineer/" rel="nofollow">https://prompt.16x.engineer/</a><p>It's not 100% automated, but it saves a lot of the time spent on writing code.<p>It works by composing prompts from task instructions, source code context, and formatting instructions, resulting in high-quality prompts that can be fed into LLMs to generate high-quality code.
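The core mechanic, reduced to a hypothetical sketch (this is not the app's actual code):

    def compose_prompt(task, context_files, format_rules):
        """Join task instructions, source context, and output rules into one prompt."""
        context = "\n\n".join(
            f"--- {path} ---\n{code}" for path, code in context_files.items()
        )
        return f"{task}\n\nRelevant source:\n{context}\n\nOutput rules:\n{format_rules}"

    prompt = compose_prompt(
        task="Add pagination to the /users endpoint.",
        context_files={"api/users.py": open("api/users.py").read()},
        format_rules="Return a unified diff only.",
    )

Paste the result into your LLM of choice; the structure does most of the work.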
There have been many tools that were sold as developer replacements over the years.
- Microsoft FrontPage
- Adobe Dreamweaver
- A litany of glorified WYSIWYG editors
- WordPress
- Wix
- Power Apps/SharePoint
...and so on.<p>Business owners have been getting sexually aroused at the prospect of taking a developer's salary and putting it in their own pockets for decades. Each iteration of this wet dream has only locked businesses into "low code" systems that require even more highly-specialized developers to operate. Right now, and probably for a while, Devin et al. are on par with the drag-and-drop automagical app-building snake oil stuff.<p>LLMs are useful for helping developers be more productive, which does translate to lay-offs, but until someone creates an AI that can translate the absolute fevered gibberish that comes out of business people's heads into a profitable piece of software, this is just MS FrontPage v100.0.<p>Just like an entire industry sprang up around fixing WordPress websites that business owners thought they could build themselves, pretty soon we'll start seeing job postings for AI-Generated Spaghetti Unravellers.<p>I'm (half seriously) imagining a future where software engineers are mostly consultants who show up and talk with business folks, then talk with the local robot, and get the project to actually work. Bill $1k per hour.
There's also the alternative: Devin, the NAI.<p><a href="https://docs.google.com/document/d/1byJgu1G_M58QVWmpZeEDthyAB8Bq752RgL_gF7hEJQc/edit" rel="nofollow">https://docs.google.com/document/d/1byJgu1G_M58QVWmpZeEDthyA...</a>
Are we not concerned that even though, yes, Devin is only solving 13% of issues, it is also an ML model? It is going to learn, potentially very quickly.
Yet again, bad time to be on the labor side of the equation, great time to be a capitalist. For us laborers, if I had to choose from a list of fields to go into, anything creative would be low on the list. 'Prompt Engineer' will be the only one left.<p>UBI is a pipe dream... it's not happening. The wealth and means of production won't be shared in any meaningful capacity. Wealth inequality can get a whole lot worse.
Humans seek work that provides satisfaction and meaning in their life.<p>With every technological advancement, artisans are the first to be made obsolete.<p>Sure, we have landfills full of unworn textiles, and the market says it's good, but overall we keep destroying what allows humans to seek meaning.<p>Our governments and society have made it clear: if you don't produce value, you don't deserve dignity.<p>We have outsourced art to computers, so people who don't understand art can have it at their fingertips.<p>Now we're outsourcing engineering so those who don't understand it can have it done for cheap.<p>We hear stories of those who don't understand therapy suggesting AI can be a therapist, of those who don't understand medicine suggesting AI can replace a doctor.<p>What will be left? Where will we be? Just stranded without dignity or purpose, left to rot when we no longer produce value.<p>I ask this question often, in multiple contexts, but to what end? Who benefits from these advancements? The CEO and shareholders, sure, but just because something can be had for cheaper doesn't mean it improves lives. Our clothes barely last a year; our shoes fall apart. Our devices come with pre-destined expiration dates.<p>Where will we be in the future? Those born into money can continue passing it around, a cargo cult for the numbers going up. But what about everyone else?
Hey, I am a newbie in the field of AI, and I want to know about the near future of AI. After Devin, I am a little terrified -- or rather, I should say deeply terrified -- about the future (5 to 8 years) of software engineering. Can someone explain?
We're not that far from a major turning point.<p>Currently these models don't provide an adequate confidence measure, and that keeps them from maximizing their potential. In the next few years we're going to reach a point where models will be able to tell if something is possible and avoid hallucinating, guaranteeing much better correctness. Something like that would be absolutely killer.<p>If you add a top-down approach using a framework, such that the model can break a system's architecture down into small individual components, then that's a recipe for a really great workflow. The models we have now really shine at writing automated unit tests and small bits of code that stay within the limits of the context size. Making the interfaces obvious enough, and being able to glue things together using obvious connections, seems very possible.<p>I really do think that in the next few years we're going to see one of these tools do really well.