Debunking Devin: "First AI Software Engineer" Upwork Lie Exposed [video]

302 points by smukherjee19 about 1 year ago

17 comments

mike_hearn about 1 year ago

An extremely solid and convincing rebuttal. Sad. I wonder what the Devin team will say in response, if anything. Summarizing the video:

• Devin is sold as being able to solve arbitrary Upwork tasks. In the video demo, the problem it was asked to solve doesn't match the stated requirements of the customer (who asked for setup instructions, not code).

• Devin is shown fixing errors in the source of a GitHub repo, but the files it's shown editing don't actually exist in that repo, and some of the errors it's fixing are nonsensical, of the type that'd never be made by a human. Inference: Devin must be fixing bugs in files it has itself created, but that's not clearly indicated.

• There is no need to do any coding in the first place, because the README in the repository has all the instructions needed to achieve the task ready to go, and they still work fine with only a one-line tweak, even though the repository is old. This is why the customer asked for instructions for how to run it on EC2 rather than for some coding. Devin didn't seem to read the README or understand that it only had to execute a couple of pre-existing Python scripts. The output in the video makes it look like the task was complex and sophisticated, with a long plan and many check boxes showing work completed, but the work was in fact pointless and redundant.

• Devin's code changes are bad, e.g. writing its own low-level file read loop instead of using the standard library properly.

• Although the video makes it look like Devin did the task quickly, and the video creator was able to do the requested task in ~30 minutes, the timestamps in the chat show the task stretching over many hours and even into the next day.

• Devin does nonsensical shell commands like `head -n 5 foo | tail -n 5`

The strange mistakes lead to questions about what underlying model it's using. I don't think GPT-4 would make mistakes like that.

The Internet of Bugs guy is an AI fan and uses coding AI himself, but points out that the company behind it says you can "watch Devin get paid for doing work", which isn't actually supported by their video evidence when watched carefully.
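For readers wondering why `head -n 5 foo | tail -n 5` is called nonsensical: taking the last five lines of something that is already only five lines long returns the same five lines, so the `tail` stage does nothing. Below is a minimal sketch of that reasoning in Python. The `head` and `tail` helpers and the contents of `foo` are hypothetical stand-ins for the shell tools and the file, not anything shown in the video.

```python
def head(lines, n):
    """Return the first n lines, roughly like `head -n N`."""
    return lines[:n]


def tail(lines, n):
    """Return the last n lines, roughly like `tail -n N`."""
    return lines[-n:] if n > 0 else []


# Hypothetical file contents standing in for `foo`.
foo = [f"line {i}" for i in range(1, 21)]

first_five = head(foo, 5)
piped = tail(head(foo, 5), 5)  # models `head -n 5 foo | tail -n 5`

# The extra `tail` stage changes nothing, even for files shorter than 5 lines.
short_file = foo[:3]
short_piped = tail(head(short_file, 5), 5)
```

In this model `piped` equals `first_five`, so plain `head -n 5 foo` would have been enough.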

mewpmewp2 about 1 year ago

I like hearing this balanced opinion. Generative AI is awesome, but demos around it should be honest and transparent. Looking at you, Google, as well. I don't know if I can trust a single Google demo for a while. They must do demos with zero edits or cuts for quite some time in order to build that trust. Also, this fake happy-go-lucky, childlike communication after the edits/cuts is cringey. Unless you are doing it in real time like OpenAI did.

Based on the slogans said around Devin I decided to ignore it completely, so while I couldn't say it's BS for sure, I did feel the slogans were embellished and too good to be true.

Also, for some reason I don't like the name for it at all. I don't understand how it could be so poorly chosen. Not that names can always give insight, but this name somehow was so off-putting to me.

spacechild1 about 1 year ago

From the YouTube comment section:

> I really hate how normalized faking it in demos has become

I fully agree!

magospietato about 1 year ago

If you've ever attempted to extract a meaningful few-shot response to a non-trivial coding question from an LLM, this shouldn't be a surprise.

That said, I have worked with actual humans in the industry who perform this badly, and that is still a significant achievement for a software program.

HEGalloway about 1 year ago

I was quite skeptical of this. I've seen another company, one claiming to do 3D generation, do the same with its demos: they outsourced the "3D generation" part to low-wage workers in third-world countries and claimed the models were generated by AI. I see this as a future trend: get investor money, then do the rug pull.

nikincn about 1 year ago

Attention is all you need, and faking it in a demo gets it. Even if you deliver a decent enough product, it will sell now.

bluecrab about 1 year ago

It was all hype. There's barely been any AI product that has been hyped and didn't turn out to be subpar a few weeks after.

lordswork about 1 year ago

Great video. As a counterpoint to the video author's claims, it's worth pointing out that Devin doesn't have to be anywhere close to as fast as a human software engineer to be useful. Even if it turns an hour-long task into a day-long task, it's still going to cost a fraction of what it would to pay the engineer, so the bar it must meet is quite low.

langtang1996 about 1 year ago

For complex systems composed of long chains of black-box units with randomness, if we evaluate their output using the three dimensions of “precision, stability, and size,” we should not have too high expectations.

Xavier_L about 1 year ago

I summarized the video in detail. https://gosummarize.com/youtube/@internetofbugs/debunking-devin-first-ai-software-engineer-upwork-lie-exposed

redgrange about 1 year ago

Anyone have any tips on getting access? I would love to test some things firsthand to evaluate some of the claims.

peeyush81 about 1 year ago

That's why open source is important: you can't fake it there. https://github.com/princeton-nlp/SWE-agent

hrpnk about 1 year ago

The UI of Devin is quite nice. Does anyone know to what degree it's inspired by other tools on the market?

NEETPILLED about 1 year ago

AI bros, is it over? Did we go too far?

heyitaki about 1 year ago

Picking apart Devin based solely on the demo video while ignoring all of the primary source testimonials on Twitter as to Devin's effectiveness seems somewhat intellectually dishonest... A demo video will of course cherrypick impressive-looking moments, even if they're not really.

nikincn about 1 year ago

Curious to know the top 3 products that 'mukherjee' finds good enough.

nikincn about 1 year ago

Faking it in a demo video is now a necessary evil for going viral, because social media is about virality. Everyone commenting on the video and this thread is now curious to try Devin, either to prove it works or to prove it doesn't.

So if it works, faking helped it go viral: more users, more demand for the product.

If it doesn't work well enough, it will still be good enough for some of the users who discovered it because of the virality.

The only worst case is that it is hopelessly bad, doesn't work at all, or tried to get to the moon and got nowhere. Hope the founders are smart enough to not be this bad.
