An extremely solid and convincing rebuttal. Sad. I wonder what the Devin team will say in response, if anything. Summarizing the video:<p>• Devin is sold as being able to solve arbitrary Upwork tasks. In the video demo, the problem it was asked to solve doesn't match the stated requirements of the customer (who asked for setup instructions, not code).<p>• Devin is shown fixing errors in the source of a GitHub repo, but the files it's shown editing don't actually exist in that repo, and some of the errors it's fixing are nonsensical, of the type that'd never be made by a human. Inference: Devin must be fixing bugs in files it created itself, but that's not clearly indicated.<p>• There was no need to do any coding in the first place, because the README in the repository has all the instructions needed to achieve the task, and they still work fine with only a one-line tweak, even though the repository is old. This is why the customer asked for instructions on how to run it on EC2 rather than for some coding. Devin didn't seem to read the README or understand that it only had to execute a couple of pre-existing Python scripts. The output in the video makes it look like the task was complex and sophisticated, with a long plan and many check boxes showing work completed, but the work was in fact pointless and redundant.<p>• Devin's code changes are bad, e.g. writing its own low-level file read loop instead of using the standard library properly.<p>• Although the video makes it look like Devin did the task quickly, and the video creator was able to do the requested task in ~30 minutes, the timestamps in the chat show the task stretching over many hours and even into the next day.<p>• Devin runs nonsensical shell commands like `head -n 5 foo | tail -n 5`, which prints the same lines as plain `head -n 5 foo`.<p>The strange mistakes lead to questions about what underlying model it's using. 
I don't think GPT-4 would make mistakes like that.<p>The Internet of Bugs guy is an AI fan and uses coding AI himself, but points out that the company behind it says you can "watch Devin get paid for doing work", which isn't actually supported by their video evidence when watched carefully.
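<p>To make the two technical points in the summary above concrete (the redundant `head -n 5 foo | tail -n 5` pipeline and the hand-rolled file read loop), here is a minimal Python sketch. It is not taken from the video or from Devin's output; the helper names `head`, `tail`, and `first_lines` are hypothetical and exist only for illustration. It models the two shell commands on lists of lines and reads the start of a file with ordinary standard-library line iteration.

```python
from itertools import islice


def head(lines, n):
    """Return the first n lines, modelling `head -n N`."""
    return list(islice(lines, n))


def tail(lines, n):
    """Return the last n lines, modelling `tail -n N`."""
    lines = list(lines)
    return lines[-n:] if n > 0 else []


def first_lines(path, n):
    """Read the first n lines of a text file.

    Iterating over the file object lets the standard library handle
    buffering and line splitting, so no manual read loop is needed.
    """
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


# Taking the last 5 of the first 5 lines changes nothing.
sample = [f"line {i}" for i in range(1, 21)]
print(tail(head(sample, 5), 5) == head(sample, 5))  # True
```

The same holds when the input has fewer than five lines: both pipelines return every line, so the extra `tail` never changes the result.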
I like hearing this balanced opinion. Generative AI is awesome, but demos around it should be honest and transparent. Looking at you too, Google. I don't know if I can trust a single Google demo for a while. They will have to do demos with zero edits or cuts for quite some time in order to rebuild that trust. Also, the fake happy-go-lucky, childlike communication after the edits/cuts is cringey, unless you are doing it in real time like OpenAI did.<p>Based on the slogans used around Devin I decided to ignore it completely. So while I couldn't say for sure that it's BS, I did feel the slogans were embellished and too good to be true.<p>Also, for some reason I don't like the name at all. I don't understand how it could be so poorly chosen. Not that names always give insight, but this one was somehow very off-putting to me.
If you've ever attempted to extract a meaningful few-shot response to a non-trivial coding question from an LLM, this shouldn't be a surprise.<p>That said, I have worked with actual humans in the industry who perform this badly, and that is still a significant achievement for a software program.
I was quite skeptical of this. I've seen another company that claimed to do 3D generation do the same thing with its demos: it outsourced the "3D generation" part to low-wage workers in third-world countries and claimed the models were generated by AI. I see this becoming a trend: get investor money, then do the rug pull.
Great video. As a counterpoint to the video author's claims, it's worth pointing out that Devin doesn't have to be anywhere close to as fast as a human software engineer to be useful. Even if it turns an hour-long task into a day-long task, it's still going to cost a fraction of what it would to pay the engineer, so the bar it must meet is quite low.
For complex systems composed of long chains of black-box units with randomness, if we evaluate their output using the three dimensions of “precision, stability, and size,” we should not have too high expectations.
I summarized the video in detail: <a href="https://gosummarize.com/youtube/@internetofbugs/debunking-devin-first-ai-software-engineer-upwork-lie-exposed" rel="nofollow">https://gosummarize.com/youtube/@internetofbugs/debunking-de...</a>
That's why open source is important: you can't fake it there.
<a href="https://github.com/princeton-nlp/SWE-agent">https://github.com/princeton-nlp/SWE-agent</a>
Picking apart Devin based solely on the demo video while ignoring all of the primary source testimonials on Twitter as to Devin's effectiveness seems somewhat intellectually dishonest... A demo video will of course cherry-pick impressive-looking moments, even if they're not really that impressive.
Faking a demo video is now a necessary evil for going viral, because social media is all about virality. Everyone commenting on the video and in this thread is now curious to try Devin, either to prove it works or to prove it doesn't.<p>So if it works, the faking helped it go viral: more users, more demand for the product.<p>If it doesn't work well enough, it will still be good enough for some of the users who discovered it because of the virality.<p>The only worst case is that it is hopelessly bad or doesn't work at all, that it tried to get to the moon and got nowhere. I hope the founders are smart enough not to be that bad.