The first place I usually go is the terms of service, to see what rights they're granting themselves. I'm not excited about how broad this is: "3.2 License: By using the Services, you hereby grant to Cognition, its affiliates, successors, and assigns a non-exclusive, worldwide, royalty-free, fully paid, sublicensable, transferable license to reproduce, distribute, modify, and otherwise use, display, and perform all acts with respect to the Customer Data as may be necessary for Cognition to provide the Services to you."
No public testing, no benchmarks, no clear information on context window size or restrictions on extensive use, no comparison with the newest Claude 3.5 Sonnet or o1, nothing.<p>What we do get is a price of $500 per month from a company that has been caught lying about this very product [0] and has never allowed independent testing.<p>Cognition, I am sorry to tell you, but there is no reason to trust you. In fact, there are multiple good reasons not to, even if you offered Devin at a fraction of the price.<p>If this were, e.g., Anthropic launching a new beyond-Opus-size model that was still performant and came with "chain-of-thought" capabilities, a far more extensive context window that still fully passes needle-in-a-haystack tests, is absolutely solid at sourcing from provided files, keeps on track even when given large documents, has few or no restrictions on usage, and comes with extensive, verifiable benchmarks showing the offering to be a significant upgrade over other models, maybe such a price could be justified.<p>You know why, Cognition? Because they haven't actively lied. What they did instead was let people use their models and actually test the advantages. Even Claude Instant, way back when, had certain use cases that gave it its own niche and showed Anthropic could execute, before expanding with Claude 2 and the larger context window, then Claude 3 with more applications. You never did any of that; you never gave anyone reason to believe what you claim; you didn't even release benchmarks. See the difference?<p>Seems more like a simple cash grab, attempting to ride the o1 wave. OpenAI has a hard time justifying its Pro pricing; you doubling that makes this an out-of-season April Fools' joke. Waiting for the inevitable reporting that this is just another API wrapper for Claude or ChatGPT with our old faithful RAG.<p>[0] <a href="https://www.youtube.com/watch?v=tNmgmwEtoWE&pp=ygUJZGV2aW4gYWkg" rel="nofollow">https://www.youtube.com/watch?v=tNmgmwEtoWE&pp=ygUJZGV2aW4gY...</a>
From the second video: "We can focus on the things that excite us rather than just the maintenancing [maintenance] work".<p>But these are the kinds of problems that help shape the product. The software architecture should be a compression of a deep and intuitive understanding of the problem space. How can you develop that knowledge if you're just delegating it to a black box that can't operate at a near-human level?<p>I've used AI-based tools to great success, but on an ad-hoc basis, for specific and small functions or modules. Doing the integration part requires an understanding of which abstraction is appropriate where. I don't think these tools are good at that.
Mike from Vesta (first demo video) claims Devin saved "at least a hundred hours" debugging API integrations. That seems crazy to me - API integrations rarely take that long, and any engineer would spot issues like wrong API keys almost immediately. The tool might be more valuable for non-engineers creating initial drafts, but by the time you've written all the detailed specs for Devin, a mid-level engineer could have made significant progress on the task.
Looking for comprehensive benchmarks with Devin vs Cursor + Claude 3.6 vs ChatGPT o1 Pro.<p>In my own experience using Cursor with Claude 3.5 Sonnet (new) and o1-preview, Claude is sufficient for most things, but there are times when Claude gets stumped. Invariably that means I asked it to do too much. But sometimes, maybe 10-20% of the time, o1-preview is able to do what Claude couldn’t.<p>I haven’t signed up for o1 Pro because going from Cursor to copy/pasting from ChatGPT is a big DevX downgrade. But from what I’ve heard o1 Pro can solve harder coding problems that would stump Claude or o1-preview.<p>My solution is just to split the problem into smaller chunks that make it tractable for Claude. I assume this is what Devin’s doing. Or is Devin using custom models or an early version of the o1 (full or pro) API?
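For what it's worth, the "split the problem into smaller chunks" workflow can be sketched roughly as below. This is only an illustration of the idea, assuming a hypothetical `ask_llm` callable; it says nothing about what Devin actually does internally.

```python
# Sketch of manually decomposing a coding task into subtasks and
# feeding them to a model one at a time, carrying prior answers
# forward as context. `ask_llm` is a stand-in for any LLM API call.
def solve_in_chunks(subtasks, ask_llm):
    """Solve each subtask in order, accumulating a running context."""
    context = []
    for task in subtasks:
        prompt = "\n".join(context + [f"Next step: {task}"])
        answer = ask_llm(prompt)
        context.append(f"{task} -> {answer}")
    return context
```

The point is just that each individual prompt stays small enough for the model to handle, even when the overall task would stump it.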
This should have come with a prominent warning on the app site that you're heading toward a $500/month subscription. I'm sure it's mentioned in places I didn't see. Ideally, you would agree to the sub before you even create an account. This could save LOADS of signups from people who aren't your intended users.
I'm curious to see how this plays out when it comes to deploying and maintaining production-grade apps. I know relatively little about infrastructure and DevOps, but that's the stuff that always seems genuinely complicated when going from MVP to production. This question feels particularly important if we're expecting PMs and designers to be primary users.<p>That said, I'm super excited about this space and love seeing smart folks putting energy into this. Even if it's still a bit aspirational, I think the idea of cutting down time spent debugging and refactoring and putting more power in the hands of less technical folks is awesome.
It seems like a lot of the magic is providing LLMs with tools that let them work like a human would. This approach makes more sense to me than the model of expecting an LLM to just emit a giant block of code for a change, given a pile of RAG context.<p>(Removed my pricing question, as I missed that it's $500/month for whole teams. I get why that's the pricing, but it sadly doesn't work for me to try it on side projects.)
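The tool-use pattern is essentially a loop: the model either answers in plain text or asks for a tool to be run, and the tool's output becomes the next observation. A minimal sketch, with a stubbed-out tool and a made-up JSON convention (real agents like Devin are obviously far more elaborate):

```python
# Minimal tool-dispatch step for an LLM agent loop. The JSON message
# format and run_shell stub are illustrative assumptions, not any
# product's real protocol.
import json

def run_shell(cmd: str) -> str:
    """Stub tool: a real agent would run `cmd` in a sandbox."""
    return f"(pretend output of: {cmd})"

TOOLS = {"run_shell": run_shell}

def agent_step(model_reply: str):
    """If the model's reply is a tool request, run it and return the
    observation to feed back; otherwise return None (final answer)."""
    try:
        msg = json.loads(model_reply)
    except json.JSONDecodeError:
        return None  # plain-text reply, loop is done
    tool = TOOLS.get(msg.get("tool"))
    return tool(msg.get("arg", "")) if tool else None
```

An orchestrator would call `agent_step` in a loop, appending each observation to the conversation until the model stops requesting tools.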
Am I the only one who laments this trend of using a common first name as a product name? When I see this, my first reaction is that the company lacks any empathy for people who have the name they're co-opting.<p><a href="https://www.washingtonpost.com/technology/interactive/2021/people-named-alexa-name-change-amazon/" rel="nofollow">https://www.washingtonpost.com/technology/interactive/2021/p...</a><p><a href="https://archive.is/w8r58" rel="nofollow">https://archive.is/w8r58</a>
> Small frontend bugs and edge cases - tag Devin in Slack threads<p>And other points where it should shine. How does it compare to using Cursor? Is it the slack integration?
How does Devin compare to lovable.dev? I've been thoroughly impressed by their ability to build and host functioning apps from very basic prompts.