Here's the crux of why AI is not yet useful for more than simple projects: the models cannot actually know whether something is correct or wrong, so there is no guarantee that anything is implemented properly. To clarify: a human junior engineer might lack knowledge, but they KNOW they lack it, AND they know when they have the right answer. A junior engineer can check their results and verify with near-100% certainty whether something works.<p>With Cursor I keep running into suggestions that create bugs. Even a junior dev knows how to check whether their solution actually works. The article says to build a "stdlib" of rules for things that go wrong so they stop recurring, but I would think that will exceed the max tokens very quickly and make things exceedingly harder to troubleshoot. My guess is this only changes once inference is practically free (in computation) and we can throw 1000 agents at a single application, giving enough agency that checking your answer becomes the obvious step and the result is as reliable as a human's.