I wrote a SWE-bench solver. The SWE-bench issues are drawn from mature projects like Django.<p>The objective of my solver was to get good solutions using only RAG (no embeddings) and at minimal cost (low token count).<p>Three techniques, combined, yielded good results. The first was a TDD approach: generate a test first, then require the LLM to pass it (without breaking others). The solver can also trace the test execution to see exactly which code participates in the feature.<p>The second was to separate “planning” from “coding”. The planner, freed from implementation details, can focus on figuring out which files to change, following existing code conventions, avoiding duplicated code, and so on. In the coding phase, the LLM works from a predefined plan and has little freedom to deviate; it just needs to produce a working, lint-free implementation.<p>The third was gentle pressure on the solver to make small changes in a minimum number of files (ideally, one).<p>AI coding tools today generally don’t incorporate any of this: they don’t favor TDD, they don’t bias toward minimal changes, and they don’t work from a pre-approved design.<p>Good human developers do all three, and this remains a pretty wide gap between adept human coders and AI.
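<p>To make the shape of the loop concrete, here is a minimal Python sketch of how the three techniques could compose. This is not my solver's actual code; the `llm` and `run_tests` callables are hypothetical stand-ins for a model call and a test harness, and the prompts are illustrative only.<p>

```python
def solve_issue(issue, llm, run_tests, max_attempts=3):
    """Hypothetical solver loop combining the three techniques:
    TDD first, plan/code separation, and a minimal-change bias.

    `llm` is a stand-in for a model call (prompt -> text);
    `run_tests` is a stand-in harness that applies a patch and
    runs the new test plus the existing suite.
    """
    # Technique 1 (TDD): write a failing test for the issue first.
    test = llm(f"Write a failing test that reproduces this issue:\n{issue}")

    # Technique 2 (planning): decide which files to change, following
    # existing conventions -- no code yet. Technique 3 (minimal change)
    # is applied as pressure in the planning prompt.
    plan = llm(
        "Plan the smallest change (ideally one file) that makes this "
        f"test pass, following existing conventions:\n{test}"
    )

    # Coding phase: implement strictly from the plan, retrying until
    # the new test passes without breaking the existing suite.
    for _ in range(max_attempts):
        patch = llm(f"Implement this plan exactly, no deviations:\n{plan}")
        if run_tests(patch, extra_test=test):
            return patch
    return None
```

<p>The point of the structure is that each phase constrains the next: the test pins down the behavior, the plan pins down the blast radius, and the coder is only judged on passing tests and lint.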