Hi HN!<p>For a couple of months, I've been thinking about how GPT can be utilized to generate fully working apps, and I still haven't seen any project that I think has a good approach.<p>I just don't think that Smol developer or GPT engineer can create a fully working, production-ready app from scratch without a developer being involved and without any debugging process.<p>So, I came up with an idea that I've outlined thoroughly in this blog post - <a href="https://blog.pythagora.ai/2023/08/23/430/" rel="nofollow noreferrer">https://blog.pythagora.ai/2023/08/23/430/</a> - but basically, there are 3 main "pillars" that I think a dev tool that generates apps needs to have:<p>1. The developer needs to be involved in the process of app creation - I think we are still far away from an LLM that can just be hooked up to a CLI and work by itself to create any kind of app. Nevertheless, GPT-4 works amazingly well when writing code and might even be able to write most of the codebase - but NOT all of it. That's why I think we need a tool that writes most of the code while the developer oversees what the AI is doing and gets involved when needed. When the developer changes the code, GPT Pilot needs to continue working with those changes (e.g. adding an API key, or fixing a bug the AI gets stuck on).<p>2. The app needs to be coded step by step, just like a human developer would code it. All other code generators just give you the entire codebase at once, which I find very hard to get into. I think that if the AI creates the app step by step, it will be able to debug it more easily, and the developer overseeing it will be able to understand the code better and fix issues as they arise.<p>3. The tool needs to be scalable: it should be able to create a small app the same way it creates a big, production-ready app.
There should be mechanisms that enable the AI to debug any issue and to take in requirements for new features, so it can continue working on an already-developed app.<p>So, with these in mind, I created a PoC for a dev tool that can create any kind of app from scratch while the developer oversees what is being developed.<p>I call it GPT Pilot and it's open-sourced here - <a href="https://github.com/Pythagora-io/gpt-pilot">https://github.com/Pythagora-io/gpt-pilot</a><p>Here are a couple of demo apps that GPT Pilot created:
Real time chat app - <a href="https://github.com/Pythagora-io/gpt-pilot-chat-app-demo">https://github.com/Pythagora-io/gpt-pilot-chat-app-demo</a>
Markdown editor - <a href="https://github.com/Pythagora-io/gpt-pilot-demo-markdown-editor.git">https://github.com/Pythagora-io/gpt-pilot-demo-markdown-edit...</a>
Timer app - <a href="https://github.com/Pythagora-io/gpt-pilot-timer-app-demo">https://github.com/Pythagora-io/gpt-pilot-timer-app-demo</a><p>Here is a diagram of how it works in general - <a href="https://bit.ly/3R0Gqot" rel="nofollow noreferrer">https://bit.ly/3R0Gqot</a>
And here's a diagram of how the coding part works - <a href="https://bit.ly/3P4naEa" rel="nofollow noreferrer">https://bit.ly/3P4naEa</a><p>Some concepts that GPT Pilot uses:<p>Recursive conversations are conversations with the LLM that can be used "recursively". E.g. if GPT Pilot detects an error and, during the debugging process, another error happens, it needs to stop debugging the first issue, fix the second one, and then get back to fixing the first. It works by rewinding the context and explaining each error in the recursion separately.<p>Context rewinding is a relatively simple idea. For each development task, the context size of the first message to the LLM has to stay roughly constant - e.g. the context size of the first LLM message while implementing task #5 has to be more or less the same as the first message while implementing task #50. Because of this, the conversation needs to be rewound to the first message at the start of each task. When GPT Pilot creates code, it also creates pseudocode and a description for each file and folder it creates. So, when we need to implement task #50, in a separate conversation we show the LLM the current folder/file structure; it selects only the code that is relevant for the current task, and then, in the original conversation, we show only the selected code instead of the entire codebase.<p>I'm curious to hear what others think about this approach. Do you have any ideas yourself on how this kind of tool should work?
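<p>P.S. To make the recursive-conversation idea above a bit more concrete, here's a minimal sketch in Python. Everything in it is hypothetical - `debug`, the `llm` callable, and the fake errors are stand-ins for illustration, not GPT Pilot's actual API - but it shows the shape of the control flow: a nested error pauses the outer fix, gets resolved in its own rewound branch, and then the outer error is retried.<p>

```python
def debug(conversation, error, llm, depth=0, max_depth=3):
    """Sketch of a 'recursive conversation': fix `error`; if the fix
    surfaces a new error, recurse on that one first, then retry the
    original. `llm` is a hypothetical callable that takes a list of
    messages and returns (fix, new_error_or_None)."""
    if depth > max_depth:
        raise RuntimeError("max debugging depth reached")

    # Context rewinding: each branch starts from the base conversation
    # plus only the message for *this* error, so context stays bounded.
    branch = conversation + [f"fix: {error}"]
    fix, new_error = llm(branch)

    if new_error is not None:
        # A second error appeared mid-debug: pause the first issue and
        # fix the nested one in its own rewound branch...
        debug(conversation, new_error, llm, depth + 1, max_depth)
        # ...then come back and retry the original error.
        fix, new_error = llm(branch)
    return fix


# Tiny deterministic fake LLM to demo the flow: fixing error "A"
# first surfaces error "B"; once "B" is fixed, retrying "A" succeeds.
calls = []

def fake_llm(branch):
    calls.append(branch[-1])
    if branch[-1] == "fix: A" and "fix: B" not in calls:
        return ("patch-A-incomplete", "B")   # fixing A reveals B
    if branch[-1] == "fix: B":
        return ("patch-B", None)             # B fixed cleanly
    return ("patch-A", None)                 # retry of A succeeds

result = debug([], "A", fake_llm)
```

<p>With the fake LLM, the call order ends up being fix A, then fix B, then retry A - which is exactly the rewind-and-resume behavior described above.<p>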