In my (few hours of) testing, Auto-GPT was quite unreliable. If you'll pardon the expression, it suffers from severe ADHD: it procrastinates, overthinks, and gets distracted.

I think this is because the main loop is a GPT feedback loop. On each iteration there is a small chance of something going wrong (or a large chance, depending on the query), so as it loops repeatedly, the probability of failure approaches 100%. (With a 5% per-iteration failure rate, for example, the odds of 50 clean iterations are 0.95^50, i.e. under 8%.)

My idea was to replace the core loop: instead of a GPT feedback loop, just make it a few lines of Python.

Now the thing actually does what it says it's going to do, "thinking lag" is eliminated, and API usage is reduced by 80%.

I turned (parts of) Auto-GPT into a tiny Python library, specialized for internet research.

GPT-3 and GPT-4 are able to use this library to write Python programs that do useful work.

This way they can "crystallize" their plans in code, to ensure that the plans actually run.

Here is the interface:

    from typing import Dict, List

    def search(query, max_results=8) -> List[Dict]: pass  # uses duckduckgo
    def load(url) -> str | None: pass  # uses requests and beautifulsoup
    def summarize(text, task) -> str | None: pass  # uses gpt-3
    def save(filename, text) -> bool: pass
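
For concreteness, here is a minimal sketch of how those four functions might be implemented, assuming the libraries named in the comments (duckduckgo_search, requests, beautifulsoup4, and the pre-1.0 openai package). This is my illustration, not the gist's actual code:

    from typing import Dict, List, Optional

    import openai
    import requests
    from bs4 import BeautifulSoup
    from duckduckgo_search import DDGS

    def search(query: str, max_results: int = 8) -> List[Dict]:
        # Result dicts have keys like "title", "href", "body".
        with DDGS() as ddgs:
            return list(ddgs.text(query, max_results=max_results))

    def load(url: str) -> Optional[str]:
        # Fetch a page and reduce it to visible text; None on any failure.
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            return None
        return BeautifulSoup(resp.text, "html.parser").get_text(" ", strip=True)

    def summarize(text: str, task: str) -> Optional[str]:
        # One GPT call, truncating input to stay inside the context window.
        try:
            resp = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content":
                    f"Task: {task}\n\nSummarize the following text with "
                    f"respect to the task:\n\n{text[:12000]}"}],
            )
            return resp["choices"][0]["message"]["content"]
        except openai.error.OpenAIError:
            return None

    def save(filename: str, text: str) -> bool:
        try:
            with open(filename, "w", encoding="utf-8") as f:
                f.write(text)
            return True
        except OSError:
            return False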
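
And this is the kind of main.py the library enables: a fixed loop of search -> load -> summarize -> save, where the plan is ordinary Python, so every step runs exactly once and in order. (A sketch of my own: the module name and query are placeholders, and the GPT-written versions in the gist differ.)

    from research import search, load, summarize, save  # hypothetical module name

    QUERY = "impact of sleep deprivation on memory"

    notes = []
    for result in search(QUERY)[:5]:
        text = load(result["href"])
        if text is None:
            continue  # dead page: skip it instead of "rethinking"
        summary = summarize(text, QUERY)
        if summary is not None:
            notes.append(result["title"] + "\n" + summary)

    save("research_notes.txt", "\n\n".join(notes))
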
See the comments below the gist for the GPT-3 and GPT-4 versions of main.py (20-30 lines for an internet research agent!).

https://gist.github.com/avelican/2d4e718954593e3df9e0e5ee6751f470

Note: it's currently optimized for my main use case, which is internet research, so it's not an Auto-GPT in any sense. But it does one thing, and does it fairly well.

P.S. The Holy Grail would be a system where the user enters a query, the system translates it into Python on top of this library, and runs that. I haven't tried that yet, and I'm a little afraid to...
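
For the curious, here is an untested sketch of what that might look like: hand the model the interface, ask for a script, and exec the reply against the library functions. The prompt, model choice, and review gate are all hypothetical, and you are executing model-written code, so caveat emptor.

    import openai

    INTERFACE = """
    def search(query, max_results=8) -> List[Dict]: ...
    def load(url) -> str | None: ...
    def summarize(text, task) -> str | None: ...
    def save(filename, text) -> bool: ...
    """

    def run_query(user_query: str) -> None:
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content":
                f"Using only these functions:\n{INTERFACE}\n"
                f"write a short Python script that accomplishes this task: "
                f"{user_query}\nReply with code only, no markdown fences."}],
        )
        code = resp["choices"][0]["message"]["content"]
        print(code)
        # Manual review gate before running anything the model wrote.
        # Assumes the four library functions above are in scope.
        if input("Run it? [y/N] ").strip().lower() == "y":
            exec(code, {"search": search, "load": load,
                        "summarize": summarize, "save": save})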