In my (few hours of) testing, Auto-GPT was quite unreliable. If you'll pardon the expression, it suffers from severe ADHD: it procrastinates, overthinks, and gets distracted.

I think this is because the main loop is a GPT feedback loop. On each iteration there is a small chance of something going wrong (or a large chance, depending on the query), so as it loops repeatedly, the probability of failure approaches 100%. (With a 5% per-iteration failure rate, for example, the odds of 50 clean iterations are 0.95^50, i.e. under 8%.)

My idea was to replace the core loop: instead of a GPT feedback loop, just make it a few lines of Python.

Now the thing actually does what it says it's going to do, "thinking lag" is eliminated, and API usage is reduced by 80%.

I turned (parts of) Auto-GPT into a tiny Python library, specialized for internet research.

GPT-3 and GPT-4 are able to use this library to write Python programs that do useful work.

This way they can "crystallize" their plans in code, to ensure that the plans actually run.

Here is the interface:

    from typing import Dict, List

    def search(query, max_results=8) -> List[Dict]: pass  # uses duckduckgo
    def load(url) -> str | None: pass  # uses requests and beautifulsoup
    def summarize(text, task) -> str | None: pass  # uses gpt-3
    def save(filename, text) -> bool: pass
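
For concreteness, here is a minimal sketch of how those four functions might be implemented, assuming the libraries named in the comments (duckduckgo_search, requests, beautifulsoup4, and the pre-1.0 openai package). This is my illustration, not the gist's actual code:

    from typing import Dict, List, Optional

    import openai
    import requests
    from bs4 import BeautifulSoup
    from duckduckgo_search import DDGS

    def search(query: str, max_results: int = 8) -> List[Dict]:
        # Result dicts have keys like "title", "href", "body".
        with DDGS() as ddgs:
            return list(ddgs.text(query, max_results=max_results))

    def load(url: str) -> Optional[str]:
        # Fetch a page and reduce it to visible text; None on any failure.
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
        except requests.RequestException:
            return None
        return BeautifulSoup(resp.text, "html.parser").get_text(" ", strip=True)

    def summarize(text: str, task: str) -> Optional[str]:
        # One GPT call, truncating input to stay inside the context window.
        try:
            resp = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content":
                    f"Task: {task}\n\nSummarize the following text with "
                    f"respect to the task:\n\n{text[:12000]}"}],
            )
            return resp["choices"][0]["message"]["content"]
        except openai.error.OpenAIError:
            return None

    def save(filename: str, text: str) -> bool:
        try:
            with open(filename, "w", encoding="utf-8") as f:
                f.write(text)
            return True
        except OSError:
            return False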
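
And this is the kind of main.py the library enables: a fixed loop of search -> load -> summarize -> save, where the plan is ordinary Python, so every step runs exactly once and in order. (A sketch of my own: the module name and query are placeholders, and the GPT-written versions in the gist differ.)

    from research import search, load, summarize, save  # hypothetical module name

    QUERY = "impact of sleep deprivation on memory"

    notes = []
    for result in search(QUERY)[:5]:
        text = load(result["href"])
        if text is None:
            continue  # dead page: skip it instead of "rethinking"
        summary = summarize(text, QUERY)
        if summary is not None:
            notes.append(result["title"] + "\n" + summary)

    save("research_notes.txt", "\n\n".join(notes))
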
See the comments below the gist for the GPT-3 and GPT-4 versions of main.py (20-30 lines for an internet research agent!).

https://gist.github.com/avelican/2d4e718954593e3df9e0e5ee6751f470

Note: it's currently optimized for my main use case, which is internet research, so it's not an Auto-GPT in any sense. But it does one thing, and does it fairly well.

P.S. The Holy Grail would be a system where the user enters a query, the system translates it into Python on top of this library, and runs that. I haven't tried that yet, and I'm a little afraid to...
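
For the curious, here is an untested sketch of what that might look like: hand the model the interface, ask for a script, and exec the reply against the library functions. The prompt, model choice, and review gate are all hypothetical, and you are executing model-written code, so caveat emptor.

    import openai

    INTERFACE = """
    def search(query, max_results=8) -> List[Dict]: ...
    def load(url) -> str | None: ...
    def summarize(text, task) -> str | None: ...
    def save(filename, text) -> bool: ...
    """

    def run_query(user_query: str) -> None:
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content":
                f"Using only these functions:\n{INTERFACE}\n"
                f"write a short Python script that accomplishes this task: "
                f"{user_query}\nReply with code only, no markdown fences."}],
        )
        code = resp["choices"][0]["message"]["content"]
        print(code)
        # Manual review gate before running anything the model wrote.
        # Assumes the four library functions above are in scope.
        if input("Run it? [y/N] ").strip().lower() == "y":
            exec(code, {"search": search, "load": load,
                        "summarize": summarize, "save": save})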