The company I work for has tons of documentation and regulations for several areas. In some areas the documents number well over a thousand, and to make them easier to use we build RAG-based chat bots. This is why I have been playing with RAG systems across the whole spectrum from "build completely from scratch" to "connect the services in Azure". The retrieval part of a RAG system is vital for good/reliable answers, and if you build it naively, the results are underwhelming.

You can improve on the retrieved documents in many ways, for example:
- better chunking,

- better embedding,

- embedding several rephrased versions of the query,

- embedding a hypothetical answer to the prompt,

- hybrid retrieval (vector similarity + keyword/TF-IDF/BM25 search; sketched at the end of this comment),

- making heavy use of metadata,

- introducing additional (or hierarchical) summaries of the documents,

- returning not only the matching chunks but also adjacent text,

- re-ranking the candidate documents,

- fine-tuning the LLM, and much, much more.

However, at the end of the day a RAG system usually still has a hard time answering questions that require an overview of your data. Example questions are:

- "What are the key differences between the new and the old version of document X?"

- "Which documents can I ask you questions about?"

- "How do the regulations differ between case A and case B?"

In these cases it really helps to let LLMs decide how to process the prompt. This can be something simple like query routing (also sketched at the end), or rephrasing/enhancing the original prompt until something useful comes up. But it can also be agents that come up with sub-queries and a plan for how to combine the partial answers. You can also build a network of agents with different roles (like coordinator/planner, reviewer, retriever, ...) to come up with an answer.
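To make the hybrid retrieval point concrete, here is a minimal sketch (not our production setup) that fuses dense vector similarity with BM25 keyword scores. It assumes sentence-transformers and rank_bm25 are installed; the model name, example documents and the alpha weighting are just illustrative.

    # Hybrid retrieval sketch: dense embeddings + BM25, fused with a weighted sum.
    import numpy as np
    from rank_bm25 import BM25Okapi
    from sentence_transformers import SentenceTransformer

    docs = [
        "Regulation X, version 2: updated retention periods for records.",
        "Regulation X, version 1: original retention periods for records.",
        "Travel policy: reimbursement rules for international trips.",
    ]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    doc_vecs = model.encode(docs, normalize_embeddings=True)
    bm25 = BM25Okapi([d.lower().split() for d in docs])

    def hybrid_search(query, alpha=0.5, k=2):
        # Dense scores: cosine similarity (vectors are normalized, so a dot product).
        q_vec = model.encode([query], normalize_embeddings=True)[0]
        dense = doc_vecs @ q_vec
        # Sparse scores: BM25 over whitespace tokens, rescaled to [0, 1].
        sparse = np.array(bm25.get_scores(query.lower().split()))
        if sparse.max() > 0:
            sparse = sparse / sparse.max()
        # Weighted fusion; alpha balances semantic vs. keyword evidence.
        scores = alpha * dense + (1 - alpha) * sparse
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    print(hybrid_search("retention periods in regulation X"))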
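And a minimal query-routing sketch: an LLM classifies the prompt before retrieval, so overview-style questions ("which documents do you cover?", "compare version 1 and 2") can be sent to a summary-based path instead of plain chunk retrieval. The model name, the two route labels and the prompt are assumptions for illustration.

    # Query routing sketch: let an LLM pick the retrieval path.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    ROUTER_PROMPT = (
        "Classify the user question as exactly one label:\n"
        "CHUNKS   - answerable from a few retrieved passages\n"
        "OVERVIEW - needs document-level summaries, comparisons, or a catalog of documents\n"
        "Reply with the label only.\n\n"
        "Question: {question}"
    )

    def route(question: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": ROUTER_PROMPT.format(question=question)}],
            temperature=0,
        )
        label = resp.choices[0].message.content.strip().upper()
        # Fall back to the cheap chunk path if the model answers something unexpected.
        return label if label in {"CHUNKS", "OVERVIEW"} else "CHUNKS"

    print(route("What does regulation X say about retention periods?"))  # likely CHUNKS
    print(route("Which documents can I ask you questions about?"))       # likely OVERVIEW

The same idea extends to the agent setups mentioned above: instead of returning a single label, the router can emit sub-queries and a plan for combining the partial answers.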