TechEcho

6 comments

happyopossum 4 months ago

Many of the examples given for agents such as this are things I just flat wouldn’t trust an LLM to do. Buying something on Amazon, for example: Will it pick new or ‘renewed’? Will it select an item that is from a janky-looking vendor and may be counterfeit? Will it pick the cheapest option for me? What if multiple colors are offered?

This one example alone has so many branches that would require knowing what’s in my head.

On the flip side, it’s a ridiculously simple task for a human to do for themselves, so what am I truly saving?

Call me when I can ask it to check the professional reviews of X category on N websites (plus YouTube), summarize them for me, and find the cheapest source for the top 2 options in the category that will arrive in Y days or sooner.

That would be useful.

mkagenius 4 months ago

Steps pre-planned by the Planner will go wrong more often than not, because it tries to guess the UI layers from its memory/training data. It's better to just ask for the "next step" by giving it the current state of the UI.

I have built a similar project for mobile automation [1], and the validator phase is not separate; rather, it's inherently baked into each step, since we only ask for the next step based on the current UI and previous actions.

My Planner sometimes goes "Oh, we are still on home screen, let's find the Uber app icon". This sort of self-correcting behaviour was not programmed, but the LLM does it on its own.

[1] https://github.com/BandarLabs/ClickClickClick - A framework to automate mobile use via any LLM (local/remote)
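
For anyone who wants to see the shape of the "ask for the next step" loop described above, here is a rough sketch. It is not code from ClickClickClick or Skyvern: `Step`, `run_agent`, `FakePhone` and `toy_policy` are all hypothetical names made up for illustration, and `toy_policy` is a hard-coded stand-in for what would really be an LLM call. The point is only that the UI is re-observed before every decision, so a tap that did not register gets retried without any separate validator phase.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    action: str          # e.g. "tap", "back", "done"
    target: str = ""     # e.g. the label of the icon to tap


# Hypothetical signature: (goal, current UI description, previous actions) -> next Step.
# In a real agent this would wrap an LLM call; here it is a plain function.
NextStepFn = Callable[[str, str, List[Step]], Step]


def run_agent(goal: str,
              observe: Callable[[], str],
              act: Callable[[Step], None],
              next_step: NextStepFn,
              max_steps: int = 10) -> List[Step]:
    """Ask for one step at a time, always from the freshly observed UI state."""
    history: List[Step] = []
    for _ in range(max_steps):
        ui = observe()                       # current UI, not a remembered one
        step = next_step(goal, ui, history)  # decide only the next action
        history.append(step)
        if step.action == "done":
            break
        act(step)
    return history


class FakePhone:
    """Toy device (assumption for the demo): the first `ignored_taps` taps are dropped."""

    def __init__(self, ignored_taps: int = 0) -> None:
        self.screen = "home"
        self.ignored_taps = ignored_taps

    def observe(self) -> str:
        return self.screen

    def act(self, step: Step) -> None:
        if step.action != "tap":
            return
        if self.ignored_taps > 0:            # simulate a tap that did not register
            self.ignored_taps -= 1
            return
        if self.screen == "home" and step.target == "Uber":
            self.screen = "uber"


def toy_policy(goal: str, ui: str, history: List[Step]) -> Step:
    """Stand-in for the LLM: still on home screen -> look for the Uber icon again."""
    if ui == "home":
        return Step("tap", "Uber")
    if ui == "uber":
        return Step("done")
    return Step("back")


if __name__ == "__main__":
    phone = FakePhone(ignored_taps=1)
    steps = run_agent("open Uber", phone.observe, phone.act, toy_policy)
    print([(s.action, s.target) for s in steps])
```

With one dropped tap, the loop sees the home screen again on the second iteration and simply issues the tap again, which is the "we are still on home screen" behaviour in miniature. A pre-planned step list would have moved on as if the app were already open.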

lyime 4 months ago

This is an impressive tool. I especially like the observability around the workflow and the steps it takes to achieve the outcome. We are potentially interested in exploring this if we can get the cost down at scale.

wejick 4 months ago

UI is the most common interface but not particularly AI-friendly. I'll wait for a more standardized interface that's both human- and AI-friendly. Hoping it will still be browser-based.

skull8888888 4 months ago

Isn't Browser Use SOTA on WebVoyager? At this point WebVoyager seems to be outdated; there's definitely a need for a new, harder benchmark.

govindsb 4 months ago

congrats Suchintan! huge achievement!


Skyvern Browser Agent 2.0: How We Reached State of the Art in Evals
