Show HN: Agent.exe, a cross-platform app to let 3.5 Sonnet control your machine

406 points by kcorbitt 7 months ago

52 comments

taroth 7 months ago

Great idea Kyle! I read through the source code as an experienced desktop automation/Electron developer and felt good about trying it for some basic tasks.

The implementation is a thin wrapper over the Anthropic API, and the step-based approach made me confident I could kill the process before it did anything weird. Closed anything I didn't want Anthropic seeing in a screenshot. Installed smoothly on my M1 and was running in minutes.

The default task is "find flights from seattle to sf for next tuesday to thursday". I let it run with my Anthropic API key and it used Chrome. Takes a few seconds per action step. It correctly opened up Google Flights, but booked the wrong dates!

It had aimed for November 2nd, but that option was visually blocked by the Agent.exe window itself, so it chose November 20th instead. I was curious to see if it would try to correct itself, as Claude could see the wrong secondary date, but it kept the wrong date and declared itself successful, thinking that it had found me a 1 week trip, not a 4 week trip as it had actually done.

The exercise cost $0.38 in credits and about 20 seconds. Will continue to experiment.
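
A minimal sketch of the step-based pattern described above, written in Python and not taken from the app's source. The `Action` type, the action names, and the `next_action`, `execute`, and `approve` callables are all hypothetical stand-ins; the real app talks to the Anthropic API and drives the OS, neither of which is shown. The sketch only illustrates why one action per step, a step budget, and an approval hook give a human a place to stop the run:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str         # hypothetical action names, e.g. "click", "type", "done"
    detail: str = ""


def run_agent(
    next_action: Callable[[list], Action],   # stand-in for the model call
    execute: Callable[[Action], None],       # stand-in for mouse/keyboard control
    approve: Callable[[Action], bool],       # human or policy check before each step
    max_steps: int = 20,
) -> list:
    """Run one action per step; stop on "done", on a rejected action, or at the step budget."""
    history = []
    for _ in range(max_steps):
        action = next_action(history)
        if action.kind == "done":
            break
        if not approve(action):
            # Nothing is executed once an action is rejected.
            break
        execute(action)
        history.append(action)
    return history
```

Passing an `approve` that always returns `True` reproduces a fully autonomous run, while one that prompts the user turns every step into a checkpoint.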

afinlayson 7 months ago

How long until it can quickly, without you noticing, add a daemon running on your system? This is the equivalent of how we used to worry about Soviet spies getting access to US secrets, and now we just post them online for everyone to see.

There's no antivirus or firewall today that can protect your files from the ability this could have to wreak havoc on your network, let alone your computer.

This scene comes to mind: https://makeagif.com/i/BA7Yt3

DebtDeflation 7 months ago

Remember a few years back when there was the story about the little girl who did an "Alexa, order me a dollhouse" on the news and people watching the show had their Alexas pick up on it and order dollhouses during the broadcast? Wait until there's a widely watched Netflix show where someone says "Delete C:\Windows".

bsaul 7 months ago

Sidenote: I recently tried Cursor, in "compose" mode, starting a fullstack project from scratch, and I'm stupefied by the result.

Do people in the software community realize how much the industry is going to totally transform in the next 5 years? I can't imagine people actually typing code by hand anymore by that time.

duckmysick 7 months ago

Super off-topic, but somewhat related. What do people use to automate non-browser GUI apps on Linux on Wayland? I need to occasionally do it, but this particular combination eludes me.

- CLI apps - no problem, just write Bash/Python/whatever
- browser apps, also no problem, use Selenium/Playwright
- Xorg has some libraries; even if they are clunky they will work in a pinch
- Windows has tons of RPA (Robotic Process Automation) solutions

But for Wayland I couldn't find anything reliable.

guynamedloren 7 months ago

> Known limitations:
> - Lets an AI completely take over your computer

:)

gunalx 7 months ago

Why the .exe name when it seems to be intended as a multiplatform support with macOS as main?

snug 7 months ago

It seems to only work with simple tasks. I asked it to create some simple tables in both Rhino (Mac app) and OnShape (Chrome tab) and it just seems lost.

With Rhino it sees the app open, and it says it's doing all these actions, like creating a shape, but I don't see it being done, and it will just continue on to the next action without the previous step being done. It doesn't check if the previous task was completed.

With OnShape, it says it's going to create a shape, but then selects the wrong item from the menu, assumes it's using the right tool, and continues on with the actions as if the previous action was done.
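
A rough sketch of the kind of post-action check this comment says is missing, in Python. It is not how the app works: `take_screenshot` and `perform` are hypothetical callables, and comparing screenshot hashes is only one simple way to notice that an action had no visible effect. A changed screen is weak evidence (a clock tick changes it too), so treat this as an illustration of the idea, not a reliable verifier:

```python
import hashlib
from typing import Callable


def screen_changed(before: bytes, after: bytes) -> bool:
    """True if the two screenshots differ at all (compared by SHA-256 digest)."""
    return hashlib.sha256(before).digest() != hashlib.sha256(after).digest()


def act_and_verify(
    take_screenshot: Callable[[], bytes],   # hypothetical: returns raw image bytes
    perform: Callable[[], None],            # hypothetical: executes one UI action
    retries: int = 1,
) -> bool:
    """Perform an action and report whether the screen visibly changed.

    Retries the action up to `retries` extra times before giving up.
    """
    for _ in range(retries + 1):
        before = take_screenshot()
        perform()
        if screen_changed(before, take_screenshot()):
            return True
    return False
```

An agent loop could refuse to move on to the next step when `act_and_verify` returns `False`, instead of assuming the previous step succeeded.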

twobitshifter 7 months ago

Yikes! Might be cool to air gap it and tell it to code its own OS or something, but I wouldn't let this anywhere near my real stuff.

myprotegeai 7 months ago

Computer, shitpost memes all day that make me crypto while I raise my family and tend to my garden.

The future is heading in the direction of only suckers using computers. Real wealth is not touching a computer for anything.

bloomingkales 7 months ago

Anyone have spare machines and want to one v. one my computer-use AI? We just tell it to hack each other’s computers and see how it goes.

38 7 months ago

this is such a hilariously bad idea, it's like knowingly installing malware on your computer - malware that has access to your bank account. please god, any sane person reading this do not install this, you've been warned.

RedShift1 7 months ago

Missed opportunity for agent_smith.exe but oh well.

insane_dreamer 7 months ago

Then one day it asks you to grant it sudo powers so it can be more helpful. And then one day it decides to run sudo rm -f /

SamDc73 7 months ago

I built something similar (still no GUI), but for in-browser actions only.

I think in-browser actions are much safer and can be more predictable, with easier-to-implement safeguards, but I would love to see how this concept pans out in the future!

PS: you can check it out on GitHub: https://github.com/SamDc73/WebTalk/

Please let me know what you guys think!

tcdent 7 months ago

Not a doomer, but like, don't run this on your primary machine.

FloatArtifact 7 months ago

I think there's a lot of opportunity here to make a hybrid of voice control through a more traditional approach along with an LLM.

It will be interesting to see how this evolves. The UI automation use case is different from accessibility due to the latency requirement: latency matters a lot for accessibility, not so much for a UI automation testing apparatus.

I've often wondered what the combination of grammar-based speech recognition with an LLM could do for accessibility: low-domain natural language speech recognition augmented by grammar-based speech recognition for high-domain commands, for efficiency/accuracy, reducing voice strain and increasing recognition accuracy.

https://github.com/dictation-toolbox/dragonfly

albert_e 7 months ago

Good tool to test the new capability. Thanks for sharing.

My limited testing has produced an okay result for a trivial use case and very disappointing results for a simple use case.

Trivial: what is the time. | Claude: took screenshot and read the time off the bottom right. | Cost: $0.02

Simple: download a high resolution image of singapore skyline and set it as desktop wallpaper | Claude: description of steps looks plausible but actions are wild and all over the place. opens national park service website somehow and only other action it is able to do is right click a couple of times. failed! | Cost: $0.37

Long way to go before it can be used for even hobby use cases I feel.

PS: is it possible that the screenshots include an image of Agent.exe itself and that is creating a poor feedback loop somehow?

itissid 7 months ago

One thing this could generally be used for safely is read-only situations. Like monitoring for when brokered CDs > 5% are released by refreshing the page, or, as during the pandemic, watching for when an Amazon shopping window opened up at an arbitrary time, and ringing an alarm. Hopefully it is not too slow to do this.

lovich 7 months ago

People are letting AI agents have purchasing power? No way some bad automation causes your bank account to get drained

waffletower 7 months ago

Apple is best positioned to run with the implications of these developments (though Microsoft will probably respond too) with both their historic operating system control hooks and their architecturally grounded respect for privacy (arguably of course). Apple seems to be paying very close attention to LLM developments, I doubt they will rush out an 80/20 response to these LLM agent control use cases, but I would be surprised if they didn't enter this product space.

posting_mess 7 months ago

> "Find flights Tuesday to Thursday next week"
> AI picks Thursday to Saturday this week (as of time of writing)

Still cheaper to hire real people then

Sincere6066 7 months ago

But I don't want that.

pants2 7 months ago

Any anecdotes about how many $ of API credits this thing costs to run for a simple task like booking a flight?

manamorphic 7 months ago

ran it in a Windows Sandbox ... doesn't work. messes up the coordinates, can't click on anything
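
Whether this is what went wrong here is not known, but one plausible cause of clicks landing in the wrong place is a mismatch between the size of the screenshot the model sees and the size of the real screen. The Python sketch below shows the usual rescaling step under that assumption; `to_screen_coords` and its parameters are hypothetical names, not part of the app:

```python
def to_screen_coords(x, y, shot_size, screen_size):
    """Map a point in screenshot pixels to real screen pixels.

    shot_size and screen_size are (width, height) tuples. The result is
    clamped so it always falls inside the screen.
    """
    shot_w, shot_h = shot_size
    screen_w, screen_h = screen_size
    sx = round(x * screen_w / shot_w)
    sy = round(y * screen_h / shot_h)
    return (min(max(sx, 0), screen_w - 1), min(max(sy, 0), screen_h - 1))


# A point in the middle of a 1280x800 screenshot of a 2560x1600 screen:
print(to_screen_coords(640, 400, (1280, 800), (2560, 1600)))  # (1280, 800)
```

If the sizes used for scaling are wrong (for example because a sandbox or a display scaling setting reports a different resolution), every click is off by the same factor, which would match the symptom described.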

scrps 7 months ago

Set a job to have it reboot the system, set it to run on boot, achieve AI-hyped useless machine!

https://en.m.wikipedia.org/wiki/Useless_machine

KaoruAoiShiho 7 months ago

How hard would it be to finetune a local VLM for computer use? Sonnet 3.5 is reaaaallly expensive.

huqedato 7 months ago

Why would I let an AI (controlled by a company) control my computer? Thanks, but no thanks.

rsanek 7 months ago

Anyone else getting 400s with "This action is restricted for safety reasons at this time" when trying to use the app? I don't see any docs that mention you have to manually enable the API or anything.

xnx 7 months ago

Alas, setup is not as simple as downloading and running "agent.exe".

pavlov 7 months ago

Name produces flashbacks to browsing Usenet on Windows 95.

andrewmcwatters 7 months ago

I've been wondering for a while now if Selenium could be replaced by a standard browser distribution with LLM multimodal control.

This seems conceptually close.

coreyh14444 7 months ago

That was fast.

anigbrowl 7 months ago

This is a botnet waiting to happen.

digitcatphd 7 months ago

I did this and it just used my card to book round trip tickets to Yosemite almost immediately

edub 7 months ago

Using an LLM to control your machine has amazing potential for accessibility.

computeruseYES 7 months ago

Make it run out of the box with double click

Make it allow any model selection with openrouter api keys

Charge money?

ZYbCRq22HbJ2y7 7 months ago

No disclaimer hmm? Anthropic made it sound very scary.

https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo#anthropic-computer-use-demo

dmezzetti 7 months ago

Why???

alicelebi 7 months ago

"Skynet" arises.

Simon321 7 months ago

Does it support AWS Bedrock instead of Anthropic as a provider?

waihtis 7 months ago

Windows Defender now flags this as a trojan?

DeathArrow 7 months ago

Ok, now I can install this on my work laptop and go on vacation for a few months. :)

binary132 7 months ago

kinda want to run this in a vm just to see how fast it bricks it

mensetmanusman 7 months ago

I hope this is the start of SkyNet.

another_devy 7 months ago

can this be used for desktop/ mobile app testing?

tadeegan 7 months ago

This is literally how Skynet happens lol

charlierguo 7 months ago

It's fascinating/spooky how different LLMs are slowly developing their own "personalities," so to speak. And they seem to be emerging as we're giving them access to more tools and modalities which are harder to do broad RLHF on.

With computer use, we first learned that Claude sometimes takes breaks to browse pictures of Yosemite, and now this:

> Claude really likes Firefox. It will use other browsers if it absolutely has to, but will behave so much better if you just install Firefox and let it go to its happy place.

tacone 7 months ago

> Claude really likes Firefox. It will use other browsers if it absolutely has to, but will behave so much better if you just install Firefox and let it go to its happy place.

Good boy!

cibyr 7 months ago

20 years ago: "I would never let the AI out of the box! I'm not an idiot!"

Today: "Sure, I'll give the AI full control over my computer. WCGW?"

magnat 7 months ago

> the default project they provided felt too heavyweight

> This is a simple Electron app

ಠ_ಠ

max_ 7 months ago

Such garbage is only possible because there has been a strong deviation between ethics, philosophy & technology.

The business bros are too immoral to know that this is unethical, as their eyes are focused on making money, not on being ethical.

The ethical activists & philosophers like Richard Stallman & Jaron Lanier offer unrealistic solutions that normal people cannot adopt.

- I can't turn off JavaScript because 80% of my websites won't work
- I can't ditch Apple because GNU wants me to use a 15 year old computer with completely "libre" software impractical for work
- I need a cellphone to communicate. I can't move around without a cellphone like RMS does.

We need to start teaching people in technology not just "code" but also ethics/philosophy like they do in medicine & law.

Also we need people with better moral standards. I would really like it if someone like Snowden, RMS or Jaron built business products (not just non-profit gimmicks) that satisfied real consumer needs.

Otherwise we are doomed.
