TechEcho

I’ve always wanted to be able to control my computer hands-free, whether my hands are busy (for example, when I’m eating) or I’m away from the machine and want to run a routine task.

There are existing tools that can execute predefined tasks in the browser, but they have several disadvantages:

* They work by matching CSS selectors, so they are brittle, which increases the cost of maintenance.
* Every task must be predefined, so the tool won’t work on new websites without additional configuration.
* It is tedious to specify a task consisting of many individual steps.

To solve this, I built [name redacted], a Chrome extension that uses GPT-4V to interpret the screen contents and deduce the sequence of steps needed to complete a task. See a demo here: https://www.youtube.com/watch?v=pyy7cMj-zHk

Here’s how it works:

1. Ask [name redacted] to perform a task, like “book me a dinner reservation at my favorite restaurant.”
2. [name redacted] takes a screenshot of the current page, collects some additional metadata about the interactive elements, and selects the next action to take to accomplish the task.
3. The action is performed in the browser tab, and the process repeats until the task is done.

In this way, [name redacted] can operate on all websites, perform any task a human can, and work mostly autonomously. Accuracy is still a work in progress, but I’m confident the approach is valid.

Some things I’m excited about on the roadmap:

* Voice in/out support
* Record and rerun macros
* Automated testing and reports with [name redacted]

Let me know what you think. I’d love any feedback you have!
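
For readers who want to picture the three-step loop described above, here is a rough sketch in Python. It is not the extension's actual code: the names `Element`, `Observation`, `Action`, and `run_task` are hypothetical, and the screenshot capture, the vision-model call, and the browser interaction are passed in as plain callables so the control flow can be shown without assuming any particular API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Element:
    """Metadata for one interactive element on the page (hypothetical shape)."""
    element_id: int
    role: str
    label: str


@dataclass
class Observation:
    """What the assistant sees at each step: a screenshot plus element metadata."""
    screenshot_png: bytes
    elements: List[Element]


@dataclass
class Action:
    """One step chosen by the model; kind is assumed to be 'click', 'type', or 'done'."""
    kind: str
    element_id: Optional[int] = None
    text: Optional[str] = None


def run_task(
    task: str,
    observe: Callable[[], Observation],
    choose_action: Callable[[str, Observation, List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Repeat observe -> choose -> perform until the model reports 'done'.

    Returns the list of actions that were performed. Raises RuntimeError if
    the task is not finished within max_steps, so a confused model cannot
    loop forever.
    """
    history: List[Action] = []
    for _ in range(max_steps):
        observation = observe()                              # screenshot + metadata
        action = choose_action(task, observation, history)   # stand-in for the model call
        if action.kind == "done":
            return history
        perform(action)                                      # act in the browser tab
        history.append(action)
    raise RuntimeError(f"task not finished after {max_steps} steps")
```

In a real extension, `observe` would wrap whatever screenshot and DOM-inspection mechanism is in use, `choose_action` would send the task, screenshot, and element list to the vision model and parse its reply, and `perform` would dispatch the click or keystroke. The step limit is one simple guard for the accuracy issues mentioned in the post.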

Show HN: Automate the browser with an AI assistant

1 comment
