Show HN: Autotab – Programmable AI browser for turning web tasks into APIs

159 points by jonasnelle 6 months ago

Hey HN, we're Alexi and Jonas, the co-founders of Autotab (https://autotab.com). Autotab is a Chrome-based browser you can teach to do complex tasks, with a simple API for running them from your app or backend.

Here is a walkthrough of how it works: https://youtu.be/63co74JHy1k, and you can try it for free at https://autotab.com by downloading the app.

Why a dedicated editor?

The number one blocker we've found in building more flexible, agentic automations is performance quality, BY FAR (https://www.langchain.com/stateofaiagents#barriers-and-challenges). For all the talk of cost, latency, and safety, the fact is that most people are still just struggling to get agents to work. The keys to solving reliability are better models, yes, but also intent specification. Even humans don't zero-shot these tasks from a prompt. They need to be shown how to perform them, and then refined with question-asking and feedback over time. It is also quite difficult to formulate complete requirements on the spot from memory.

The editor makes it easy to build up the specification as you step through your workflow, while generating successful task trajectories for the model. This is the only way we've been able to get the reliability we need for production use cases.

But why build a browser?

Autotab started as a Chrome extension (with a Show HN post! https://news.ycombinator.com/item?id=37943931). As we iterated with users, we realized that we needed to focus on creating the control surface for intent specification, and that being stuck in a Chrome side panel wasn't going to work. We also knew that we needed a level of control for the model that we couldn't get without owning the browser. In Autotab, the browser becomes a canvas on which the user and the model take turns showing and explaining the task.

Key features:

1. Self-healing automations that don't break when sites change
2. Dedicated authoring tool that builds memory for the model while defining steps for the automation
3. Control flows and deep configurability to keep automations on track, even when navigating complex reasoning tasks
4. Works with any website (no site-specific APIs needed)
5. Runs securely in the cloud or locally
6. Simple REST API + client libraries for Python and Node

We'd love to get any early feedback from the HN community, ideas for where you'd like the product to go, or experiences in this space. We will be in the comments for the next few hours to respond!

27 comments

slfnflctd 6 months ago

If I understand this correctly, it looks like the promise I saw in that 'Record Macro' button in my Excel toolbar in the 1990s might finally be coming to fruition in a wider and more capable sense! A pleasant surprise effect of the new AI situation, if true.

I noticed in another comment that you said some steps can be made 'optional' (e.g. clicking through a modal). In my ancient Excel macro adventure, what I learned was that I had to tweak the heck out of the VBA code that the Record button generated, which led to me just writing VBA for everything directly and eventually abandoning the Record feature entirely. I had a similar experience later on with AutoHotkey. What are the analogous aspects of Autotab to this? Also, to what extent is hand-manipulating the underlying automation possible and/or necessary to get optimal results?

pugio 6 months ago

I love the idea: owning the browser definitely seems like the right approach.

I tried it out on a workflow I've been manually piecing together, and it gave me a bunch of "Error encountered, contact support" messages when doing things like clicking on a form input field, or even a button.

The more complex "Instruction" block worked correctly instead (literally things like "click the 'Sign In' button"), but then I ran out of the 5 minutes of free run time when trying to go through the full flow. I expect this kind of thing will be fixed soon, as it grows.

In terms of ultimate utility, what I really want is something that can export scripts that run entirely locally, but fall back to the more dynamic AI-enhanced version when an error is encountered. I would want Autotab to generate the workflow, which I could then run on my own hardware in bulk.

Anyway, great work! This is definitely the best implementation I've seen of that glimpsed future of capable AI web browsing agents.

rava-dosa 6 months ago

Really exciting to see this approach to automation and intent specification! We’ve been working with similar challenges at Origins AI, where we focus on deep tech solutions. I can’t overstate how much having a robust system for breaking down tasks and iterating on them has helped us.

For one of our recent projects, we had to integrate complex workflows with third-party systems, and it was clear that reliability came down to how well we could define and refine intent over time.

I’m especially curious about your self-healing automations. That’s an area where we’ve found a lot of value in using models that can adapt to subtle UI changes, but it’s always a tradeoff with latency. Would love to hear more about how you balance that in production!

Looking forward to trying Autotab and seeing how it compares with some of the internal tools we’ve built!

MattDaEskimo 6 months ago

Very neat in theory, but I'm failing to find any technical details.

At which layer is the automation happening? Inside, using DevTools? Multiple?

What is the self-healing mechanic? I'm guessing invoking an LLM to find what happened and fix it?

I guess what I'm wondering is: is this some sort of hybrid between computer use and DevTools usage?

Carrok 6 months ago

You say "try it for free" but your website has no pricing information at all. Is this free for just a while? Free forever? What is your monetization strategy?

Can I point it at my own LLM, or am I locked into using OpenAI?

adamkhakhar 6 months ago

This is awesome! What is your most common use case? Have you thought of competing with https://scribehow.com/ in the documentation space?

alex_c 6 months ago

The functionality looks very very cool. But the privacy policy raises an eyebrow. Am I overreacting?

> Usage Information. To help us understand how you use our Services and to help us improve them, we automatically receive information about your interactions with our Services, like the pages or other content you view, the searches you conduct, and the dates and times of your visits.

> Desktop Activity on our Services. In order to provide the Services, we need to collect recordings of your desktop activity while using our Services, which may include audio and video screen recordings, your cookies, photos, local storage, search history, advertising interactions, and keystrokes.

> Information from Cookies and Other Tracking Technologies. We and our third-party partners collect information using cookies, pixel tags, SDKs, or other tracking technologies. Our third-party partners, such as analytics partners, may use these technologies to collect information about your online activities over time and across different services.

> [...]

> How We Disclose the Information We Collect

> Affiliates. We may disclose any information we receive to any current or future affiliates for any of the purposes described in this Privacy Policy.

> Vendors and Service Providers. We may disclose any information we receive to vendors and service providers retained in connection with the provision of our Services.

handfuloflight 6 months ago

I see it's able to perform data extraction, but what if you wanted to enter in data from another system, or generated by an LLM during the workflow?

thedays 6 months ago

Is Autotab able to scrape data from multiple websites with different structures and combine this data into structured data in one CSV or JSON file? Example: scrape interest rates offered on savings accounts from multiple bank websites, extract the name of the bank, bank logo, product name, and interest rate for each account, and run this saved query on a regular schedule (daily, weekly, etc.)?

smashah 6 months ago

If this were an OSS project automating a specific service, many HN-ers would come and bleat about TOS violations and being scared/wary of C&Ds.

How does this not violate TOS? Do you have legal protection set up against megacorps trying to bully you with legal threats?

Automation despite TOS via Adversarial Interop should be a Digital Human Right. Godspeed.

diegolazcano 6 months ago

This is awesome. I was just trying to get a rudimentary version of this working for some "user"-interaction-heavy data extraction. Definitely giving it a try.

For a case with lots of requests, how does Autotab handle IP blocking? Does each run use a different portal instance?

nagisa12321 6 months ago

Have you considered how to handle mobile (SMS) verification codes, image CAPTCHAs, and "prove you are not a robot" checks?

pacifi30 6 months ago

Pretty slick. I recorded a session for ordering from a restaurant website, and it did repeat the entire workflow. It had some issues with a modal that popped up, but all in all, well done! We have been trying to robotify the task of ordering from restaurants for our clients, and it seems like your solution can work well for us. I am guessing that you want your users to use the Autotab browser, so what is the API for?

treetalker 6 months ago

> As it runs, Autotab asks for clarifications and feedback. These learnings are accumulated into action memory—improving Autotab's world model, and allowing it to work reliably for hours on end.

Is "learning", used as a noun, a term of art in this field?

If not, my reactioning to that using is that it is a being bad English that causes producings of gratings on the ears.

amarsharma 6 months ago

Been working in this space for almost 9 years and have written a lot of scrapers and web automations for various clients, so I am really excited to build something like this too. Are you guys hiring? Would love to chat.

hmontazeri 6 months ago

I don't read docs. Didn't get it to work the way I wanted... It needs simplification.

wruza 6 months ago

Honestly, the video feels like any low-code/no-code tutorial video, in the sense of “we’re going to automate something”, and a minute later we are copying URLs into some complex forms and following a voiceover whose meaning you cannot grasp. A little intro on what exactly we are doing would help. I cold-watched only half of it, without reading any info on the project, but that’s how everyone does it, I guess.

But I get the idea: automate by example, with an automatic scenario builder and fuzzy UI matching via AI.

As someone who works in automation, I (again, blindly) suggest looking into anti-detection and human behavior like mouse movements, typing errors, and pauses, because that’s what your (and all our) main enemy will be in the next decade.

All in all, this is in high demand, afaiu. I tend to use a classic ML approach for that (avoiding browser automation because it obviously only works in a browser and limits/divides the area of application), but would love to try something that self-heals on site changes. Although I think I’d rather use something that can detect changes and reconfigure my ML params than use it directly, because I don’t really trust modern AI to free-float at runtime, and also costs.

surrTurr 6 months ago

MacBook Pro M3 Max, latest macOS version:

"Autotab has exited due to multiple fatal errors. Please contact support for assistance: contact@autotab.com."

linuxrebe1 6 months ago

One thing I would recommend: install instructions for Linux/Windows/Mac. I'm not finding them in the documentation.

replwoacause 6 months ago

Looks nice. Anybody else in this space? This one is on the pricier end, but I’m just a single user, so maybe not the target customer.

N4der 6 months ago

Super cool. Congrats & well done. Can I install a Chrome extension within this browser and automate some actions on it?

rno321 6 months ago

Can you use this to auto apply online forms?

eddjlsh 6 months ago

I tried it out on a website I am testing at work but sadly it failed to complete a form :(

artificialLimbs 6 months ago

'Google SSO'

Urgh. I was excited about this. Anxiously awaiting email/other SSO (we use MS).

kQsWEeE 6 months ago

Hi, do you offer proxies?

sciencesama 6 months ago

Is it possible to get a personal license for testing?

myflash13 6 months ago

Where are the API docs / client libraries?
