Show HN: Stagehand – an open source browser automation framework powered by AI

326 points by hackgician 4 months ago

Hi HN! I’m Anirudh, a longtime lurker and first-time poster, and I couldn’t be more excited to show you Stagehand.

Stagehand is a TypeScript project that extends Playwright with three simple AI methods: act, extract, and observe. We’d love for you to try it out using the command below:

<pre><code>npx create-browser-app --example quickstart
</code></pre>

Here’s a sample workflow:

<pre><code>import { Stagehand } from "@browserbasehq/stagehand";
import { z } from "zod";

const stagehand = new Stagehand();
await stagehand.init();

// Stagehand overrides the Playwright Page and Context classes
const { page, context } = stagehand;

await page.goto("https://instadash.com"); // Regular Playwright

// Take action on the page
await page.act({ action: "click on taqueria cazadores" });

// Extract relevant data from the page
const { price } = await page.extract({
  instruction: "extract the price of the super burrito",
  schema: z.object({ price: z.number() }),
});
</code></pre>

We built Stagehand because we loved building browser automations using Playwright and Selenium, but we grew frustrated at how cumbersome it is to just get started and write simple browser automations. These frameworks, while incredibly powerful, are built for QA testing and are thus notoriously prone to fail if there are minor changes in the UI or underlying DOM structure.

The goal of Stagehand is twofold:

1. Make browser automations easier to write
2. Make browser automations more resilient to DOM changes

We were super energized by what we’ve been seeing with vision-based computer use agents. We think with a browser, you can provide even richer data by leveraging the information in the DOM + a11y tree in addition to what’s rendered on the page.
However, we didn’t want to go so far as to build an agent, since we wanted fine-grained control over each step that an agent can take.

Therefore, the happy medium we built was to extend the existing powerful functionalities of Playwright with simple and extensible AI APIs that return the decision-making power back to the developer at each step.

Check out our docs: <a href="https://docs.stagehand.dev" rel="nofollow">https://docs.stagehand.dev</a>

We’d love for you to join and give us feedback on Slack as well: <a href="https://stagehand.dev/slack" rel="nofollow">https://stagehand.dev/slack</a>

21 comments

dchuk 4 months ago

This looks awesome.

What I would love to see, either as something leveraging this or built into it, is this: if you prompt Stagehand to extract data from a page, it also returns the XPath expressions you'd use to re-scrape the page without needing an LLM for that second scrape.

So basically, you can scrape never-before-seen pages with the non-deterministic LLM tool, and then when you need to rescrape a page, to update content for example, you can use the cheaper old-school scraping method.

I'm not sure how brittle this would be, both in going reliably from the LLM version to the XPath version and in falling back to the LLM version if your XPath script fails. But conceptually, being able to scrape with the smart tools while building up a library of dumb scraping scripts over time would be killer.
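
The idea in the comment above (use the LLM once, cache the selector it found, and fall back to the LLM only when the cached selector stops matching) can be sketched roughly as follows. This is a minimal illustration and not Stagehand's API: `SelectorCache`, `Extractor`, and `LlmLocator` are hypothetical names, and the two callbacks stand in for whatever cheap selector lookup and LLM-backed extraction you actually have.

```typescript
// Hypothetical sketch: none of these names come from Stagehand.
// Cheap path: read a value using a known selector, or null if it no longer matches.
type Extractor = (selector: string) => Promise<string | null>;
// Expensive path: ask an LLM-backed tool for the value and the selector it used.
type LlmLocator = (
  instruction: string,
) => Promise<{ selector: string; value: string }>;

class SelectorCache {
  private cache = new Map<string, string>();
  private extractBySelector: Extractor;
  private locateWithLlm: LlmLocator;
  llmCalls = 0; // counts how often the expensive path was taken

  constructor(extractBySelector: Extractor, locateWithLlm: LlmLocator) {
    this.extractBySelector = extractBySelector;
    this.locateWithLlm = locateWithLlm;
  }

  async extract(instruction: string): Promise<string> {
    const cached = this.cache.get(instruction);
    if (cached !== undefined) {
      const value = await this.extractBySelector(cached);
      if (value !== null) return value; // cached selector still works
      this.cache.delete(instruction); // selector went stale, forget it
    }
    // First visit or stale selector: fall back to the LLM and remember its answer.
    this.llmCalls += 1;
    const { selector, value } = await this.locateWithLlm(instruction);
    this.cache.set(instruction, selector);
    return value;
  }
}
```

Keying the cache on the instruction text keeps the sketch short; a real version would probably key on the URL pattern as well and persist the map between runs.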

mpalmer 4 months ago

This looks very cool and makes a lot of sense, except for the idea that it should take the place of Playwright et al.

Personally I'd love to use this as an intermediate workflow for producing deterministic Playwright code, but it looks like this is intended for running directly.

I don't think I could plausibly argue for using LLMs at runtime in our test suite at work...
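
One way to picture the "intermediate workflow" described above: record the concrete steps an AI-driven run ended up taking, then emit plain Playwright code from that record so later runs need no LLM. The sketch below assumes such a record exists; `RecordedStep`, `toPlaywrightLine`, and `toPlaywrightTest` are hypothetical names, and nothing here is part of Stagehand. The generated strings use only standard Playwright calls (`page.goto`, `page.locator(...).click()`, `page.locator(...).fill(...)`).

```typescript
// Hypothetical record of what an AI-driven run actually did on the page.
type RecordedStep =
  | { kind: "goto"; url: string }
  | { kind: "click"; selector: string }
  | { kind: "fill"; selector: string; value: string };

// Turn one recorded step into one line of deterministic Playwright code.
// JSON.stringify gives correctly escaped, double-quoted string literals.
function toPlaywrightLine(step: RecordedStep): string {
  switch (step.kind) {
    case "goto":
      return `await page.goto(${JSON.stringify(step.url)});`;
    case "click":
      return `await page.locator(${JSON.stringify(step.selector)}).click();`;
    case "fill":
      return `await page.locator(${JSON.stringify(step.selector)}).fill(${JSON.stringify(step.value)});`;
  }
}

// Wrap the generated lines in a @playwright/test test body.
function toPlaywrightTest(name: string, steps: RecordedStep[]): string {
  const body = steps.map((step) => "  " + toPlaywrightLine(step)).join("\n");
  return [
    `import { test } from "@playwright/test";`,
    "",
    `test(${JSON.stringify(name)}, async ({ page }) => {`,
    body,
    "});",
  ].join("\n");
}
```

The output is ordinary source text that can be reviewed and committed like any hand-written test, which is the property the comment is asking for.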

asar 4 months ago

This looks really cool, thanks for sharing!

I recently tried to implement a workflow automation using similar frameworks that were Playwright- or Puppeteer-based. My goal was to log into a bunch of vendor backends and extract values for reporting (no APIs available). What stopped me entirely were websites that implemented an invisible captcha. They can detect a Playwright instance by how it interacts with the DOM. Pretty frustrating, but I can totally see this becoming a standard as crawling and scraping are getting out of control.

z3t4 4 months ago

My kneejerk reflex: "create-browser-app" is a very generic name, should just have called it "stagehand"

sparab18 4 months ago

I've been playing around with Stagehand for a minute now, and it's actually a useful abstraction. We build scrapers for websites that are pretty adversarial, so having built-in proxies and captcha handling is delightful.

Do you guys ever think you'll do a similar abstraction for MCP and computer use more broadly?

xingwu 4 months ago

Can the script be compiled into actual DOM operations so that we don't need an LLM for every run?

tomatohs 4 months ago

Cool! Before building a full test platform for testdriver.ai we made a similar sdk called Goodlooks. It didn't get much traction, but will leave it here for those interested: <a href="https://github.com/testdriverai/goodlooks">https://github.com/testdriverai/goodlooks</a>

zanesabbagh 4 months ago

Have been on the Slack for a while and this crew has had an insane product velocity. Excited to see where it goes!

pryelluw 4 months ago

Can it be adapted to use Ollama? Seems like a good tool to set up locally as a navigation tool.

fbouvier 4 months ago

Hey Anirudh, Stagehand looks awesome, congrats. Really love the focus on making browser automations more resilient to DOM changes. The act, extract, and observe methods are super clean.

You might want to check out Lightpanda (<a href="https://github.com/lightpanda-io/browser">https://github.com/lightpanda-io/browser</a>). It's an open-source, lightweight headless browser built from scratch for AI and web automation. It's focused on skipping graphical rendering to make it faster and lighter than Chrome headless.

bluelightning2k 4 months ago

Does this open up the possibility of automating an existing open browser tab (instead of a headless or specifically opened instance of Chrome)?

jerrygoyal 4 months ago

Wow. It's like the Cursor vs. VS Code movement, but for browser automation and scraping. Kudos to the author. Are there any other similar tools?

CyberDildonics 4 months ago

People must be excited for this since a lot of people are commenting for the first time in months or years to say how much they love it. Some people liked it so much they commented for the first time ever to say how great it is.

vitalets 4 months ago

Looks interesting. I know of a similar project: <a href="https://zerostep.com" rel="nofollow">https://zerostep.com</a>. Is it basically the same?

jsdalton 4 months ago

Does it operate by translating your higher level AI methods into lower level Playwright methods, and if so is it possible to debug the actual methods those methods were translated to?

Also is there some level of deterministic behavior here or might every test run result in a different underlying command if your wording isn’t precise enough?

jameslk 4 months ago

Cool to see another open source AI browser testing project! There’s a couple of others I’ve heard of as well:

Skyvern: <a href="https://github.com/Skyvern-AI/skyvern">https://github.com/Skyvern-AI/skyvern</a>

Shortest: <a href="https://github.com/anti-work/shortest">https://github.com/anti-work/shortest</a>

I’d love to hear what makes Stagehand different and pros/cons.

Of course, I have no complaints to see more competition and open source work in this space. Keep up the great work!

righthand 4 months ago

I’m curious how this compares to Playwright’s built-in codegen: <a href="https://playwright.dev/docs/codegen-intro" rel="nofollow">https://playwright.dev/docs/codegen-intro</a>

Is a chatbot an easier way to iterate on a test?

BrandiATMuhkuh 4 months ago

Congratulations. This is super cool.

I often thought E2E testing should be done with AI. What I want is that the functionality works (e.g.: login, then start an assignment) without the need to change the test each time the UI changes.

fasten 4 months ago

cool extension to playwright! how effective are the ai methods in handling dynamic ui changes?

owebmaster 4 months ago

Any attempt on doing something similar but as a browser extension?

arvindsubram 4 months ago

The easiest way to programmatically browse the web!!
