Launch HN: Skyvern (YC S23) – open-source AI agent for browser automations

327 points by suchintan 7 months ago

Hey HN, we’re Suchintan and Shu from Skyvern (https://www.skyvern.com). We’re building an open source tool to help companies automate browser-based workflows using LLMs.

Our open source repo is at https://github.com/Skyvern-AI/Skyvern, and we're excited to share our cloud version with you (https://app.skyvern.com) :)

Skyvern allows you to define a single (or a series of) goal-based prompts to instruct an agent to complete complex tasks on websites. Here’s a quick demo of Skyvern: https://www.loom.com/share/76b231309df74a528061fcf102e1967f

We built this to solve a specific problem: building browser automations often requires companies to either hire people and scale out operations teams to do tedious manual work, or hire developers to use products like UiPath or Selenium to build automations.

Code-based solutions always run into the same problem: they’re brittle (wow, this website added a new pop-up dialog and my script broke), and they fail to achieve the same objective across multiple websites (how can I fill out a contact-us form on hundreds of different websites?).

We did a Show HN a few months ago (https://news.ycombinator.com/item?id=39706004), and since then, we’ve onboarded customers for a wide variety of use cases: generating insurance quotes on websites like Geico.com; applying to jobs on websites like lever.co; automating filing permits in local government portals; registering new corporations for employment identification; fetching invoices from hundreds of different portals such as hydroone.com; automating purchasing on a handful of e-commerce websites like zooplus.com; and filling out contact-us forms on a bunch of random SMB websites (such as HVAC websites).

To be able to service all of these, we’ve built and open-sourced quite a few interesting features:

(1) a fully-featured React application allowing you to see every action Skyvern is taking in real time;
(2) livestreaming browser instances to allow our users to see what Skyvern is doing when running inside of a Docker container;
(3) authenticated sessions, integrating with Bitwarden and allowing users to specify Email + Phone + QR-code based 2FAs;
(4) “workflows” allowing users to chain multiple goal-based prompts together, which can handle tasks like invoice downloading, or automating purchasing pipelines;
(5) processing HTML elements (e.g. identifying + summarizing SVGs) and performing website interactions (e.g. iterating over dynamic autocompletes to fill in address information correctly);
(6) “cached workflows”, allowing Skyvern to memorize previous interactions (i.e. text inputs) and re-use them in future runs.

We’ve also been blessed with a few model advancements to solve some of the cost concerns the community brought up. Skyvern’s token costs went down 80% from $15 / 1M tokens (GPT-4V) to $2.50 / 1M tokens (GPT-4o).

Despite the model costs going down 80%, Skyvern is still quite expensive to run, so we give every new user $5 of credits to try it out and see if it can be useful for you.

We would be honored if you could give it a try at https://app.skyvern.com and share some feedback with us, and we look forward to any and all of your comments!
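
For anyone who wants to sanity-check the cost numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the two per-million-token prices quoted above. The helper `tokens_for_credit` is a hypothetical name for illustration, and it assumes credits map one-to-one onto raw token spend at the quoted price, which may not match how Skyvern actually bills.

```python
# Per-million-token prices quoted in the post (USD).
old_price = 15.00  # GPT-4V
new_price = 2.50   # GPT-4o

# Relative price drop between the two models.
reduction = (old_price - new_price) / old_price
print(f"Price reduction: {reduction:.1%}")


def tokens_for_credit(credit_usd: float, price_per_million: float) -> int:
    """Tokens a credit would buy at a flat per-million price (assumed 1:1 billing)."""
    return int(credit_usd / price_per_million * 1_000_000)


# The $5 of credits mentioned in the post, under the assumption above.
print(tokens_for_credit(5.00, new_price))
print(tokens_for_credit(5.00, old_price))
```

By this arithmetic the drop is a little over 83%, so the quoted 80% is a round-down rather than an overstatement.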

31 comments

glorpsicle 7 months ago

Congrats on the launch! I've been keeping up with you folks since you last posted (a few months ago, I believe). How does Anthropic's recent announcement of Claude's "computer use" abilities grab you? What key differentiators does Skyvern have, at this point in time ("computer use" with Claude being relatively new)?

sahmeepee 7 months ago

Probably not the first AI wrapper around Playwright this week, and certainly not the first this month.

I think this use case of automation in a BPA sense is more compelling than using it for test automation, because the latter is much more concerned with the precision and repeatability of the process. For the BPA task, arguably you care only about the outcome and it often doesn't matter if it gets there via some crazy route.

Part of the problem for me is that your example video shows a big wodge of prompt that had to be written to make this work and then a few kb of payload data (parameters) in a plaintext, non-csv format. If the expectation is that this replaces someone just using Playwright with codegen due to that being too technical, I'm not convinced there is a huge group of people who can manage one task but not the other.

Furthermore, you are expecting them to pass over their website login credentials and apparently their credit card details too, in plain text. You had better have a very solid idea of how to handle that sensitive data to avoid serious consequences if your users' skyvern accounts are compromised.

I think the frequency of website redesigns is oversold by people producing these LLM-driven Playwright wrappers, especially when targeting old-fashioned or government sites. As an example, we have had a suite of lengthy Playwright browser automations to interact with a government site for a few years and have had to maintain them only once, when the agency's business process changed. The prompt would also have needed to change had we used Skyvern, as would the payload, because the process was different. The difference with the Playwright automation, though, is that we could use assertions to verify steps had succeeded/failed and data had been recorded correctly, so we would know the process needed updating. I can't see that option in Skyvern which would have me worrying that process changes would be overlooked and we would unknowingly start entering the wrong data or missing steps.

Workaccount2 7 months ago

Anyone building a start-up on 3rd party LLMs at this point has to have some big cojones. Or you need a smash-and-grab business model. Serious risk if your horizon is measured in years instead of months.

Anthropic threw their hat in this ring yesterday, and it will very likely be followed by OpenAI and Google soon. Godspeed.

mmaunder 7 months ago

Congrats!!! And super cool that you've open sourced it under the AGPL. Sorry if this is answered in the docs but I did a brief search on the source and noticed you're not using LangChain but do plan to integrate it so it can be offered to that community. I'm curious if you wouldn't mind talking about what you did use to create the chain of thought/actions logic in Skyvern and if you had to start work today if you'd consider going the LangChain/Graph route? Thanks.

dboreham 7 months ago

In case anyone else is confused as to what "browser automations" is: this is about making a program that drives a target web site (owned by someone else typically), in the manner of Selenium or the like, inserting key press events and mouse move/click events to make that target web site do something. Once you know that, the rest of the description makes sense.

sirmarksalot 7 months ago

As with any of these LLM workflow automation tools, it raises a few questions about each potential use case, and the likely long-term outcomes.

1. Is this working around friction due to a lack of interoperability between tools? For example, is this something that would be more efficient if the owner of the website exposed a REST service? Will the existence of this tool disincentivize companies from exposing services when it makes sense?

2. If there is a good reason for the lack of a service endpoint, perhaps for security reasons, will your automation workflow be used to bypass those security measures? Could your tool be used by malicious actors to disable major services? Are you that malicious actor yourself? Will your tool be used by scalpers to prevent consumers from buying high-demand products?

3. If this is being used to work around deferred maintenance with internal tools and processes, will the existence of these kinds of tools be used by management to justify further deferral of that maintenance? Will your tool become a critical piece of the support staff's workflow?

4. If your tool is being used in good faith to work around anti-patterns in website design, will the owner of the website be incentivized to break your workflow? Is your use case just a step in an arms race?

These are the thoughts that go through my head whenever I hear about software being laid on top of complicated processes, where instead of simplifying the underlying processes, we add another layer of complexity to sweep it under the rug. I'm sure that people will find your project useful, but I wonder what the longer-term effects will be.

thedays 7 months ago

Is Skyvern able to scrape data from multiple websites with different structures and combine this data into structured data in one CSV or JSON file? Example: scrape interest rates offered on savings accounts from multiple bank websites and extract the name of the bank, bank logo, product name and interest rate for each account and run this saved query on a regular schedule (daily, weekly etc)?

DennisSFO 7 months ago

Congrats on the launch. I'm curious if you had any experience running skyvern on airline websites (for example to extract award availability for miles tickets from point A to B)? It seems like airlines always change things around and have robust anti scraping measures.

msp26 7 months ago

Awesome, I've been working on a similar thing at a smaller scale and I think this area is very promising.

I've limited my problem scope to single page interactions / scraping which has been very reliable and useful for my company. But agentic automation does sound fun.

sergeyk 7 months ago

Congrats! Do you have numbers on WebArena (https://webarena.dev) or VisualWebArena (https://jykoh.com/vwa)?

modo_ 7 months ago

Congrats on the launch! This is really cool - one of the applications of LLM I find most compelling. I've seen so many back office processes that have hundreds of steps, are incredibly error prone, and traditionally couldn't be automated due to API limitations. Solutions like Skyvern are going to supercharge businesses that have had historically low margins due to the number of humans required. (Not as a replacement for a human, but as a force multiplier)

hannesle 7 months ago

Hi, looks cool! Congratulations. Will check it out and maybe add it to https://ai-tools.directory for people looking for such solutions!

drewsonian 7 months ago

This is great, and I can think of several business uses and some personal ones.

For example: could I use this to pull screenshots or PDFs of my grocery receipts from a major grocery chain?

delusional 7 months ago

The plaintext version of your signup email replaces the ampersand in the url with an &amp; XML entity. You probably don't want that.
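
A minimal Python sketch of how this kind of bug usually arises, assuming the email is sent with both an HTML part and a plaintext part built from the same link. The URL and variable names are hypothetical, not taken from Skyvern's code. Escaping `&` as `&amp;` is correct inside an HTML attribute but should not be applied to the plaintext part.

```python
import html

# Hypothetical sign-up link with two query parameters (not Skyvern's real URL).
url = "https://example.com/verify?token=abc123&email=user%40example.com"

# HTML part: the ampersand must be escaped inside the href attribute.
html_part = f'<a href="{html.escape(url)}">Verify your email</a>'

# Plaintext part: the URL should go in unescaped.
text_part = f"Verify your email: {url}"

# The bug described above: reusing the escaped URL in the plaintext part,
# so the recipient sees (and copies) "&amp;" instead of "&".
broken_text_part = f"Verify your email: {html.escape(url)}"

print(html_part)
print(text_part)
print(broken_text_part)
```

If only an escaped string is available when building the plaintext part, `html.unescape` turns `&amp;` back into `&`.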

jackb4040 7 months ago

> You won't be able to run Skyvern unless you enable at least one provider.

Any plans on bundling a local LLM / supporting local LLMs?

ganeshkrishnan 7 months ago

Awesome work. I've had the GitHub repo starred since the day I saw it on Show HN but never got around to using it.

I want to use this to automate approving/declining group members for our Facebook group, which is approaching half a million members, and FB admin tools are pretty lacking.

imp0cat 7 months ago

> how can I fill out a contact-us form on hundreds of different websites?

What's the use case here exactly? Sorry for being a bit pessimistic, but this sounds like an easy way to automatically send a lot of spam.

BrandiATMuhkuh 7 months ago

Congratulations on the launch. This is really cool. I was recently tinkering with the same idea, but based on a browser extension.

There are many back office tasks where people copy data from page 1 into a form on page 2.

bluerooibos 7 months ago

Looks super interesting!

Unfortunately the mobile experience is pretty bad - practically unusable. I'd expect any web application made in the last decade to be mobile-first.

TZubiri 7 months ago

Sounds good.

Question: if it's computer vision based, does that mean that it can be trivially ported to support desktop automations?

andychert 7 months ago

Do I understand correctly that this is an open source of the GUI only, you don't show the model itself?

ProofHouse 7 months ago

Cool but pricing is utterly insane

shaburn 7 months ago

Would be great to have a fixed blockchain based event log, ideally encrypted.

infocollector 7 months ago

Quick question: What does DataDog's ddtrace do in the opensource version?

rokhayakebe 7 months ago

Can I use this to make changes to a Wordpress website if given login?

drippingfist 7 months ago

This is very cool. Do you think I could use it to do UX/UI testing?

tdsone3 7 months ago

Has someone run this on modal.com yet?

Cheesman123 7 months ago

Congrats on the launch - love the tool

PeterStuer 7 months ago

But will Cloudflare brick it?

ji_zai 7 months ago

Congrats!! This is super neat. I've been looking for good ways to have AI browse the internet on my behalf - the way I normally do, and give me a presentation / summary of the highlights, so that I don't have to open myself up as much to social media and the chance for doomscrolling, etc.

I'm going to be playing with this.

bbor 7 months ago

Wait until the media gets wind of what the industry's been doing this fall… a whole repo on using AI to autonomously use other people’s websites, and not a single paragraph on safety, for the websites or for us. Technically incredible ofc, and it’s a beautiful repo. I wish it didn’t make me so anxious.