Show HN: Finic – Open source platform for building browser automations

143 points by jasonwcfan 8 months ago

Last year we launched a project called Psychic that did moderately well on Hacker News but was a commercial failure. We were able to find customers, but none with compelling and overlapping use cases. Everyone who was interested was too early to be a real customer.

This was our launch: <a href="https://news.ycombinator.com/item?id=36032081">https://news.ycombinator.com/item?id=36032081</a>

We recently decided to revive and rebrand the project after seeing a sudden spike in interest from people who wanted to connect LLMs to data, but specifically through browsers. It's also a problem we've experienced firsthand, having built scraping features into Psychic and previously worked on bot detection at Robinhood.

If you haven’t built a web scraper or browser automation before, you might assume it’s very straightforward. People have been building scrapers for as long as the internet has existed, so there must be many tools for the job.

The truth is that web scraping strategies need to constantly adapt as web standards change, and as companies that don’t want to be scraped adopt new technologies to try to block it. The old standards never completely go away, so the longer the internet exists, the more edge cases you’ll need to account for. This adds up to a LOT of infrastructure that needs to be set up and a lot of schlep developers have to go through to get up and running.

Scraping is no easier today than it was 10 years ago; the problems are just different.

Finic is an open source platform for building and deploying browser agents. Browser agents are bots deployed to the cloud that mimic the behaviour of humans, like web scrapers or remote process automation (RPA) jobs. Simple examples include scripts that scrape static websites like the SEC's EDGAR database. More complex use cases include integrating with legacy applications that don’t have public APIs, where the best way to automate data entry is to just manipulate HTML selectors (EHRs, for example).

Our goal is to make Finic the easiest way to deploy a Playwright-based browser automation. With this launch, you can already do so in just 4 steps. Check out our docs for more info: <a href="https://docs.finic.io/quickstart" rel="nofollow">https://docs.finic.io/quickstart</a>

14 comments

ghxst 8 months ago

Cool service, but how do you plan to deal with anti-scraping and anti-bot services like Akamai, Arkose, Cloudflare, DataDome etc.? Automation of the web isn't solved by another Playwright or Puppeteer abstraction; you need to solve more fundamental problems in order to mitigate the issues you run into at scale.

suriya-ganesh 8 months ago

I've been working on a browser agent for the last week [1], so this is very exciting. There are also browser agent implementations like Skyvern [2] (also YC backed) or Tarsier [3]. It seems like Finic is providing a way to scale/schedule these agents? If that's the case, what's the advantage over something like Airflow or Windmill?

If I remember correctly, Skyvern also has an implementation of scaling these browser tasks built in.

P.S. Is it not called Robotic Process Automation? First time I'm hearing it as Remote Process Automation.

[1] <a href="https://github.com/ProductLoft/arachne">https://github.com/ProductLoft/arachne</a>
[2] <a href="https://www.skyvern.com/">https://www.skyvern.com/</a>
[3] <a href="https://github.com/reworkd/tarsier">https://github.com/reworkd/tarsier</a>

dataviz1000 8 months ago

I build browser automation systems with either Playwright or Chrome Extensions. The biggest issue with automating 3rd party websites is knowing when the 3rd party developer pushes changes which break the automation. The way I dealt with that is to run a headless browser in the cloud which periodically checks the behavior of the automated site, sending emails and SMS messages when it breaks.

If you don't already have this feature for your system, I would recommend it.
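
A minimal sketch of the kind of periodic breakage check described in this comment, not the commenter's actual implementation. It assumes the automation depends on a known set of element ids, and it uses only the Python standard library so it runs without a browser. The names `missing_ids`, `run_check`, `fetch_html` and `notify` are hypothetical; in a real setup `fetch_html` would likely return the rendered page from a headless browser (for example Playwright's `page.content()`), and `notify` would send the email or SMS.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable, List, Set


class _IdCollector(HTMLParser):
    """Collects every non-empty id attribute seen in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def missing_ids(html: str, required_ids: Iterable[str]) -> List[str]:
    """Return the required element ids that are absent from the page."""
    collector = _IdCollector()
    collector.feed(html)
    return [i for i in required_ids if i not in collector.ids]


def run_check(
    fetch_html: Callable[[], str],
    required_ids: Iterable[str],
    notify: Callable[[str], None],
) -> bool:
    """Fetch the page once and alert if it no longer looks as expected.

    Returns True when every required id is present, False otherwise.
    """
    try:
        html = fetch_html()
    except Exception as exc:  # a failed fetch is also worth an alert
        notify(f"fetch failed: {exc}")
        return False

    missing = missing_ids(html, required_ids)
    if missing:
        notify("missing elements: " + ", ".join(missing))
        return False
    return True
```

Scheduling is left out on purpose: a cron job or any task runner could call `run_check` at whatever interval suits the target site. Passing the fetcher and notifier in as callables keeps the check testable against saved HTML.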

Oras 8 months ago

Don't take this as a negative thing, but I'm confused. Is it Playwright? Is it a residential proxy? It's not clear from your video.

mdaniel 8 months ago

> Finic uses Playwright to interact with DOM elements, and recommends BeautifulSoup for HTML parsing.

I have never, ever understood anyone who goes to the trouble of booting up a browser, and then uses a Python library to do static HTML parsing.

Anyway, I was surfing around the repo trying to find what, exactly, "Safely store and access credentials using Finic’s built-in secret manager" means.

krick 8 months ago

Does anyone know a solid (not SaaS, obviously) solution for scraping these days? It's getting pretty hard to get around some pretty harmless cases (like bulk-downloading MY OWN gpx tracks from some fucking fitness-watch servers), with all these JS tricks, countless redirects, Cloudflare and so on. Even if you already have the cookies, getting a non-403 response to any request is very much not trivial. I feel like it's time to upgrade my usual approach of Python requests+libxml, but I don't know if there is a library/tool that solves some of the problems for you.

whatnotests2 8 months ago

With agents like Finic, soon the web will be built for agents, rather than humans.

I can see that a few years from now almost all web traffic will be agents.

j0r0b0 8 months ago

Thank you for sharing!

Your sign up flow might be broken. I tried creating an account (with my own email), received the confirmation email, but couldn't get my account to be verified. I get "Email not confirmed" when I try to log in.

Also, the verification email was sent from accounts@godealwise.com, which is a bit confusing.

skeptrune 8 months ago

I wonder if there are hidden observability problems with scraping whose ideal solutions have a different shape than a dashboard. Feels like a Sentry connection or other common alert monitoring solutions would combine well with the LLM-proposed changes and help teams react more quickly to pipeline problems.

computershit 8 months ago

First, nice work. I'm certainly glad to see such a tool in this space right now. Besides a UI, what does this provide that something like Browserless doesn't?

ushakov 8 months ago

I do not understand what this actually is. Any difference between Browserbase and what you’re building?

Also, curious why your unstructured idea did not pan out?

ilrwbwrkhv 8 months ago

Backed by YC = Not open source. Eventually pressure to exit and hyper scale will take over.

slewis 8 months ago

Is it stateful? Like can I do a run, read the results, and then do another run from that point?

sebmellen 8 months ago

We use <a href="https://windmill.dev">https://windmill.dev</a> which is great for this!