Show HN: Lightpanda, an open-source headless browser in Zig

319 points by fbouvier 4 months ago

We're Francis and Pierre, and we're excited to share Lightpanda (https://lightpanda.io), an open-source headless browser we've been building from scratch in Zig for the past 2 years (not dependent on Chromium or Firefox). It's a faster and lighter alternative for headless operations, without any graphical rendering.

Why start over? We've worked a lot with Chrome headless at our previous company, scraping millions of web pages per day. While it's powerful, it's also heavy on CPU and memory usage. For scraping at scale, building AI agents, or automating websites, the overhead is high. So we asked ourselves: what if we built a browser that only did what's absolutely necessary for headless automation?

Our browser is made of the following main components:

- an HTTP loader
- an HTML parser and DOM tree (based on NetSurf libs)
- a JavaScript runtime (v8)
- partial Web API support (currently DOM and XHR/Fetch)
- a CDP (Chrome DevTools Protocol) server to allow plug & play connection with existing scripts (Puppeteer, Playwright, etc.)

The main idea is to avoid any graphical rendering and just work with data manipulation, which in our experience covers a wide range of headless use cases (excluding some, like screenshot generation).

In our current test case, Lightpanda is roughly 10x faster than Chrome headless while using 10x less memory.

It's a work in progress: there are hundreds of Web APIs, and for now we only support some of them. It's a beta version, so expect most websites to fail or crash. The plan is to increase coverage over time.

We chose Zig for its seamless integration with C libs and its comptime feature, which allows us to generate bi-directional native-to-JS APIs (see our zig-js-runtime lib: https://github.com/lightpanda-io/zig-js-runtime). And of course for its performance :)

As a company, our business model is based on a managed cloud, browser as a service. Currently this is primarily powered by Chrome, but as we integrate more Web APIs it will gradually transition to Lightpanda.

We would love to hear your thoughts and feedback. Where should we focus our efforts next to support your use cases?
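
For readers who haven't worked with CDP directly: a minimal sketch of what a client such as Puppeteer or Playwright sends to a CDP server, assuming only the public Chrome DevTools Protocol message format (this is not Lightpanda's code). Each command is a JSON object with an integer `id`, a `method` named `Domain.command`, and optional `params`; the server's reply carries the same `id`. The WebSocket transport is left out, and `cdp_command` is a hypothetical helper name.

```python
import json
from itertools import count

# Monotonically increasing ids let the client match each reply to its command.
_next_id = count(1)


def cdp_command(method: str, **params) -> str:
    """Serialize one CDP command (hypothetical helper; no transport included)."""
    message = {"id": next(_next_id), "method": method}
    if params:
        # Commands without arguments simply omit the "params" object.
        message["params"] = params
    return json.dumps(message)


# Page.navigate is a standard CDP command; "url" is its required parameter.
print(cdp_command("Page.navigate", url="https://wikipedia.com/"))
```

A real client would write that string to the WebSocket and wait for the reply with the matching `id`. Puppeteer and Playwright do that bookkeeping themselves, which is why speaking the same protocol is enough for existing scripts to connect.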

20 comments

fbouvier 4 months ago

Author here. The browser is made from scratch (not based on Chromium/WebKit), in Zig, using v8 as a JS engine.

Our idea is to build a lightweight browser optimized for AI use cases like LLM training and agent workflows, and more generally any type of web automation.

It's a work in progress: there are hundreds of Web APIs, and for now we only support some of them (DOM, XHR, Fetch). So expect most websites to fail or crash. The plan is to increase coverage over time.

Happy to answer any questions.

frankgrecojr 4 months ago

The hello world example does not work. In fact, no website I've tried works; it usually panics. For the example in the README, the errors are:

```
./lightpanda-aarch64-macos --host 127.0.0.1 --port 9222
info(websocket): starting blocking worker to listen on 127.0.0.1:9222
info(server): accepting new conn...
info(server): client connected
info(browser): GET https://wikipedia.com/ 200
info(browser): fetch https://wikipedia.com/portal/wikipedia.org/assets/js/index-24c3e2ca18.js: http.Status.ok
info(browser): eval script portal/wikipedia.org/assets/js/index-24c3e2ca18.js: ReferenceError: location is not defined
info(browser): fetch https://wikipedia.com/portal/wikipedia.org/assets/js/gt-ie9-ce3fe8e88d.js: http.Status.ok
error(events): event handler error: error.JSExecCallback
info(events): event handler error try catch: TypeError: Cannot read properties of undefined (reading 'length')
info(server): close cmd, closing conn...
info(server): accepting new conn...
thread 5274880 panic: attempt to use null value
zsh: abort ./lightpanda-aarch64-macos --host 127.0.0.1 --port 9222
```

psanchez 4 months ago

I think this is a really cool project. Scraping aside, I would definitely use this with Playwright for end-to-end tests if it had 100% compatibility with Chrome and ran in a fraction of the time/memory.

At my company we have a small project where we run the equivalent of 6.5 hours of end-to-end tests daily using Playwright. Running the tests in parallel takes around half an hour. Your project is still at a very early stage, but assuming a 10x speedup, we could pass all our tests in roughly 3 minutes (best case scenario).

That being said, I would use your browser, but would likely not use your business offering: our tests require an internal VPN and have a custom reporting solution, so switching would be a lot of work for little savings, and we already run all tests on spot/preemptible instances, which are 80% cheaper.

Business-wise, I found very little info on your website. "4x the efficiency at half the cost" is a good catchphrase, but compared to what? You can have servers on Hetzner or on AWS, and one is already a fraction of the cost of the other. How convenient is it to launch things on your remote platform vs. launching them locally or setting it up yourself? Does it provide any advantages for web scraping compared to other solutions? How parallelizable is it? Do you have any paying customers already?

Supercool tech project. Best of luck!
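
Spelling out that best-case estimate (it assumes the hypothetical 10x speedup applies uniformly to every test, which has not been measured for this suite):

```latex
\frac{30\ \text{min (parallel wall-clock)}}{10} = 3\ \text{min},
\qquad
\frac{6.5\ \text{h} \times 60\ \text{min/h}}{10} = 39\ \text{min of serial test time}
```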

weinzierl 4 months ago

If I don't need JavaScript or any interactivity, just modern HTML + modern CSS, is there any modern lightweight renderer to PNG or SVG?

Something in the spirit of wkhtmltoimage or WeasyPrint that does not require a full-blown browser, but more modern, with support for recent HTML and CSS?

In a sense, this is Lightpanda's complement to a "full panda": just the fully rendered DOM to pixels.

dang 4 months ago

(This was on the front page as https://news.ycombinator.com/item?id=42812859, but someone pointed out to me that it had been a Show HN a few weeks ago: https://news.ycombinator.com/item?id=42430629, so I've made a fresh copy of that submission and moved the comments hither. I hope that's ok with everyone!)

cropcirclbureau 4 months ago

Pretty cool. Do you have a list of features you plan to support and plan to cut? Also, how much does this differ from the DOM implementations that test frameworks use? I recall Jest or something similar sporting such a feature.

gwittel 4 months ago

Interesting. Looks really neat! How do you deal with anti-bot stuff like FingerprintJS, Cloudflare Turnstile, etc.? Maybe you're new enough to not get flagged, but I find this (and CDP) a challenge at times with these anti-bot systems.

zlagen 4 months ago

What do you think would be the use cases for this project? Being lightweight is awesome, but usually you need a real browser for most use cases, for example testing sites and scraping. It may work for some scraping use cases, but I think that if the site uses any kind of bot blocking, this is not going to cut it.

m3kw9 4 months ago

How does this work? The browser needs to render a page and the vision model needs to know where a button is, so it still needs to see an image. How does headless make that easier?

cratermoon 4 months ago

So is this the scraper we need to block? https://news.ycombinator.com/item?id=42750420

Kathc 4 months ago

An open-source browser built from scratch is bold. What inspired the development of Lightpanda?

zelcon 4 months ago

Why didn't you just fork Chromium and strip out the renderer? This is guaranteed to bitrot when the web standards change unless you keep up with it forever and have perpetual funding. Yes, modifying Chromium is hard, but this seems harder.

the__alchemist 4 months ago

I have a meta question from browsing the repo: why do C, C++, and Zig code bases, by convention, include a license at the top of every module? IMO it makes more sense to instead include an overview of the module's purpose and how it fits in with the rest of the program, plus one license at the top level, as the project already has.

evanjrowley 4 months ago

I'm interested to see if this could be made to work as a drop-in replacement for the headless Chromium that Hoarder uses to archive web content. I don't have a problem with the current Hoarder solution, but it would be nice to use something that requires less RAM.

surfmike 4 months ago

Another browser in this space is https://ultralig.ht/. It's geared toward in-game UI, but I wonder how easy it would be to retool it for a similar use case.

kavalg 4 months ago

Why AGPL? I am not blaming you. I am just curious about the reasoning behind your choice.

optixyt 4 months ago

The second social media botters find this.

randomMatrix101 4 months ago

Very cool project, congrats guys!

stuckkeys 4 months ago

How does it do against captchas?

monkmartinez 4 months ago

This is pretty neat, but I have to ask: why does everyone want to build and/or use a headless browser?

When I use pyautogui and my desktop Chrome app, I never have problems with captchas or trigger bot detectors. When I use a "headless" Playwright, Selenium, or Puppeteer, I almost always run into problems. My conclusion is that "headless" scraping creates more problems than it solves. Why don't we use the Chrome, Firefox, Safari, or Edge that we use on a day-to-day basis?
