Show HN: Sonata – Declarative Web Scraping

2 points by camjw 4 months ago
Hey everyone,

Sonata is a tool I've been working on for making web scraping easier. The idea is to make web scraping declarative, i.e. to provide a service that lets you say "get me this data from these URLs", rather than imperative, i.e. writing a Puppeteer script that describes the steps to do it.

What this means in practice is that you give Sonata a few URLs and a JSON schema that describes the data you want, and then under the hood we use LLMs to take your input and spit out a compiled scraper (basically a Python script) that captures your data from the URLs. You can then use this scraper on other similar URLs. So for example, if you wanted to scrape product information from an ecommerce site, you could give it three URLs from that site and a JSON schema, and then use the resulting scraper on other URLs from the same site.

The advantages of this approach are that:

- You don't have to faff about with Puppeteer, writing scraping scripts, etc.

- The scrapers we make are self-healing, i.e. if the website changes we can recompile the scraper for you without you having to worry about it.

- Compiling the scrapers takes a few minutes, but once a scraper is compiled there's no waiting for LLMs; it's as fast as regular Python + HTTP calls.

- We also handle proxies, schedules, all the normal scraping stuff.

We have a few users at the moment but would love to get some feedback on the value prop and whether this sounds useful!

Thanks,

Cameron
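
To make the declarative idea above a bit more concrete, here is a minimal sketch of what a request and a compiled scraper might look like. None of this is Sonata's actual API: the `REQUEST` layout, `PRODUCT_SCHEMA`, the example URLs, the CSS class names and `scrape_product` are all made up for illustration, and the "compiled" scraper is hand-written with Python's standard `html.parser` as a stand-in for whatever the LLM step would generate.

```python
from html.parser import HTMLParser

# Hypothetical declarative input: a JSON schema describing the data wanted.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name", "price"],
}

# Hypothetical request shape: a few example URLs plus the schema.
REQUEST = {
    "urls": [
        "https://shop.example.com/products/1",
        "https://shop.example.com/products/2",
        "https://shop.example.com/products/3",
    ],
    "schema": PRODUCT_SCHEMA,
}


class _ProductParser(HTMLParser):
    """Collects the text of elements whose CSS class maps to a schema field."""

    # Assumed class-to-field mapping for an imaginary shop's markup.
    FIELD_BY_CLASS = {"product-title": "name", "product-price": "price"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # schema field we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.FIELD_BY_CLASS:
                self._current = self.FIELD_BY_CLASS[cls]

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def scrape_product(html: str) -> dict:
    """Stand-in for a compiled scraper: plain parsing, no LLM call at run time."""
    parser = _ProductParser()
    parser.feed(html)
    parser.close()
    record = dict(parser.fields)
    if "price" in record:
        # Turn a string like "$1,299.50" into a number, as the schema asks.
        record["price"] = float(record["price"].lstrip("$").replace(",", ""))
    missing = [key for key in PRODUCT_SCHEMA["required"] if key not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record
```

Calling `scrape_product` on a page's HTML returns a dict shaped like the schema. In this sketch, the `ValueError` raised when a required field is missing is the kind of signal a service could use to decide that the site has changed and the scraper needs recompiling; whether Sonata detects changes this way is an assumption.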

no comments