Ask HN: Simple API to extract web article text?

2 points by friendofafriend about 1 year ago
I'm working on a consumer facing project that involves analyzing text from web articles.

Does anyone know of an API that can handle the text extraction part automatically?

Ideally the API can take in a URL and just return the main text content of a website, even for sites with slightly complex layouts.

For example: https://www.nytimes.com/2024/03/28/technology/personaltech/smart-glasses-ray-ban-meta.html

We're most interested in an API that has a decent free tier + usage-based pricing (at least for overages).

So far, most of our searches have turned up website scrapers that return HTML that needs to be further parsed (ScrapingBot, ScrapingBee, Scrapingdog, etc.), or services that are prohibitively priced (Diffbot).

Next, we're looking into Apify, but maybe we've missed something?

Any recommendations would be *greatly* appreciated!

2 comments

timoteostewart about 1 year ago
Would you consider rolling your own? Python’s goose3 has worked well for me in article extraction. It seemed to be successful more often than trafilatura and newspaper3k.
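
For anyone curious what "rolling your own" with these libraries might look like, here is a rough sketch rather than a tested recipe. It assumes goose3 and trafilatura are installed from PyPI. The helper names (`goose3_text`, `trafilatura_text`, `extract_first`) and the try-one-then-fall-back strategy are illustrative assumptions, not something taken from the comment above. newspaper3k is left out for brevity.

```python
from typing import Callable, Iterable, Optional, Tuple

# An extractor takes a URL and returns article text, or None/"" on failure.
Extractor = Callable[[str], Optional[str]]


def goose3_text(url: str) -> Optional[str]:
    """Fetch a URL with goose3 and return its cleaned article text."""
    from goose3 import Goose  # third-party, assumed installed: pip install goose3

    g = Goose()
    try:
        return g.extract(url=url).cleaned_text
    finally:
        g.close()  # release goose3's network session


def trafilatura_text(url: str) -> Optional[str]:
    """Fetch a URL with trafilatura and return the extracted main text."""
    import trafilatura  # third-party, assumed installed: pip install trafilatura

    downloaded = trafilatura.fetch_url(url)  # None if the download fails
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)  # None if nothing was extracted


def extract_first(
    extractors: Iterable[Tuple[str, Extractor]], url: str
) -> Tuple[Optional[str], str]:
    """Try each (name, extractor) pair in order (hypothetical helper).

    Returns (name, text) for the first extractor that yields non-blank text,
    or (None, "") if every extractor fails, raises, or returns nothing.
    """
    for name, extractor in extractors:
        try:
            text = extractor(url)
        except Exception:
            # Missing library, network error, parse error: move on to the next one.
            continue
        if text and text.strip():
            return name, text.strip()
    return None, ""


# Example call (performs real network requests, so it is left commented out):
# name, text = extract_first(
#     [("goose3", goose3_text), ("trafilatura", trafilatura_text)],
#     "https://example.com/some-article",
# )
```

Ordering goose3 first simply mirrors the experience reported above; swapping the order or adding another extractor only means changing the list passed to `extract_first`. Paywalled or heavily scripted pages may still come back empty with any of these libraries.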
cranberryturkey about 1 year ago
Brisk.news