Ask HN: Simple API to extract web article text?

2 points by friendofafriend, about 1 year ago
I'm working on a consumer facing project that involves analyzing text from web articles.

Does anyone know of an API that can handle the text extraction part automatically?

Ideally the API can take in a URL and just return the main text content of a website, even for sites with slightly complex layouts.

For example: https://www.nytimes.com/2024/03/28/technology/personaltech/smart-glasses-ray-ban-meta.html

We're most interested in an API that has a decent free tier + usage-based pricing (at least for overages).

So far, most of our searches have turned up website scrapers that return HTML that needs to be further parsed (ScrapingBot, ScrapingBee, Scrapingdog, etc.), or services that are prohibitively priced (Diffbot).

Next, we're looking into Apify, but maybe we've missed something?

Any recommendations would be *greatly* appreciated!

2 comments

timoteostewart, about 1 year ago
Would you consider rolling your own? Python’s goose3 has worked well for me in article extraction. It seemed to be successful more often than trafilatura and newspaper3k.
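
For illustration, a minimal sketch of what rolling your own with goose3 might look like. It assumes goose3 is installed (`pip install goose3`) and relies only on `Goose().extract(url=...)` and the article's `cleaned_text` attribute. The wrapper name `extract_article_text`, the injectable `extractor` argument, and the `min_chars` threshold are hypothetical additions, not part of goose3 or of the original comment.

```python
from typing import Callable, Optional


def _goose3_extract(url: str) -> str:
    """Fetch `url` and return the main article text using goose3."""
    # Imported lazily so this module still loads when goose3 is absent.
    from goose3 import Goose

    g = Goose()
    article = g.extract(url=url)
    return article.cleaned_text


def extract_article_text(
    url: str,
    extractor: Optional[Callable[[str], str]] = None,
    min_chars: int = 200,  # hypothetical cutoff for "extraction probably failed"
) -> Optional[str]:
    """Return the main text of the article at `url`, or None if too little text came back.

    `extractor` lets you swap in another library (for example trafilatura
    or newspaper3k) behind the same interface; it defaults to goose3.
    """
    extract = extractor or _goose3_extract
    text = extract(url).strip()
    # Very short results usually mean a paywall, a cookie banner, or a layout
    # the extractor could not handle, so treat them as failures.
    return text if len(text) >= min_chars else None
```

Calling `extract_article_text("https://example.com/some-article")` would then return the cleaned body text or `None`. Results on paywalled or script-heavy sites may vary, so the threshold is worth tuning against the pages you actually need.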
cranberryturkey, about 1 year ago
Brisk.news