科技回声 (Tech Echo)

A library to easily scrape metadata from an article on the web

2 points by Sharma about 5 years ago

1 comment

bryanrasmussen about 5 years ago
Hah, I recently had a project with an ex-cofounder (non-technical) where he wanted me to make a scraper for some sites.

He said he wanted to scrape the metadata; he even suggested this library. I could hardly see what I needed to do, or why his current programmers couldn't do it.

After going back and forth for what seemed like infinity, it turned out he didn't want metadata at all. When he used terms like "title" he meant an h1 at a particular position on some pages and a div on other pages, and "description" was a div sibling to the h1 on site 1 but a span with a randomly generated id on another site, etc. etc.

It was easy enough to do in the end; as is often the case, the difficulty was in communication. But it did highlight one point: generally, the metadata of a modern web page is not that interesting to scrape.
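To make the distinction in the comment concrete, here is a minimal sketch (standard library only, not the API of the library linked in the post) that pulls "metadata" from `<meta>` tags (Open Graph `og:title`/`og:description`, plus the classic `name="description"`) and, separately, the visible text of the first `<h1>`. The class name `ArticleScraper`, the function `scrape`, and the sample page are hypothetical illustrations. Per-site rules like "the div sibling to the h1" or "a span with a random id" would need site-specific selectors and are not covered here.

```python
from html.parser import HTMLParser


class ArticleScraper(HTMLParser):
    """Hypothetical sketch: collects <meta> metadata and the first <h1> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}          # first value seen per meta key, e.g. "og:title"
        self.first_h1 = None    # visible on-page heading text
        self._in_h1 = False
        self._h1_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Open Graph tags use property=, the classic description uses name=
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta.setdefault(key.lower(), attrs["content"])
        elif tag == "h1" and self.first_h1 is None:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.first_h1 = "".join(self._h1_parts).strip()

    def handle_data(self, data):
        # Text inside the first <h1>, including text in nested inline tags
        if self._in_h1:
            self._h1_parts.append(data)


def scrape(html):
    parser = ArticleScraper()
    parser.feed(html)
    parser.close()
    return {
        "meta_title": parser.meta.get("og:title"),
        "meta_description": parser.meta.get("og:description")
        or parser.meta.get("description"),
        "h1": parser.first_h1,
    }


page = """
<html><head>
  <meta property="og:title" content="Site-wide title">
  <meta name="description" content="SEO blurb">
</head><body>
  <h1>The headline the reader <em>actually</em> sees</h1>
  <div>Summary paragraph next to the headline</div>
</body></html>
"""

result = scrape(page)
print(result)
```

The metadata fields and the on-page `<h1>` can disagree, which is exactly the gap the commenter ran into: what a non-technical client calls the "title" may be a visible element whose location varies from site to site.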