A library to easily scrape metadata from an article on the web

2 points by Sharma almost 5 years ago

1 comment

bryanrasmussen almost 5 years ago
hah, I recently had a project with an ex-cofounder (non-technical) where he wanted me to make a scraper for some sites.

He said he wanted to scrape the metadata, and he even suggested this library. I could hardly see what I needed to do or why his current programmers couldn't do it.

After going back and forth for what seemed like infinity, it turned out he didn't want metadata at all. When he used terms like "title" he meant an h1 at a particular position on some pages and a div on other pages, and "description" was a div sibling to the h1 on site 1 but a span with a randomly generated id on another site, etc.

It was easy enough to do in the end; as is often the case, the difficulty was in communication. But it did highlight one point: generally the metadata of a modern web page is not that interesting to scrape.
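
To make the distinction in the comment concrete, here is a rough sketch of how a page's metadata "title" can differ from the "title" a person sees on the page. It uses only Python's standard `html.parser` module, not the library from the submission. The sample HTML, the `TitleParser` class, the `extract` function, and the choice of `og:title` as the metadata field are all made up for illustration.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects <meta> tags and the text of the first <h1>."""

    def __init__(self):
        super().__init__()
        self.meta = {}          # maps meta name/property -> content
        self.first_h1 = None    # visible heading text, once fully parsed
        self._in_h1 = False
        self._h1_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Open Graph tags use "property"; classic tags use "name".
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]
        elif tag == "h1" and self.first_h1 is None:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.first_h1 = "".join(self._h1_parts).strip()

    def handle_data(self, data):
        if self._in_h1:
            self._h1_parts.append(data)


def extract(html):
    """Return both notions of 'title' so they can be compared."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return {
        "meta_title": parser.meta.get("og:title"),
        "visible_title": parser.first_h1,
    }


# Invented sample page: the metadata names the site, the h1 names the item.
SAMPLE = """<html><head>
<meta property="og:title" content="Acme Store">
<meta name="description" content="Shop at Acme">
</head><body>
<h1>Blue Widget, 3-pack</h1>
<div>Sturdy widgets.</div>
</body></html>"""

result = extract(SAMPLE)
print(result)
```

On a real site the visible fields would need per-site rules (an h1 here, a div there), which is the part the comment says took the most back and forth to pin down.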