Show HN: Exploring five million Hacker News posts

3 points by lmcinnes 7 months ago
This is a data map providing a view of all Hacker News stories with a score of at least 2 (a threshold that removes most of the spam). Stories are close together in the map if they have semantically similar titles. In the bottom left is a histogram of stories over time. Hovering over a bar selects stories from that year, and dragging a selection allows selecting multiple years. A keyword-based text search is in the upper left. Hold down the shift key and drag to lasso-select points and get a word cloud generated from the selection. Clicking on a point opens the URL for that story.

The dataset was filtered from https://huggingface.co/datasets/OpenPipe/hacker-news
Stories were embedded in a vector space via nomic-embed: https://huggingface.co/nomic-ai/nomic-embed-text-v1.5
A 2D representation was generated using UMAP: https://github.com/lmcinnes/umap
Clusters were generated and topics named via HDBSCAN and Toponymy using Cohere Command-R:
- https://github.com/TutteInstitute/fast_hdbscan
- https://github.com/TutteInstitute/toponymy
- https://cohere.com/command
The interactive map was generated using DataMapPlot: https://github.com/TutteInstitute/datamapplot

The map provides a great way to get an overview of Hacker News stories over the years, to explore them, and to find interesting niche topics. There are limitations to both the text embedding and the 2D representation. For example, posts about John Gruber's "Daring Fireball" end up in "Sun-related phenomena" in the Astronomy region of the map, and some topics get squashed into odd places because of the limits of a 2D representation. Nonetheless, most topics, regions and stories are well placed. There is a wealth of knowledge and information packed in here, and a lot to explore.
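For readers who want a feel for the selection logic described above (the score filter, the year-range selection, and the lasso that feeds a word cloud), here is a minimal standard-library sketch. It is not how DataMapPlot implements these interactions. The Story record, the sample stories, their coordinates, and all helper names are hypothetical and exist only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Story:
    # Hypothetical record; the real dataset has its own schema.
    title: str
    score: int
    year: int
    x: float  # made-up 2D map coordinates
    y: float


def filter_by_score(stories, min_score=2):
    """Keep stories with a score of at least min_score."""
    return [s for s in stories if s.score >= min_score]


def select_years(stories, first_year, last_year):
    """Mimic dragging across histogram bars: keep an inclusive year range."""
    return [s for s in stories if first_year <= s.year <= last_year]


def in_polygon(x, y, polygon):
    """Ray-casting point-in-polygon test for a lasso outline."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def word_counts(stories):
    """Count title words longer than three characters, as word-cloud input."""
    counts = Counter()
    for s in stories:
        counts.update(w for w in s.title.lower().split() if len(w) > 3)
    return counts


# Invented sample data, not taken from the real map.
stories = [
    Story("Rust compiler speedups", 120, 2021, 0.2, 0.3),
    Story("Writing a Rust parser", 45, 2022, 0.3, 0.4),
    Story("Buy cheap watches", 1, 2022, 0.25, 0.35),
    Story("Solar flare observed", 10, 2014, 0.9, 0.9),
]

kept = filter_by_score(stories)
lasso = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
selected = [
    s for s in select_years(kept, 2020, 2023) if in_polygon(s.x, s.y, lasso)
]
counts = word_counts(selected)
print(counts.most_common(3))
```

In this toy data the score filter drops the one-point story, the year range drops the 2014 story, and the lasso keeps the two remaining points, so the word counts come only from those two titles.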

no comments
