TechEcho

Extracting Subset of Common Crawl Data on Laptop

1 point by chillaranand over 2 years ago

1 comment

chillaranand over 2 years ago
Each monthly Common Crawl dataset is ~100 TB. For some use cases we don't need the entire dataset, just a subset of it.

In this post, let's see how we can extract a subset of the data on a laptop.
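
The comment doesn't say which method the post uses, so here is only a rough sketch of one common way to pull a subset without downloading a whole crawl: look up matching pages in Common Crawl's CDX index, then fetch each record with an HTTP range request. The host names, the `<crawl-id>-index` path, and the `filename`/`offset`/`length` field names are assumptions based on the public index server, and the helper function names are made up for illustration.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Assumptions (not stated in the post): public index server and data host.
INDEX_SERVER = "https://index.commoncrawl.org"
DATA_HOST = "https://data.commoncrawl.org"


def index_query_url(crawl_id: str, url_pattern: str) -> str:
    """Build a CDX index query URL for one crawl, asking for JSON lines."""
    query = urllib.parse.urlencode({"url": url_pattern, "output": "json"})
    return f"{INDEX_SERVER}/{crawl_id}-index?{query}"


def parse_index_lines(text: str) -> list[dict]:
    """Parse the newline-delimited JSON records returned by the index."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def byte_range(record: dict) -> str:
    """HTTP Range header value covering exactly one WARC record."""
    start = int(record["offset"])
    end = start + int(record["length"]) - 1  # Range ends are inclusive
    return f"bytes={start}-{end}"


def fetch_record(record: dict) -> bytes:
    """Download and decompress a single WARC record (needs network access)."""
    request = urllib.request.Request(
        f"{DATA_HOST}/{record['filename']}",
        headers={"Range": byte_range(record)},
    )
    with urllib.request.urlopen(request) as response:
        # Each WARC record is stored as its own gzip member.
        return gzip.decompress(response.read())
```

With this, only the index results and the few kilobytes per matching record cross the network, which is what makes a laptop workable. For larger subsets the index server may be too slow or rate-limited, so treat this as a starting point rather than the post's actual approach.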