Extracting Subset of Common Crawl Data on Laptop
1 point by chillaranand over 2 years ago | 1 comment
chillaranand
over 2 years ago
Each monthly Common Crawl data set is ~100 TB. For some use cases, we don't need the entire data set; a subset of the data is enough.

In this post, let's see how we can extract a subset of the data right from a laptop.
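
To give a rough idea of what "extracting a subset" can look like, here is a minimal sketch. It is not necessarily the approach the linked post takes. It assumes the public Common Crawl URL index at `index.commoncrawl.org` and the data host `data.commoncrawl.org`, and it relies on each index entry carrying `filename`, `offset`, and `length` fields so that a single record can be pulled with an HTTP range request instead of downloading a whole archive. The helper names and the crawl ID used in the note below are illustrative only.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Assumed public endpoints (not taken from the original post).
INDEX_URL = "https://index.commoncrawl.org/{crawl}-index"
DATA_URL = "https://data.commoncrawl.org/"


def index_query_url(crawl, url_pattern):
    """Build an index query URL that asks for JSON-lines output."""
    query = urllib.parse.urlencode({"url": url_pattern, "output": "json"})
    return INDEX_URL.format(crawl=crawl) + "?" + query


def parse_index_lines(body):
    """Parse the index response: one JSON object per non-empty line."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def byte_range(offset, length):
    """HTTP Range header value covering exactly one record (end is inclusive)."""
    return f"bytes={offset}-{offset + length - 1}"


def fetch_record(entry):
    """Download and decompress the single record described by an index entry.

    Performs a network request; each record is assumed to be stored as its
    own gzip member, so the byte range decompresses independently.
    """
    request = urllib.request.Request(
        DATA_URL + entry["filename"],
        headers={"Range": byte_range(int(entry["offset"]), int(entry["length"]))},
    )
    with urllib.request.urlopen(request) as response:
        return gzip.decompress(response.read())
```

One would fetch `index_query_url("CC-MAIN-2023-50", "example.com/*")` (the crawl ID and URL pattern are just examples), pass the response text to `parse_index_lines`, and call `fetch_record` on each entry of interest. Only the matching records cross the network, which is what makes this workable on a laptop.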