Extracting Subset of Common Crawl Data on Laptop
1 point
by
chillaranand
over 2 years ago
1 comment
chillaranand
over 2 years ago
Each monthly Common Crawl data set is ~100 TB. For some use cases we don't need the entire data set, just a subset of it.

In this post, let's see how we can extract a subset of the data from a laptop.
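
To give a rough idea of what extracting a subset can look like without downloading a whole crawl, here is a minimal sketch. It is not necessarily the method used in the linked post. It assumes that each line of the Common Crawl URL index is a JSON object with `filename`, `offset`, and `length` fields, and that the data files are served from `https://data.commoncrawl.org/`. The helper name `range_request_for` and the sample index line are made up for illustration. The function only builds the URL and `Range` header for a single record; the actual download is left to whatever HTTP client you prefer.

```python
import json

# Assumption: public HTTP endpoint that serves Common Crawl data files.
DATA_BASE_URL = "https://data.commoncrawl.org/"


def range_request_for(index_line: str) -> tuple[str, dict[str, str]]:
    """Turn one JSON index line into (url, headers) for a ranged GET.

    Hypothetical helper: assumes the line carries `filename`, `offset`
    and `length`, which locate one compressed record inside a large file.
    """
    record = json.loads(index_line)
    offset = int(record["offset"])
    length = int(record["length"])
    # HTTP byte ranges are inclusive on both ends, hence the -1.
    headers = {"Range": f"bytes={offset}-{offset + length - 1}"}
    return DATA_BASE_URL + record["filename"], headers


# Made-up index line, only to show the shape of the input.
sample = json.dumps({
    "url": "https://example.com/",
    "filename": "crawl-data/EXAMPLE/segments/0/warc/example.warc.gz",
    "offset": "1000",
    "length": "500",
})

url, headers = range_request_for(sample)
print(url)
print(headers["Range"])
```

Passing `url` and `headers` to an HTTP GET would then fetch only that one record (500 bytes in this made-up case) instead of the full file, which is what makes working from a laptop feasible.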