Ask HN: Storing millions and billions of URLs?

12 points by gerenuk about 7 years ago

Hello everyone!

We currently use Elasticsearch to store metadata and other raw data, but only at a very small scale: around 500,000 domains.

I have been tasked with scaling it to 20-40 million domains and storing their internal/external links, while building a page rank/domain authority score for each domain we add to our database.

What do you suggest/recommend for storing this data at a very large scale? Because each web page's internal and external links will be stored, the database will grow past 100M-1B links.

Any feedback or suggestions would be appreciated.

Thanks.
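
For readers unfamiliar with the scoring step mentioned above, here is a minimal sketch of a page-rank-style score computed over domain-to-domain links. It assumes the links are available as `(source_domain, target_domain)` pairs; the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are illustrative choices, not anything the poster specified. At 100M-1B links an in-memory dictionary like this would likely not be enough, so treat it only as a statement of the computation.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (source, target) domain pairs.

    Hypothetical helper for illustration; parameters are assumptions.
    """
    out_links = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        if src != dst:  # ignore self-links (internal links of one domain)
            out_links[src].add(dst)

    if not nodes:
        return {}

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by domains with no outbound links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank

    return rank


if __name__ == "__main__":
    sample = [
        ("a.com", "b.com"),
        ("b.com", "c.com"),
        ("c.com", "a.com"),
        ("d.com", "a.com"),
    ]
    for domain, score in sorted(pagerank(sample).items()):
        print(domain, round(score, 4))
```

The scores sum to 1 across all domains, so they can be rescaled to whatever "domain authority" range is wanted.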

8 comments

nik736 about 7 years ago
I don't think that any proper database technology will have issues with that amount of data. It all depends on how you use it.
sharemywin about 7 years ago
Found this:

https://dba.stackexchange.com/questions/38793/which-database-could-handle-storage-of-billions-trillions-of-records

There's a nice little triangle diagram here: https://stackoverflow.com/questions/2794736/best-data-store-for-billions-of-rows
girishso about 7 years ago
I have personally used CouchDB to store tens of millions of documents. If you can find a way to get the data you want using CouchDB views, the number of documents simply doesn't matter (only the disk usage grows with additional documents/views), and performance is excellent.
drizzle87 about 7 years ago
Elasticsearch should easily be able to handle your scaling needs. Why do you think that it would not? What are your concerns?
jjirsa about 7 years ago
The answer will depend primarily on how you expect to query it.

Cassandra can do many orders of magnitude more than 1B, but would limit you in your query patterns.
mr__y about 7 years ago
Have you considered sharding the data across multiple independent ES instances? Each of them could then handle an amount of data that does not cause problems.
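
A rough sketch of what application-level sharding across independent instances could look like: hash each domain to a stable shard number and route reads and writes for that domain to the matching instance. The host URLs and the names `shard_for` and `host_for` are hypothetical. Note that Elasticsearch already splits an index into shards inside a single cluster, so this kind of routing only matters if you really do run separate, independent instances.

```python
import hashlib

# Hypothetical instance addresses; replace with real ones.
SHARD_HOSTS = [
    "http://es-0.example:9200",
    "http://es-1.example:9200",
    "http://es-2.example:9200",
    "http://es-3.example:9200",
]


def shard_for(domain: str, num_shards: int) -> int:
    """Map a domain to a stable shard index in [0, num_shards)."""
    # A cryptographic hash is used only for its even, stable distribution;
    # Python's built-in hash() is salted per process and would not be stable.
    digest = hashlib.sha1(domain.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def host_for(domain: str) -> str:
    """Return the instance that owns all documents for this domain."""
    return SHARD_HOSTS[shard_for(domain, len(SHARD_HOSTS))]


if __name__ == "__main__":
    for d in ("example.com", "news.ycombinator.com", "wikipedia.org"):
        print(d, "->", host_for(d))
```

The trade-off of plain modulo routing is that changing the number of instances remaps most domains, so the shard count is best chosen with future growth in mind.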
cimmanom about 7 years ago
We've found Elasticsearch to be quite performant with hundreds of millions of documents. What are your concerns with scaling it?
dchuk about 7 years ago
Building an ahrefs/moz/majestic competitor?