Hello everyone!<p>We're currently using Elasticsearch to store metadata and other raw data, but only at a small scale: around 500,000 domains.<p>I've been tasked with scaling this to 20-40 million domains, storing each domain's internal/external links, and building a page rank/domain authority score for every domain we add to our database.<p>What do you suggest/recommend for storing this data at very large scale? Since we'll be storing every page's internal and external links, the link database could grow to 100M-1B rows.<p>Any feedback/suggestions would be appreciated.<p>Thanks.
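For context, the score we have in mind is essentially classic PageRank over the domain link graph. A minimal sketch in Python (the domain names, damping factor, and iteration count are just illustrative, not our real setup):

```python
# Toy illustration of a PageRank-style score over a domain link graph.
# Domain names and link data are made up; real input would come from the crawl.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each domain to the list of domains it links to."""
    domains = set(links) | {d for targets in links.values() for d in targets}
    n = len(domains)
    rank = {d: 1.0 / n for d in domains}
    for _ in range(iterations):
        new_rank = {d: (1.0 - damping) / n for d in domains}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        # Dangling domains (no outgoing links) redistribute their rank evenly.
        dangling = sum(rank[d] for d in domains if not links.get(d))
        for d in domains:
            new_rank[d] += damping * dangling / n
        rank = new_rank
    return rank

links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
# c.example ends up highest: it's linked from both a.example and b.example
```

At 20-40M domains and 1B links this would of course be done out-of-core or on a cluster; the sketch is only to show the data we need to store (an edge list) and the access pattern (all out-links per domain).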
Found this:<p><a href="https://dba.stackexchange.com/questions/38793/which-database-could-handle-storage-of-billions-trillions-of-records" rel="nofollow">https://dba.stackexchange.com/questions/38793/which-database...</a><p>There's a nice little triangle diagram here:
<a href="https://stackoverflow.com/questions/2794736/best-data-store-for-billions-of-rows" rel="nofollow">https://stackoverflow.com/questions/2794736/best-data-store-...</a>
I've personally used CouchDB to store tens of millions of documents. If you can find a way to get the data you want using CouchDB views, the number of documents simply doesn't matter (aside from disk usage growing with additional documents/views), and performance stays excellent.
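For the unfamiliar: a CouchDB view is a map (and optional reduce) function stored in a design document, which CouchDB indexes incrementally; that's why total document count barely affects query time. A rough sketch of what a links-per-domain view could look like for this use case (the design doc, database name, and field names are all hypothetical; the view function itself is JavaScript, stored as a string):

```python
import json

# Hypothetical design document: a view emitting one row per outbound link,
# keyed by source domain. CouchDB stores the map function as JavaScript.
design_doc = {
    "_id": "_design/links",
    "views": {
        "by_source": {
            "map": (
                "function (doc) {"
                "  if (doc.type === 'page') {"
                "    doc.links.forEach(function (l) { emit(doc.domain, l); });"
                "  }"
                "}"
            )
        }
    },
}

# You'd PUT this document into the database, then query the view with
# ?key="a.example" to list one domain's outbound links; the index is
# updated incrementally as documents change, not rebuilt per query.
print(json.dumps(design_doc, indent=2))
```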
The answer will depend primarily on how you expect to query it.<p>Cassandra can handle many orders of magnitude more than 1B rows, but it would constrain your query patterns.
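To make the query-pattern caveat concrete: Cassandra only fetches rows efficiently by partition key, so for a link graph you typically denormalize and write each edge once per access path. A sketch of that idea in plain Python dicts standing in for tables (table and column names are made up):

```python
from collections import defaultdict

# Simulating Cassandra's model: each "table" is keyed by its partition key.
# To answer both "links out of X" and "links into X" efficiently, the same
# edge is written twice, because there are no cheap ad-hoc lookups by
# non-key columns at this scale.
links_by_source = defaultdict(list)  # ~ PRIMARY KEY (source, target)
links_by_target = defaultdict(list)  # ~ PRIMARY KEY (target, source)

def add_link(source, target):
    # One write per access path you intend to query.
    links_by_source[source].append(target)
    links_by_target[target].append(source)

add_link("a.example", "b.example")
add_link("c.example", "b.example")

print(links_by_source["a.example"])  # outbound links of a.example
print(links_by_target["b.example"])  # inbound links of b.example
```

If you later need a query you didn't model a table for (say, "all links crawled last week"), you're looking at a full scan or a new table, which is the trade-off being referred to above.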