I don't mind if building the database takes on the order of days, but I'd like queries to be as quick as possible. If I get new data, I am fine rebuilding the database from scratch rather than inserting. (New data comes in on the order of months to years.)

I have several related datasets that I'd like to query. The datasets are currently stored in flat files. The largest dataset is 4 million rows x 20 columns; the next largest is 50,000 x 1,300 (or 650 million x 3 if "melted"). There are 5 bigger data tables and several small metadata tables.

What sort of database would be best for this size and these goals? What additional information would you need before recommending a solution?
I needed a read-only key-value system.

I ended up building my own key-value store for several million entries using a memory-mapped file in Go. The total file size is about 17GB.

I originally wanted to use BoltDB for this, but it was taking too long to load, and the author mentioned in a GitHub issue that just using a memory-mapped file would be faster.

I use a dictionary of keys to offsets into the memory-mapped file to perform my lookups.

Total loading time is only a few minutes.

If you only need to query by a simple key, this approach may work for you.
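A minimal sketch of that approach in Go, assuming a simple on-disk record layout of [4-byte key length][key][4-byte value length][value] repeated through the file; the layout, file name, and type names are illustrative assumptions rather than the commenter's actual format, and it is Unix-only since it uses syscall.Mmap:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
	"syscall"
)

// loc points at one value inside the mapped file.
type loc struct {
	off uint64 // byte offset of the value
	n   uint32 // value length in bytes
}

type Store struct {
	data  []byte         // read-only memory-mapped data file
	index map[string]loc // key -> value location, built once at startup
}

func Open(path string) (*Store, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	fi, err := f.Stat()
	if err != nil {
		return nil, err
	}

	// Map the whole file read-only; the OS pages it in on demand, so
	// startup cost is only the index scan, not reading all 17GB.
	data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return nil, err
	}

	// Build the key -> offset dictionary by walking the records once.
	index := make(map[string]loc)
	for pos := uint64(0); pos < uint64(len(data)); {
		klen := binary.LittleEndian.Uint32(data[pos:])
		key := string(data[pos+4 : pos+4+uint64(klen)])
		pos += 4 + uint64(klen)
		vlen := binary.LittleEndian.Uint32(data[pos:])
		index[key] = loc{off: pos + 4, n: vlen}
		pos += 4 + uint64(vlen)
	}
	return &Store{data: data, index: index}, nil
}

// Get slices the value directly out of the mapping: no copy, and no disk
// read beyond whatever pages the OS has to fault in.
func (s *Store) Get(key string) ([]byte, bool) {
	l, ok := s.index[key]
	if !ok {
		return nil, false
	}
	return s.data[l.off : l.off+uint64(l.n)], true
}

func main() {
	s, err := Open("values.dat") // hypothetical data file
	if err != nil {
		panic(err)
	}
	if v, ok := s.Get("some-key"); ok {
		fmt.Printf("%s\n", v)
	}
}
```

Because values are returned as slices into the mapping, lookups avoid copies and the OS page cache does the caching; the only real startup cost is the single index-building scan.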
Have you considered using SQLite? [1]

[1]: https://www.sqlite.org/about.html
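For the datasets described in the question, a minimal sketch of the SQLite route from Go might look like the following; the table name, column names, index choices, and the go-sqlite3 driver are assumptions for illustration, not a recommendation of a specific schema:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // cgo-based SQLite driver (assumed choice)
)

func main() {
	db, err := sql.Open("sqlite3", "datasets.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Build once (the question allows a build measured in days), then
	// query read-only. Indexes on the filter columns are what make
	// point queries fast.
	stmts := []string{
		`CREATE TABLE IF NOT EXISTS measurements (
			sample_id TEXT, feature TEXT, value REAL
		)`,
		`CREATE INDEX IF NOT EXISTS idx_sample  ON measurements(sample_id)`,
		`CREATE INDEX IF NOT EXISTS idx_feature ON measurements(feature)`,
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			log.Fatal(err)
		}
	}

	// Example read: fetch all values for one sample.
	rows, err := db.Query(
		`SELECT feature, value FROM measurements WHERE sample_id = ?`, "S1")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var feature string
		var value float64
		if err := rows.Scan(&feature, &value); err != nil {
			log.Fatal(err)
		}
		log.Println(feature, value)
	}
}
```

Since the rebuild-from-scratch workflow is acceptable here, the indexes can be created after bulk loading the flat files, which is generally faster than maintaining them while inserting.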