1.1B records = 500GB of raw CSV data.
This fits into RAM quite easily on a machine like the p2.8xlarge, especially when compression is used (as MapD does).

I'd like to see how well this performs on a dataset that doesn't fit in RAM.
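A quick back-of-the-envelope check (a sketch; the 488 GiB figure is the p2.8xlarge spec, and the compression ratios are assumed placeholders, not measured MapD numbers):

    # Rough feasibility check: does ~500 GB of raw CSV (1.1B records)
    # fit in RAM once compressed? Compression ratios below are
    # assumptions, not measured MapD figures.
    raw_bytes = 500 * 10**9          # ~500 GB of raw CSV
    records = 1.1 * 10**9            # 1.1B rows
    ram_bytes = 488 * 2**30          # p2.8xlarge: 488 GiB of RAM

    print(f"~{raw_bytes / records:.0f} bytes per raw record")

    for ratio in (2, 3, 4):          # hypothetical compression ratios
        compressed = raw_bytes / ratio
        fits = "fits" if compressed < ram_bytes else "does not fit"
        print(f"{ratio}x compression -> {compressed / 2**30:.0f} GiB ({fits} in RAM)")

Even at a modest 2x, the working set lands around 233 GiB, comfortably under the machine's memory.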
Is there a bridge for using MapD through a Spark interface, or some way of combining them? That could be interesting for clusters with a lot of GPUs and a lot of data that needs manipulating.
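I haven't seen an official Spark connector, but MapD ships a JDBC driver, so in principle Spark's generic JDBC source could pull a table across. A minimal PySpark sketch (the driver class, JDBC URL format, port, and table name below are assumptions from memory of MapD's docs, so double-check them):

    from pyspark.sql import SparkSession

    # Sketch: read a MapD table into a Spark DataFrame via generic JDBC.
    # Driver class, URL format, port, and credentials are assumptions --
    # verify against the MapD JDBC documentation.
    spark = (SparkSession.builder
             .appName("mapd-bridge")
             # the MapD JDBC driver jar must be on the classpath
             .config("spark.jars", "/path/to/mapd-jdbc.jar")
             .getOrCreate())

    flights = (spark.read.format("jdbc")
               .option("url", "jdbc:mapd:localhost:9091:mapd")   # assumed URL format
               .option("driver", "com.mapd.jdbc.MapDDriver")     # assumed driver class
               .option("dbtable", "flights")                     # hypothetical table
               .option("user", "mapd")
               .option("password", "changeme")                   # placeholder
               .load())

    flights.groupBy("carrier").count().show()

Note that pulling rows back over JDBC gives you Spark-side manipulation but forfeits MapD's GPU-side execution for anything past the initial scan, so this would be more a stopgap than a real integration.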
Found elsewhere on the internet: 'On a system with eight Tesla K80s, which might cost somewhere between $60,000 to $70,000, the license for the MapD stack would be “a small multiple” of this hardware cost.'

I guess I'm not playing with this anytime soon.
Amazing to see the improvements that MapD has made over the past few years. I have been following them for a long time, and was excited to catch wind of 3.0 this morning. Then I got on here to see someone already benchmarking and working with it.
On the benchmark page [1], Mark did a good summary of how various technologies compare.

[1] http://tech.marksblogg.com/benchmarks.html
Price comparison between Amazon Redshift, Google BigQuery, Elasticsearch and SlicingDice using the same dataset:

https://blog.slicingdice.com/slicingdice-pricing-model-and-competitors-comparison-31f1c9f0f076