We're hugely excited about this for SnowPlow (<a href="https://github.com/snowplow/snowplow" rel="nofollow">https://github.com/snowplow/snowplow</a>): Redshift Postgres is a really attractive storage target for eventstream data. It's a bit of a shame they don't support hstore/JSON yet, but hopefully that will come in time.<p>We're going to work on SnowPlow-Redshift integration next week, using the COPY command with the SnowPlow event files on S3. It's great timing, as we've been hitting the limits of what we can do in Infobright (which inherits MySQL's limit of 65532 bytes per row, an unfortunate restriction for a columnar database).
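As a rough illustration of the COPY-from-S3 approach, here is a minimal Java sketch that only builds the SQL string. The table name, bucket path, IAM role ARN, and the assumption that the event files are gzipped and tab-separated are all hypothetical, not SnowPlow's actual layout. `IAM_ROLE` is one of Redshift's COPY authorization options; older setups passed access keys via `CREDENTIALS` instead.

```java
import java.util.Objects;

/**
 * Hypothetical helper that builds a Redshift COPY statement for loading
 * gzipped, tab-separated event files from an S3 prefix. The table, path
 * and role ARN are placeholders (assumptions), not SnowPlow's real setup.
 */
public final class RedshiftCopy {
    private RedshiftCopy() {}

    // Quote a value as a SQL string literal, doubling embedded single quotes.
    public static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static String build(String table, String s3Prefix, String iamRoleArn) {
        Objects.requireNonNull(table, "table");
        if (!s3Prefix.startsWith("s3://")) {
            throw new IllegalArgumentException("expected an s3:// prefix: " + s3Prefix);
        }
        // "\\t" in Java yields the two characters \t, which COPY reads as a tab delimiter.
        return "COPY " + table
            + " FROM " + literal(s3Prefix)
            + " IAM_ROLE " + literal(iamRoleArn)
            + " DELIMITER '\\t' GZIP;";
    }

    public static void main(String[] args) {
        System.out.println(build("events", "s3://my-bucket/events/",
            "arn:aws:iam::123456789012:role/RedshiftLoad"));
    }
}
```

The resulting statement would then be run through an ordinary JDBC `Statement.execute` call against the cluster; Redshift does the parallel load from S3 itself, which is why COPY is usually preferred over row-by-row inserts.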
I think this is a smart move. I know companies that do their custom data warehousing on the free version of Infobright (another column-store database). I'm sure they'd be interested in dumping a lot of custom scripts and doing all their querying on Amazon, since their data is there anyway.
Initially I was really excited by Redshift, but when I got a chance to play with it I found out that there is no JDBC support for any kind of bulk insert or trickle loading.<p>When you try to do batch inserts, the Postgres JDBC driver runs each statement individually, and you end up inserting tens of rows a second.<p>I wish they had gone with something like Vertica.
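One possible workaround (not something this comment describes) is to collapse many rows into a single parameterised multi-row `INSERT ... VALUES (...), (...)`, so the driver sends one statement per chunk instead of one per row. This is a minimal sketch under that assumption; the class, table and column names are hypothetical, and it only builds the SQL and the flattened parameter list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: build one multi-row INSERT with ? placeholders,
 * plus the parameters in the order a PreparedStatement would bind them.
 */
public final class MultiRowInsert {
    private MultiRowInsert() {}

    /** Returns e.g. "INSERT INTO t (a, b) VALUES (?, ?), (?, ?)" for 2 rows. */
    public static String sql(String table, List<String> columns, int rowCount) {
        if (columns.isEmpty() || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        String oneRow = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES "
            + String.join(", ", Collections.nCopies(rowCount, oneRow));
    }

    /** Flattens rows into bind order, checking every row has the right width. */
    public static List<Object> params(List<List<Object>> rows, int columnCount) {
        List<Object> flat = new ArrayList<>();
        for (List<Object> row : rows) {
            if (row.size() != columnCount) {
                throw new IllegalArgumentException(
                    "row has " + row.size() + " values, expected " + columnCount);
            }
            flat.addAll(row);
        }
        return flat;
    }

    public static void main(String[] args) {
        List<String> cols = List.of("event_id", "page_url");
        List<List<Object>> rows = List.of(List.<Object>of(1, "/home"), List.<Object>of(2, "/about"));
        System.out.println(sql("events", cols, rows.size()));
        System.out.println(params(rows, cols.size()));
    }
}
```

With a real connection you would pass the SQL to `prepareStatement`, bind each value with `setObject(i + 1, value)`, and call `executeUpdate` once per chunk. Chunking matters because very large statements run into size limits, and for genuinely bulk loads staging files on S3 and using COPY is still likely to be faster.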
What are the best options for clickstream tracking when the data ends up in a data warehouse?<p>I've looked at Snowplow (<a href="https://github.com/snowplow/snowplow" rel="nofollow">https://github.com/snowplow/snowplow</a>). Is that what most people are using, or are you rolling your own?
What's the difference between this and Amazon's RDS or S3? Is it just data storage with an easy way to query the aforementioned data?<p>It seems like an "odd" product that competes with Amazon's existing offerings in many ways...
How is this different from, or better than, Google BigQuery?<p>How does its speed and performance compare to what's shown here:<p><a href="https://cloud.google.com/bigquery-tour" rel="nofollow">https://cloud.google.com/bigquery-tour</a>