We’ve just moved to Snowflake. I haven’t really been impressed with some of the new features added to Redshift; it seems like too little, too late.<p>The JSON support (SUPER type) is kind of cool, and they are moving towards more “automatic” sorting + partitioning, but it’s just all a bit shit to be honest.<p>We encountered major bugs with data sharing. Our clusters keep insisting that zstd is the best compression format to use for all our data (but then never actually use it). Materialised views often fail to update, and understanding why is a nightmare. Performance is terrible if your strings are varchar(max) (guess what Glue sets them to…). The Redshift Data API often just dies (4-hour downtime recently, no status page) and has some really weird semantics around listing queries; before the Data API you couldn’t run async queries, and its EventBridge integration straight up doesn’t work. On top of that: nightmare bugs in the Java connection library that don’t show up using psql, a tiny set of types (no arrays, no UUIDs), unkillable queries, AQUA actually causing everything to slow down hugely, critical release notes posted only in a fucking random forum, etc. etc.<p>Snowflake has apparently sorted this, as well as including ingestion tools (Snowpipe) that you’d otherwise have to stitch together with AWS Glue or something (a cursed service if ever there was one).<p>That being said, in some cases Redshift absolutely flies. But the real world isn’t filled with ideal schemas and natural sort keys. It’s messy. And Snowflake deals with messy better.
I wish Amazon would stop naming things "serverless" that clearly have a well-defined type and number of servers at any point. That includes Redshift Serverless and Aurora Serverless. If it has a cluster, it isn't serverless, it is just autoscaling. Every time they announce a serverless product I assume it will be like Lambda, and I am mostly disappointed. For example, a real Aurora Serverless would be more like CockroachDB Cloud or DynamoDB. And a real Redshift Serverless would be more like BigQuery.
I have worked with several companies that have their infrastructure on AWS but consider BigQuery or Snowflake for the serverless model they provide. This brings Redshift much closer to those options. I envision Redshift Serverless becoming the default option for most enterprises, mainly because it stays in the AWS ecosystem and you don't have to work with a different vendor and create different cost governance processes.<p>I believe the real advantage AWS has here is in cost. Snowflake has positioned itself as price competitive with Redshift, but this is primarily due to Snowflake's ability to scale on demand, whereas prior Redshift versions required you to size for peak usage (RA3 helped with this). In my experience Snowflake is an order of magnitude more expensive if you compare similar workloads and do not account for idle time. We will need to see the performance of a "Redshift Processing Unit" to be sure of the advantage, but even so AWS will be able to provide significant downward cost pressure through this offering.
I was initially excited about this, as I think it might solve our Redshift pain points and potentially save us from having to deal with a migration to Snowflake. But then I remembered when AWS account managers promised that Athena and Spectrum would solve these same problems at a previous company I worked for a few years ago. I'm assuming the developer experience will still be terrible, with lots of knobs to tune to actually get any decent cost/performance.
Blog post with initial thoughts about the internal design of Redshift Serverless:<p><a href="https://amazonredshiftresearchproject.org/slblog/index.html" rel="nofollow">https://amazonredshiftresearchproject.org/slblog/index.html</a>
So basically BigQuery from AWS. Looks good at first sight, if a bit late. I personally worked for a large org which has just moved to BigQuery from Redshift, and I have to say that BigQuery is the much better product.