Introducing the caveats of PostgreSQL arrays is informative. However, I can't help but note that this app probably would have been better served by a classic event log table. Why have each user forever carrying tens, or hundreds of thousands of events directly in their user record?<p>If data mining is necessary, it's <i>right there</i> in the event log and can be manipulated by much simpler (and faster, possibly by orders of magnitude) SQL. Additionally, it's not bloating regular access of the user records.<p>I've been seeing these designs more and more, recently. Is it a natural extension of non-DBAs taking DDL design roles? Because this data manipulation style doesn't really mix well with any RDBMS I'm familiar with.
I find it very interesting to see how seemingly minor rewrites can have huge performance impacts.<p>I do have a question about the SQL style. I work in business intelligence, so I write a pretty good amount of SQL, and get to see a lot of other people's efforts, ranging from newbie analysts' first fumblings to experienced DBAs' queries and procedures. It seems that nested subqueries in the style of this article are used much more often than CTEs, and I'm curious why this is.<p>* I find CTEs much more readable than subqueries
* For naive uses, they are optimized the same
* For complex uses (e.g. recursion to traverse a parent-child hierarchy), CTEs are the only option<p>The only good reason I see for subqueries is if you are using an old version of your RDBMS that does not support CTEs.<p>I'm just wondering if anyone else can weigh in on the topic.
Author could lose a layer of subselect with the filter on rank by using first_value over the window frame instead. <a href="https://gist.github.com/dbenhur/951b7bae4e07d7998268" rel="nofollow">https://gist.github.com/dbenhur/951b7bae4e07d7998268</a>
I had some pretty great success rewriting some of our longer running queries to use arrays. A couple things:<p>I implemented several helper functions around this. Basically I have a row based function that makes sets unique then I have aggregation functions that give me the cardinality of these sets. I use intset when I can, otherwise I use the "key" part of hstore in substitute of the missing set structure<p>I've been pushing hard to use: <a href="https://github.com/aggregateknowledge/postgresql-hll" rel="nofollow">https://github.com/aggregateknowledge/postgresql-hll</a><p>Being able to switch algorithms based on size IMO makes that one of the most badass thing about postgres right now.