TechEcho

4 comments

bonesmosesabout 10 years ago

Introducing the caveats of PostgreSQL arrays is informative. However, I can't help but note that this app probably would have been better served by a classic event log table. Why have each user forever carrying tens, or hundreds of thousands of events directly in their user record?If data mining is necessary, it's right there in the event log and can be manipulated by much simpler (and faster, possibly by orders of magnitude) SQL. Additionally, it's not bloating regular access of the user records.I've been seeing these designs more and more, recently. Is it a natural extension of non-DBAs taking DDL design roles? Because this data manipulation style doesn't really mix well with any RDBMS I'm familiar with.

评论 #9329077 未加载

评论 #9328914 未加载

评论 #9328076 未加载

greggybabout 10 years ago

I find it very interesting to see how seemingly minor rewrites can have huge performance impacts.I do have a question about the SQL style. I work in business intelligence, so I write a pretty good amount of SQL, and get to see a lot of other people's efforts, ranging from newbie analysts' first fumblings to experienced DBAs' queries and procedures. It seems that nested subqueries in the style of this article are used much more often than CTEs, and I'm curious why this is.* I find CTEs much more readable than subqueries * For naive uses, they are optimized the same * For complex uses (e.g. recursion to traverse a parent-child hierarchy), CTEs are the only optionThe only good reason I see for subqueries is if you are using an old version of your RDBMS that does not support CTEs.I'm just wondering if anyone else can weigh in on the topic.

评论 #9329288 未加载

评论 #9329153 未加载

dbenhurabout 10 years ago

Author could lose a layer of subselect with the filter on rank by using first_value over the window frame instead. <a href="https://gist.github.com/dbenhur/951b7bae4e07d7998268" rel="nofollow">https://gist.github.com/dbenhur/951b7bae4e07d7998268</a>

评论 #9331133 未加载

mritsabout 10 years ago

I had some pretty great success rewriting some of our longer running queries to use arrays. A couple things:I implemented several helper functions around this. Basically I have a row based function that makes sets unique then I have aggregation functions that give me the cardinality of these sets. I use intset when I can, otherwise I use the "key" part of hstore in substitute of the missing set structureI've been pushing hard to use: <a href="https://github.com/aggregateknowledge/postgresql-hll" rel="nofollow">https://github.com/aggregateknowledge/postgresql-hll</a>Being able to switch algorithms based on size IMO makes that one of the most badass thing about postgres right now.

Using PostgreSQL Arrays the Right Way

4 comments

Using PostgreSQL Arrays the Right Way

4 comments