TechEcho

8 comments

Generally I'd say do the same as you would with your compiler when trying to optimize your code: spend time with the query analyzer.What the query optimizer will or won't do depends on your database software and your data, in addition to your query of course.For example, the article states "publish_date < now() - interval '3 days'" is always faster than "publish_date + interval '3 days' < now()".Well not for the database server we use, because it evaluates now() per row turning both into table scans... yay.In addition to the suggestions in the article, I'd say use common table expressions (CTE), or sub-query if your database server doesn't have CTEs or treats them as an optimization barrier.I've found the optimizer tries to be smart but doesn't recognize which of the joins will be the most specific filter, causing it to join tons of rows, just to throw most of them away with a later join. Using a CTE or sub-query can force the optimizers hand.Also I've often found it useful to use CTEs or sub-queries for returning row values, typically aggregates. This can especially work well if you're limiting the rows returned using TOP or LIMIT. Again this can avoid the DB throwing away work.

评论 #31289894 未加载

评论 #31293086 未加载

eatonphilabout 3 years ago

This is a good intro! It's good to have quick articles like this where you can easily read and pick up the basics.It's also pretty easy to build your own basic in-memory database if you have time/interest to show yourself these characteristics through experiment. And doing it experimentally ingrains the concepts better than just reading about it IMO.

评论 #31289787 未加载

charcircuitabout 3 years ago

I feel like the use of indexes is one of the least mysterious parts of databases performance.

评论 #31290605 未加载

评论 #31292629 未加载

评论 #31292505 未加载

leeoniyaabout 3 years ago

i remember being surprised by how slow GROUP BY year(date), month(date) was in MySQL, despite the date column being properly indexed.the solution was to store each of those in separate indexed columns instead of re-computing them at query time. think i set up an INSERT trigger to add these to a separate "index" table that i then joined to get the GROUP BY to be fast.

评论 #31289707 未加载

评论 #31291039 未加载

评论 #31295136 未加载

gigatexalabout 3 years ago

The query planner and optimizer etc are still black boxes. As a table’s size grows the way a query runs or the query plan could change. In some systems if you have stale statistics on tables you can get weird performance.The biggest hangup some devs see the DB like anything else and think lookups are O(1) when looking for a key. But btrees are more like O(log(n)).

评论 #31292444 未加载

spacemanmattabout 3 years ago

Good article. Story of my career: Accepting as well as teaching that a faster CPU won't make my database faster.

KronisLVabout 3 years ago

Did a bunch of migrations against two Oracle databases once for different environments of the same app. Same table structure, same indices, exact same SQL for migrations, pretty similar data (e.g. tables with ~40k and ~50k records in each of the environments).And yet, a SELECT query that worked great against one environment essentially made the query hang on the other, with lots of CPU usage. No locks to speak of. SELECT against individual tables working without issues, but whenever you had two particular tables within JOIN, it all broke down.Worked:<pre><code> SELECT ... FROM A WHERE ...; </code></pre> Worked:<pre><code> SELECT ... FROM A JOIN B WHERE ...; </code></pre> Worked:<pre><code> SELECT ... FROM A JOIN C WHERE ...; </code></pre> Stalled:<pre><code> SELECT ... FROM A JOIN B JOIN C WHERE ...; </code></pre> Tried re-creating the indices, did not help. Tried looking at the data, did not help. Killed all of the sessions and restarted the app, did not help. Inspected query plans, they were different - for the same query and the same indices. Made the DB re-generate the table statistics, that solved everything. It's unsettling that the process didn't happen in the background after the migration and that issues like this can eventually surface.There are some things that you just cannot easily demystify, because they don't make a lot of sense, given that everything had worked properly for years previously without this kind of an issue. Just know that depending on the RDBMS that you use, theory will be different from what you'll see in practice once you venture past the very basics.On a similar note, Oracle lets you generate indices automatically if you so choose, something that i discovered some time later and had hundreds of missing indices added for noticeable performance gains. Now, i hate the Oracle implementation of it (you cannot manually delete those indices, and the entire set of functionality is behind a paywall, much like the DB itself), but personally i think that dynamic index optimization might as well be something that should be present in PostgreSQL and MySQL/MariaDB as well.Perhaps not to the point of creating new indices, but just telling you that you might want to add certain ones: "If you add new index X by executing the code Y, then the performance should increase by Z% based on the analysis in the background that have been conducted over the past W months with Q queries analyzed."

评论 #31293121 未加载

评论 #31300405 未加载

wey-guabout 3 years ago

Great drawings! Thanks!

8 comments

magicalhippoabout 3 years ago

评论 #31289894 未加载

评论 #31293086 未加载

eatonphilabout 3 years ago

评论 #31289787 未加载

charcircuitabout 3 years ago

I feel like the use of indexes is one of the least mysterious parts of databases performance.

评论 #31290605 未加载

评论 #31292629 未加载

评论 #31292505 未加载

leeoniyaabout 3 years ago

评论 #31289707 未加载

评论 #31291039 未加载

评论 #31295136 未加载

gigatexalabout 3 years ago

评论 #31292444 未加载

spacemanmattabout 3 years ago

Good article. Story of my career: Accepting as well as teaching that a faster CPU won't make my database faster.

Demystifying Database Performance for Developers

8 comments

Demystifying Database Performance for Developers

8 comments