One tool to add to your toolbox as a developer is named locks. Say you have a table for orders and a table for order items and a table for payments. Collectively they represent an order, but that is a higher level concept that isn't something an RDBMS understands. When the user on your site simultaneously sends two updates, such as "remove item from the cart" and "pay and finish", you will have a slew of simultaneous updates to the rows across the three tables. Unless you use the highest isolation level, you will get inconsistencies. If you do use serializable isolation level you might get a deadlock, depending on how conscientious you are about updating everything in the same order. You could do an initial read from the orders table with FOR UPDATE which will provide you an implicit lock on that order, but you still have issues with new rows being inserted into your order items table that will not be range locked.<p>So use names/advisory locks: create a named lock called "order-1234" where 1234 is tbe ID of the order. Acquire it before doing any manipulations to the order across all its tables. Then release it. Semantics of names locks differ across RDBMS's. Unsurprisingly Postgres has saner and more flexible semantics than MySQL, but both are perfectly suitable for the purpose. This technique can save you a ton of headaches.
Weird to see hand-drawn sideways semi message sequence charts. (Looks vaguely reminiscent of musical notation!)<p>For those yet to discover it, a great tool is <a href="http://www.mcternan.me.uk/mscgen/" rel="nofollow">http://www.mcternan.me.uk/mscgen/</a> ... obligatory JS equivalent <a href="https://mscgen.js.org/" rel="nofollow">https://mscgen.js.org/</a>
This article mostly addresses transaction isolation from the correctness point of view, rather than the tradeoff between semantics and performance.<p>Higher isolation levels add read locks that can block concurrent transactions. That's the biggest practical reason everyone doesn't use serializable, but this practical guide doesn't really address it. There's not much discussion of the gotchas whereby you can accidentally break concurrency in an application.<p>For example there are somewhat surprising sources of contention, e.g. adding rows with foreign keys adds locks to the rows being referenced in foreign table, to stop them getting deleted. Or consider gap locks in queries with range predicates.<p>IMO the guide is mostly theoretical / introductory rather than practical. It's an OK starting point for someone who knows almost nothing about databases that are being used as shared state for a concurrent application.
This is a great article. It explains the various concurrency issues in fairly plain English, which is great! I wish more people were aware of these isolation levels and their consequences. Many application developers will either a) assume the database will always figure it out for them (in other words, they assume it is always serializable, with no errors) or b) hold great fear of the database and it's mystical use of locks - this type of developer tends to avoid running queries at all cost, lest they wake the beast!
I was an Informix DBA years ago and the other day an old friend who now works with a company supporting game developers told me that Informix is (still) really popular principally due to the way it handles locks and multi-concurrency. Surprised to say the least.
I'd also like to thank the other for his attempt at raising the general awareness of transaction isolation. Even when you deal with simpler databases like SQLite, isolation is a serious topic. When serializing accesses hinders performances, dealing with the "WAL mode" that grants SQLite with single-writer-multipler-readers abilities is everything but trivial, because of the huge amount of documentation that has to be read in order to reach the desired level of isolation. I've written an SQLite library in Swift that provides <i>by default</i> the snapshot isolation level which grants most developers with both peace of mind, and non-blocking reads. Check out its "concurrency guide" if you're interested: <a href="https://github.com/groue/GRDB.swift/blob/master/README.md#concurrency" rel="nofollow">https://github.com/groue/GRDB.swift/blob/master/README.md#co...</a>
I'm not sure I get the Dirty Writes example about not always being able to rollback: it first says going back to state A would lose w2[x], so it stays in state C. Then it says that aborting t2 can't go back to states B or C, concluding that the scenario thus can't be rollbacked. But what about going back to state A in case of a1+a2?
If you enjoy this post, you'll <i>love</i> Martin Kleppmann's book Designing Data-Intensive Applications. Chapter 7 covers exactly this topic in more depth.<p><a href="http://dataintensive.net/" rel="nofollow">http://dataintensive.net/</a><p><a href="https://www.safaribooksonline.com/library/view/designing-data-intensive-applications/9781491903063/" rel="nofollow">https://www.safaribooksonline.com/library/view/designing-dat...</a>
Very interesting although I'm not sure about the last point where the author assert that transaction errors can't be managed inside a pl/pgsql function.<p>It's a little bit twisted but I think you could write a pl/pgsql function that act as a client of it's own database using fdw or dblink.<p>This is however pretty twisted and might have side effect but I'm pretty curious about that.<p>Technically the author is still damn right because this will be using a client library at some point as he explain.