My opinion is exactly the opposite: consistency is overvalued.<p>Requiring consistency in a distributed system generally leads to designs that reduce availability.<p>Which is one of the reasons that bank transactions generally do <i>not</i> rely on transactional updates against your bank account. "Low level" operations as part of settlement may use transactions, but the banking system is "designed" (more like it has grown by accretion) to function almost entirely by settlement and reconciliation rather than by holding onto any notion of consistency.<p>The real world rarely involves having a consistent view of <i>anything</i>. We often design software with consistency guarantees that are pointless, because the guarantees can only hold until the data has been output, and are often obsolete before the user has even seen it.<p>That's not to say there are no places where consistency matters, but often it matters because of thoughtless designs elsewhere that end up demanding unnecessary locks, killing throughput, failing if connectivity to some canonical data store happens to be unavailable, etc.<p>The places where we can't <i>design</i> systems to function without consistency guarantees are few and far between.
Amen. Whether or not the article's example is a good one, in a world without consistency you need to worry about state between _any_ two database operations in the system, so there's nearly unlimited opportunity for this class of error in almost any application found in the real world.<p>The truly nefarious aspect of NoSQL stores is that the problems that arise from giving up ACID often aren't obvious until your new product is actually in production and failures that you didn't plan for start to appear.<p>Once you're running a NoSQL system of considerable size, you're going to have a sizable number of engineers who are spending significant amounts of their time thinking about and repairing data integrity problems that arise from even minor failures that are happening every single day. There is really no general fix for this; it's going to be a persistent operational tax that stays with your company as long as the NoSQL store does.<p>The same isn't true for an ACID database. You may eventually run into scaling bottlenecks (although not nearly as soon as most people think), but transactions are darn close to magic in how much default resilience they give to your system. If an unexpected failure occurs, you can roll back the transaction that you're running in, and in almost 100% of cases this turns out to be a "good enough" solution, leaving your application state sane and data integrity sound.<p>In the long run, ACID databases pay dividends in allowing an engineering team to stay focused on building new features instead of getting lost in the weeds of never-ending daily operational work. NoSQL stores, on the other hand, are more akin to an unpaid credit card bill, with unpaid interest continuing to compound month after month.
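For a concrete feel of that "good enough" rollback behavior, here's a minimal sketch using Python's built-in sqlite3 (the schema and values are invented for illustration):<p><pre><code>import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()

# "with conn" wraps the block in a transaction: commit on success,
# automatic rollback if anything raises.
try:
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - 8 WHERE id = 'a'")
        # ... imagine an unexpected failure here, before the deposit lands:
        raise RuntimeError("something broke mid-transfer")
except RuntimeError:
    pass

print(conn.execute("SELECT balance FROM accounts WHERE id = 'a'").fetchone())
# (100,) -- the half-done withdrawal was rolled back, state stays sane
</code></pre>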
And in case you think there's a general solution to that problem, there isn't: <a href="https://en.wikipedia.org/wiki/CAP_theorem" rel="nofollow">https://en.wikipedia.org/wiki/CAP_theorem</a><p>Still, it's funny how banking seems to be the canonical example for why we need transactions given that most banking transactions are inconsistent (<a href="http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on-why-banks-are-base-not-acid-availability.html" rel="nofollow">http://highscalability.com/blog/2013/5/1/myth-eric-brewer-on...</a>).
Disclaimer: I work on distributed systems and have spoken around the world on them, so I am very biased.<p>I think something a lot of people miss is that the universe itself is not strongly consistent. This is Einstein's theory of relativity: it fundamentally takes time for information to travel.<p>So if strong consistency is not viable, even at the level of physics, what can we do instead? That is why our team here at <a href="http://gun.js.org/" rel="nofollow">http://gun.js.org/</a> believes in CRDTs. What are CRDTs? They are data structures whose merges are mathematically proven to be idempotent and commutative, so replicas converge on the same result - even if the power goes out or the network fails, you can safely re-attempt the update.<p>This means you fundamentally don't need "transactions" or any of these jargon words that people often throw around. Sadly, CRDTs are becoming another one of those jargon words, despite how simple they are in reality.
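To make that concrete, here's a minimal Python sketch of a grow-only counter, one of the simplest CRDTs (an illustration of the idea, not gun.js's actual API):<p><pre><code># Each replica increments only its own slot; merging takes the
# per-replica maximum. Merge is idempotent, commutative and
# associative, so replicas converge no matter how often or in
# what order updates are retried or replayed.
class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(); b.increment(2)
a.merge(b); a.merge(b)  # merging twice (a "retry") changes nothing
assert a.value() == 3
</code></pre>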
Transactions can have different isolation levels, and sometimes the problem at hand can be implemented using transactions with weak isolation levels, which are not that hard to build on top of your favorite NoSQL database that supports a CAS operation. I recommend this article: <a href="http://rystsov.info/2012/09/01/cas.html" rel="nofollow">http://rystsov.info/2012/09/01/cas.html</a>
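For illustration, a minimal sketch of such a CAS retry loop. Here `store` is a hypothetical client where get(key) returns a (value, version) pair and cas(key, new_value, expected_version) returns a bool; most stores with CAS expose something equivalent:<p><pre><code>def add_interest(store, key, rate, max_retries=10):
    for _ in range(max_retries):
        balance, version = store.get(key)
        # Only write if nobody else changed the value since we read it;
        # otherwise re-read and retry.
        if store.cas(key, balance * (1 + rate), version):
            return
    raise RuntimeError("too much contention, giving up")
</code></pre>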
Before I read this article, the following question popped into my mind and it miiiiiight be tangentially related -- yeah, probably not, blame the title ;) Does consistency have an effect akin to compound interest?<p>For example, imagine someone doing the same thing year after year diligently. (S)he'd increase his or her skill by, say, 10% a year (I have no clue what realistic numbers are). Would that mean the compound interest effect occurs?<p>I phrased it really naively, because while the answer is "yes" in those circumstances (1.1^n), I'm overlooking a lot and have no clue what I'm overlooking.<p>I know it's off-topic; it's what I thought when I read the title and I never thought about it before, so I'm a bit too curious at the moment ;)
The problem is that when the system does not guarantee consistency, you force the application developer using the system to solve that problem. Each application developed will have to solve the same problem. Besides the fact that the same effort is duplicated over and over again, you are also forcing application developers to solve a problem for which they probably do not have the right skill set. In short, that strategy is wasteful (replicating work) and risky (they'll make mistakes).
I sort of agree. The examples in the article are ways in which people play fast and loose with consistency, often using a NoSQL store that has poor support for atomicity and isolation. This is a helpful message, because I've definitely seen naively designed systems start to develop all sorts of corruption when used at scale. The answer for many low-throughput applications is to just use Postgres. Both Django and Rails, by default, work with relational databases and leverage transactions for consistency.<p>Then, there is the rise of microservices to consider. In this case, I also agree with the author that it becomes crucial to understand that the number of states your data model can be in can potentially multiply, since transactional updates are very difficult to do.<p>But I feel like on the opposite side of the spectrum of sophistication are people working on well-engineered eventually consistent data systems, with techniques like event sourcing, and a strong understanding of the hazards. There's a compelling argument that this more closely models the real world and unlocks scalability potential that is difficult or impossible to match with a fully consistent, ACID-compliant database.<p>Interestingly, in a recent project, I decided to layer in strict consistency on top of event sourcing underpinnings (Akka Persistence). My project has low write volume, but also no tolerance for the latency of a write conflict resolution strategy. That resulted in a library called Atomic Store [1].<p>[1] <a href="https://github.com/artsy/atomic-store" rel="nofollow">https://github.com/artsy/atomic-store</a>
I think it's a bad example, because this should not be the way to develop in this kind of system (microservices).<p>In these environments you atomically create objects in your application's "local" storage and have a reconciliation loop that either creates the corresponding objects in the other services or deletes the orphaned "local" objects.
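A minimal sketch of that reconciliation loop, with `local_db` and `profile_service` as hypothetical clients (the method names are made up for illustration):<p><pre><code>import time

def reconcile(local_db, profile_service, max_age_seconds=3600):
    # The user row was committed atomically to local storage first;
    # this loop drives the remote side to match, or cleans up orphans.
    for user in local_db.find_users(profile_created=False):
        try:
            profile_service.create_profile(user.id)  # idempotent call
            local_db.mark_profile_created(user.id)
        except Exception:
            if time.time() - user.created_at > max_age_seconds:
                local_db.delete_user(user.id)  # give up, remove the orphan
</code></pre>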
If anyone is interested in this sort of thing, I found this a great article: <a href="http://www.grahamlea.com/2016/08/distributed-transactions-microservices-icebergs/" rel="nofollow">http://www.grahamlea.com/2016/08/distributed-transactions-mi...</a>
To actually do a distributed transaction, I would look into algorithms such as 2PC or 3PC. Although before going there, I would seriously consider consolidating the different backends into one scalable option (the banking transaction example is a bit contrived, though I gather the author is just trying to make a point).<p>At the service-integration level, we can leverage reliable message queue middleware to make sure a task is eventually delivered and handled (or put in a dead-letter queue so we can do a batch clean-up).<p>Also, as a general principle, I would make each of those sub-transactions idempotent, so that retrying multiple times won't hurt, and there would be a natural way of picking a winner if there are conflicting ongoing commit/retry attempts.
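A minimal sketch of the idempotency part, assuming a hypothetical `db` client with a transaction context and an atomic insert-if-absent primitive:<p><pre><code>def handle_transfer(db, message):
    # message.id doubles as the idempotency key. Recording the key and
    # applying the effect in one transaction means a redelivered message
    # is recognized and dropped instead of being applied twice.
    with db.transaction():
        if not db.insert_if_absent("processed_messages", message.id):
            return  # already handled on an earlier delivery
        db.apply_transfer(message.src, message.dst, message.amount)
</code></pre>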
You can use event logs and eventual consistency to solve this problem.<p>Basically you make the transfer of money an event that is then atomically committed to an event log.
The two bank accounts then eventually incorporate that state.<p>See <a href="http://www.grahamlea.com/2016/08/distributed-transactions-microservices-icebergs/" rel="nofollow">http://www.grahamlea.com/2016/08/distributed-transactions-mi...</a><p>But I agree that life is often easier if you just keep things simpler. If you require strong consistency, as with the user/profile example, don't make that state distributed. If you do make it distributed, you need to live with less consistency.
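Going back to the event-log idea, a minimal file-based sketch (names invented for illustration):<p><pre><code>import json, os

def append_event(log_path, event):
    # One durable append is the single atomic commit point.
    with open(log_path, "a") as f:
        f.write(json.dumps(event) + "\n")
        f.flush()
        os.fsync(f.fileno())

def balances(log_path):
    # Account balances are a view that eventually incorporates
    # every committed transfer event.
    state = {}
    with open(log_path) as f:
        for line in f:
            e = json.loads(line)
            state[e["src"]] = state.get(e["src"], 0) - e["amount"]
            state[e["dst"]] = state.get(e["dst"], 0) + e["amount"]
    return state

append_event("transfers.log", {"src": "a", "dst": "b", "amount": 8})
print(balances("transfers.log"))  # {'a': -8, 'b': 8}
</code></pre>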
This profile example is missing the better approach: avoid the dependency of creating the user before creating the profile.<p>Create the profile with a generated uuid. Once that succeeds, then create the user with the same uuid.<p>If you build a system that allows orphaned profiles (by just ignoring them) then you avoid the need to deal with potentially missing profiles.<p>This is essentially implementing MVCC. Write all your data with a new version and then as a final step write to a ledger declaring the new version to be valid. In this case, creating the user is writing to that ledger.
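A minimal sketch of that ordering, with `profiles` and `users` as hypothetical service clients:<p><pre><code>import uuid

def create_account(users, profiles, name):
    profile_id = str(uuid.uuid4())
    profiles.create(id=profile_id)           # step 1: may leave an orphan
    users.create(id=profile_id, name=name)   # step 2: the "ledger" write;
                                             # only now does the account exist
    return profile_id
</code></pre>
A crash between the two steps leaves only a harmless orphaned profile, which the system ignores by design.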
Good article.<p>I've stopped using bank transfers as an example of ACID transactions, and instead talk about social features:<p>- if I change a privacy setting in Facebook or remove a user's access, these changes should be atomic and durable<p>- transactions offer good semantics with which to make these changes: they can be staged across several queries, but nothing takes effect until the commit<p>- without transactions, durability is hard to offer. You would essentially need to make each query flush to disk, rather than each transaction. Much more expensive.
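A minimal sketch of that staging-then-commit behavior, using Python's built-in sqlite3 (the schema is invented for illustration):<p><pre><code>import sqlite3

conn = sqlite3.connect("social.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS access_grants (owner_id TEXT, grantee_id TEXT);
CREATE TABLE IF NOT EXISTS privacy_log (owner_id TEXT, action TEXT, target TEXT);
""")

def revoke_access(owner, grantee):
    # Both changes are staged inside one transaction; neither is visible
    # or durable until the commit, and a single flush at commit time
    # covers the whole batch -- far cheaper than flushing every query.
    with conn:
        conn.execute("DELETE FROM access_grants WHERE owner_id=? AND grantee_id=?",
                     (owner, grantee))
        conn.execute("INSERT INTO privacy_log VALUES (?, 'revoked', ?)",
                     (owner, grantee))
</code></pre>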
In case anyone is wondering what an "atomic change" means in database terminology: <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Atomic-Changes.html" rel="nofollow">https://www.gnu.org/software/emacs/manual/html_node/elisp/At...</a>
Maybe I'm crazy, but I never see atomic libraries that are called like this:<p><pre><code> bank_account1.withdraw(amount)
 bank_account2.deposit(amount)
</code></pre>
Isn't this kind of thing always called in some atomic batch operation?<p><pre><code> transact.batch([
   account[a] -= 8,
   account[b] += 8
 ]).submit()</code></pre>
The universally hated JEE can do distributed transactions (JTA/XA two-phase commit) out of the box. Yes, with pitfalls, but it can. (It is usually hated by devs who have never used it properly.)
But consider this:<p>You are using MySQL, and you make a transaction with, say, a deposit and a withdrawal.<p>What happens on the MySQL machine if you pull the plug exactly when MySQL has done the deposit but not the withdrawal?<p>The ONLY difference between SQL transactions and NoSQL microservice transactions is the time between the parts of a transaction.<p>Personally I use a JSON file with state to execute my NoSQL microservice transactions, and it's a lot more scalable than having a pre-internet-era SQL legacy design hogging all my time and resources.
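A minimal sketch of that JSON-file-with-state approach (the file layout and step names are just one way to do it):<p><pre><code>import json, os

STATE_FILE = "transfer_state.json"

def save_state(state):
    # Write to a temp file, then atomically rename: the rename is the
    # commit point, so a crash never leaves a half-written state file.
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, STATE_FILE)

def run_transfer(withdraw, deposit, amount):
    # Each step records its progress durably, so a crashed run can be
    # picked up from the state file and rolled forward on restart.
    state = {"step": "start", "amount": amount}
    save_state(state)
    withdraw(amount)
    state["step"] = "withdrawn"
    save_state(state)
    deposit(amount)
    state["step"] = "done"
    save_state(state)
</code></pre>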