Yep, that's been a problem for decades, not just in distributed systems but perhaps <i>especially</i> in distributed systems. I'm practically overjoyed to see things like "Formal methods at Amazon Web Services", because it means at least some kind of bridge between the two exists.<p>My suspicion, setting aside johnparkerg's point about polarization, is that distributed systems in practice are particularly messy from a formal standpoint, in contrast to non-distributed systems; yet in practice you can live with the messiness if you can push it down to a sufficient number of 9's, which seems to be anathema to theoretical approaches.<p>For example, the proof of the CAP theorem is irrelevant in practice, specifically because no system I'm aware of makes the strong consistency assumptions it requires. On the other hand, CAP <i>behavior</i> is definitely a problem once you reach a certain size.
<i>If FLP says consensus is impossible with one faulty process, and faults happen all the time in practice, how are real systems built with consensus?</i><p>Good question. Part of the answer is that FLP only rules out <i>guaranteed</i> termination in a fully asynchronous model; real protocols like Paxos and Raft are always safe, and they achieve liveness whenever the network behaves synchronously for long enough, which in practice is almost always. The other part is that you don't build large systems using consensus; you bootstrap large systems with very small systems that use consensus. The very small systems can reasonably be assumed to have no Byzantine faults (just as you more or less rely on a single database server not to have them).<p>All the systems I know of that use consensus are meant to be small, e.g. run on 5 machines or so (and of course the membership is fixed). Google's Chubby, Yahoo's ZooKeeper, and similar systems like doozer and etcd all work like this.<p>Consensus doesn't "scale" anyway (the latency isn't bearable). If you only have 5 machines, the likelihood of Byzantine faults over a reasonably long time period is low. The main problem you will see is your own software bugs (i.e. bugs in your code, not faults in CPUs, memory, disks, switches, etc.).
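To put a rough number on that "5 machines or so" intuition, here's a back-of-the-envelope sketch of majority-quorum availability. It assumes independent crash faults with a fixed per-node availability, which real deployments only approximate (correlated failures make things worse):

```python
from math import comb

def quorum_availability(n: int, p_up: float) -> float:
    """Probability that a strict majority of n nodes is up,
    assuming each node is independently up with probability p_up."""
    need = n // 2 + 1  # majority quorum size
    return sum(
        comb(n, k) * p_up**k * (1 - p_up)**(n - k)
        for k in range(need, n + 1)
    )

# A 5-node ensemble with 99% per-node availability keeps a
# majority up with probability ~0.99999 under these assumptions.
print(f"{quorum_availability(5, 0.99):.6f}")
```

The same arithmetic shows why ensembles stay small: going from 5 to 7 nodes buys little extra fault tolerance, while every extra replica adds to commit latency.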
I think this is just a reflection of the gap between academia and the real world. One could argue that the problem is the lack of a 'trial and error' philosophy on the theory side, or that scientists couldn't possibly develop so much if they were bothered with the real world.<p>I believe the true problem lies in our need to categorize people as either scientists and theoretical engineers or down-to-earth practitioners, which only polarizes the spectrum.
FLP/CAP merely illustrate the fundamental trade-off between "safety/consistency" and "liveness/availability" in distributed systems.<p>There is a spectrum of design choices between strong consistency with best-effort availability and best-effort everything, and many points on that spectrum make sense for practical use cases.
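One concrete way that spectrum shows up is Dynamo-style quorum tuning, where you choose how many of n replicas must acknowledge a read (r) and a write (w). A minimal sketch (the names n, r, w are illustrative, not any particular system's API):

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """In an n-replica store, every read quorum of size r is
    guaranteed to intersect every write quorum of size w
    (so reads see the latest acknowledged write) iff r + w > n."""
    return r + w > n

# Consistency-leaning: every read overlaps every write.
print(quorums_overlap(3, 2, 2))  # True
# Availability-leaning: reads may return stale data,
# but succeed even when most replicas are unreachable.
print(quorums_overlap(3, 1, 1))  # False
```

Sliding r and w between 1 and n is exactly the strong-consistency-to-best-effort dial described above, paid for in latency and availability rather than correctness proofs.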
I think the middle ground between theory and practice has gaps in most areas of expertise. The problem may be that to create good content in those gaps, one needs to have reached a certain level of skill in both theory and practice. Very few people ever actually step aside from their daily work (which is usually pure theory or pure practice) to achieve something on the other end, though.
I would add a ninth fallacy to the eight: CPU is free. This is effectively true on desktops and possibly on servers if task priorities are used, but on mobile devices including laptops CPU eats battery life.