A CS major in college here. This article suggests that centralized architectures seem to be winning and that Google has learnt that with experience. I have a question, and people on HN might be the right people to ask.<p>Why is it that you need experience to learn whether a centralized model is better or a decentralized one? When a team is considering alternative models (centralized vs. distributed), can't you just compute the parameters of each model (complexity, network usage, availability, reliability, etc.) and pick the one that is better for your objective? Why do they need to trial and experiment?<p>I come from a college world where we are learning algorithms and systems, and we can easily pick an algorithm for a problem based on complexity etc. without having to implement the multiple algorithms we are considering. I find that industry is a lot more trial and error. Why is this? An explanation with an example would be great.
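For what it's worth, here's the kind of back-of-the-envelope comparison I have in mind (the numbers and the independent-failure assumption are made up for illustration):

```python
# Compare the availability of a single centralized node vs. a replicated
# majority quorum, assuming independent node failures (a big assumption).
from math import comb

def single_node_availability(p):
    # One node: available whenever that node is up.
    return p

def majority_quorum_availability(p, n):
    # n replicas: available when at least a majority (n // 2 + 1) are up.
    k = n // 2 + 1
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.99  # assumed per-node availability
print(single_node_availability(p))         # 0.99
print(majority_quorum_availability(p, 5))  # well above 0.99
```

Of course this only captures one axis; it says nothing about complexity, latency, or operational cost, which is part of my question.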
Disclaimer: I'm a CS PhD student who works on distributed service architectures.<p>It's worth pointing out that this design pattern only makes sense when the entire system lives under one administrative domain. Google owns all of the servers that make up GoogleFS; a cloud provider owns all of the Hadoop nodes in its datacenters; a PaaS provider owns all of its NoSQL datastore nodes; etc. We see a similar pattern at work in Puppet, Chef, Ansible, Func, certmanager, etc. as well.<p>Under these circumstances, it's desirable to maintain the authoritative state in a logically centralized place for two reasons. First, doing so makes it easy for the rest of the system to discover and query it. Second, it makes it easier to keep authoritative state consistent with updates. Centralizing control and distributing data lets you address control-plane concerns separately and independently of data-plane concerns.<p>However, it stops making sense to centralize the authoritative state (control) once you build a system that spans multiple administrative domains. Which domain gets to host the authoritative state? How do you get the other domains to act on it? Centralization won't work here, unless you can first get the domains to agree on who's the controller (sacrificing their autonomy to decide the state of the system).<p>We have addressed these concerns instead by distributing responsibility for the authoritative state across domains, and devising a way for them to reach consensus on it. DNS does this by delegating authority for name bindings hierarchically. The Internet maintains routing state by having each AS learn and advertise routes to each other AS via BGP. Bitcoin maintains the blockchain (its authoritative state) by having a majority of nodes agree on the sequence of blocks added to it. 
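A minimal sketch of the "centralize control, distribute data" pattern. All the names here (Controller, DataNode) are invented for illustration, not any real system's API:

```python
# Control plane: one logically centralized authority over metadata.
# Data plane: many nodes holding the actual bytes.

class Controller:
    """Logically centralized authoritative state (control plane)."""
    def __init__(self):
        self.placement = {}  # chunk id -> list of node ids
        self.version = 0

    def assign(self, chunk, nodes):
        # All updates funnel through one place, so state stays consistent.
        self.placement[chunk] = list(nodes)
        self.version += 1

    def lookup(self, chunk):
        # Easy discovery: everyone queries the same authority.
        return self.placement.get(chunk, [])

class DataNode:
    """Holds the actual data (data plane); no authoritative metadata."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.chunks = {}

    def store(self, chunk, data):
        self.chunks[chunk] = data

# The controller decides placement; clients then talk to data nodes
# directly, so bulk traffic never touches the control plane.
ctrl = Controller()
nodes = {i: DataNode(i) for i in range(3)}
ctrl.assign("chunk-7", [0, 2])
for nid in ctrl.lookup("chunk-7"):
    nodes[nid].store("chunk-7", b"payload")
```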
DHTs work by sharding the key space AND routing state across their participants.<p>It's hard to achieve consensus (and react to changes) in these multi-domain settings versus the single-domain setting since you can't force every domain's replicas to agree. However, this is a <i>feature</i>--no one but the computer's owner should have the final say on the state it hosts. Naturally, multi-domain systems must account for this in their design--something that Google's internal systems can safely ignore.
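A toy sketch of that key-space sharding, in the spirit of consistent hashing (not any specific DHT protocol; ToyDHT and h are made-up names):

```python
# Each participant owns an arc of a hash ring; a key belongs to its
# successor on the ring, so no central authority assigns ownership.
import hashlib
from bisect import bisect_right

def h(s):
    # Map a string onto a 32-bit ring position.
    return int(hashlib.sha256(s.encode()).hexdigest(), 16) % (2**32)

class ToyDHT:
    def __init__(self, node_names):
        # Sorted (position, name) pairs form the ring.
        self.ring = sorted((h(n), n) for n in node_names)

    def owner(self, key):
        # The first node at or after the key's position is responsible;
        # wrap around to the start of the ring if we fall off the end.
        positions = [p for p, _ in self.ring]
        i = bisect_right(positions, h(key)) % len(self.ring)
        return self.ring[i][1]

dht = ToyDHT(["node-a", "node-b", "node-c"])
print(dht.owner("some-key"))  # deterministic, decided by hashing alone
```

Every participant can compute `owner(key)` locally from the same membership list, which is exactly how the routing state itself gets distributed.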
Decentralization is hard.<p>To put it in different terms, it's like trying to control a plate of marbles with a single pencil.<p>You can only manipulate a small portion of the marbles, and you hope that the commands you give them will propagate and not run out of control.<p>Nowadays it's perfectly feasible to control 10,000 servers through one system running on two or three servers. With some work that could reasonably be pushed to half a million.<p>That'll basically take care of 99.99% of all companies' needs.
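Roughly, a sketch of that one-controller-drives-many idea, with all names invented for illustration:

```python
# The controller only publishes desired state; each agent converges
# toward it on its own. Because the agents do the actual work locally,
# a couple of control servers can drive a very large fleet.

desired = {"version": "2.4.1", "workers": 8}  # the controller's intent

class Agent:
    def __init__(self):
        # Whatever state the server happens to be in right now.
        self.actual = {"version": "2.3.0", "workers": 4}

    def reconcile(self, desired):
        # Compute the delta, apply it, and report what changed.
        changed = {k: v for k, v in desired.items() if self.actual.get(k) != v}
        self.actual.update(desired)
        return changed

fleet = [Agent() for _ in range(10_000)]
for agent in fleet:
    agent.reconcile(desired)
```

This is the single-administrative-domain case from the comment above: one party owns all the marbles, so "publish intent and let agents converge" works.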