I'll preface these comments by saying that Consul appears to be the first distributed cluster management tool i've seen in years that gets pretty much everything right (I can't tell exactly what their consistency guarantees are; I suppose it depends on the use case?).<p>What I will say, in my usually derisive fashion, is I can't tell why the majority of businesses would need decentralized network services like this. If you own your network, and you own all the resources in your network, and you control how they operate, I can't think of a good reason you would need services like this, other than a generalized want for dynamic scaling of a service provider (which doesn't really work without your application being designed for it, or an intermediary/backend application designed for it).<p>Load balancing an increase of requests by incrementally adding resources is what most people want when they say they want to scale. You don't need decentralized services to provide this. What do decentralized services provide, then? "Resilience". In the face of a random failure of a node or service, another one can take its place. Which is also accomplished with either network or application central load balancing. What you don't get [inherently] from decentralized services is load balancing; sending new requests to some poor additional peer simply swamps it. To distribute the load amongst all the available nodes, now you need a DHT or similar, and take a slight penalty from the efficiency of the algorithm's misses/hits.<p>All the features that tools like this provide - a replicated key/value store, health checks, auto discovery, network event triggers, service discovery, etc - can all be found in tools that work based on centralized services, while remaining scalable. I guess my point is, before you run off to your boss waving an iPad with Consul's website on it demanding to implement this new technology, try to see if you <i>need it</i>, or if you just think it's really cool.<p>It's also kind of scary that the ability of an entire network like Consul's to function depends on minimum numbers of nodes, quorums, leaders, etc. If you believe the claims that the distributed network is inherently more robust than a centralized one, you might not build it with fault-tolerant hardware or monitor them adequately, resulting in a wild goose chase where you try to determine if your app failures are due to the app server, the network, or one piece of hardware that the network is randomly hopping between. Could a bad switch port cause a leader to provide false consensus in the network? Could the writes on one node basically never propagate to its peers due to similar issues? How could you tell where the failure was if no health checks show red flags? And is there logging of the inconsistent data/states?