I tried out Serf, but the issue that killed it for orchestration was its behavior around network partitions and transient network issues: messages simply get dropped on the floor. You would need a layer on top that queries some distributed, eventually consistent state so that a rejoining node can replay the orchestration it missed. With etcd (or even ZooKeeper), failure recovery from transient issues is considerably simpler to reason about.
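To make the difference concrete, here is a rough toy model in Python. It is only a sketch: `FireAndForgetBus`, `LogStore`, `Node`, and `demo` are hypothetical names made up for illustration, not the Serf or etcd APIs. It assumes the gossip side keeps no history for a disconnected node, and that the store side keeps an ordered, indexed log that a node can read from its last seen index.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    connected: bool = True
    last_index: int = 0  # last log index this node has applied
    applied: list = field(default_factory=list)

    def apply(self, event):
        self.applied.append(event)

    def catch_up(self, store):
        # Replay everything after the last index we saw.
        for index, event in store.read_from(self.last_index):
            self.apply(event)
            self.last_index = index


@dataclass
class FireAndForgetBus:
    """Hypothetical stand-in for gossip-style delivery with no history."""
    subscribers: list = field(default_factory=list)

    def publish(self, event):
        for node in self.subscribers:
            if node.connected:
                node.apply(event)
            # Nothing is stored for disconnected nodes: the event is lost.


@dataclass
class LogStore:
    """Hypothetical stand-in for a store that keeps an ordered, indexed log."""
    log: list = field(default_factory=list)

    def append(self, event):
        self.log.append(event)
        return len(self.log)  # 1-based index of the new entry

    def read_from(self, index):
        # Return (index, event) pairs for entries after `index`.
        return list(enumerate(self.log[index:], start=index + 1))


def demo():
    events = ["deploy v1", "deploy v2", "restart web"]
    gossip_node = Node("a")
    store_node = Node("b")
    bus = FireAndForgetBus([gossip_node])
    store = LogStore()

    for i, event in enumerate(events):
        partitioned = (i == 1)  # the second event is sent during a partition
        gossip_node.connected = not partitioned
        store_node.connected = not partitioned
        bus.publish(event)
        store.append(event)
        if store_node.connected:
            store_node.catch_up(store)
    return gossip_node.applied, store_node.applied


if __name__ == "__main__":
    gossip_applied, store_applied = demo()
    print("gossip node saw:", gossip_applied)
    print("store node saw: ", store_applied)
```

In this model the gossip node never learns about the event sent during the partition unless something else is built to backfill it, while the store node recovers just by reading from its last index once it is reachable again.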