Interesting idea moving the reception of messages to other nodes and getting back smaller set of messages. Not only that but distributing the reception responsibility into relay groups where it can be load balanced among peers.<p>The leader using relay nodes reminds me of how humans organize into boss and worker groups and the boss shouts out orders to group leaders.<p>I would like to see how dynamic relay groups will perform under stress and I wonder who would communicate new relay groups. Choice of leader imposing relay group structure or groups self organizing.
[edit: comparative performance to epaxos is noted in OP - a x10 improvement is claimed.]<p>Egalitarian Paxos (2013):
<a href="https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf" rel="nofollow">https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf</a><p>code: <a href="https://github.com/efficient/epaxos" rel="nofollow">https://github.com/efficient/epaxos</a>
Great idea and great work!<p>A couple nitpicks: it would be nice to see what happens when the leader fails. Optimizing for the case of a stable leader might have impact on recovery time.<p>Another important aspect for fault-tolerance is whether you can really survive any minority crashing. For example, if only the strictly necessary number of nodes keep up with the leader, then if most of those crash the system will have a really hard time recovering due to the backlog accumulated at slow nodes which now need to catch up for the system to continue operating.<p>A performance number that does not take those things into account may not be very realistic. Nevertheless the idea is pretty good.
Is this a similar concept to compartmentalized consensus? <a href="https://mwhittaker.github.io/publications/compartmentalized_consensus.pdf" rel="nofollow">https://mwhittaker.github.io/publications/compartmentalized_...</a>
Interesting that there is another Paxos variant that was published to arxiv around february called BipartisanPaxos <a href="https://mwhittaker.github.io/publications/bipartisan_paxos.pdf" rel="nofollow">https://mwhittaker.github.io/publications/bipartisan_paxos.p...</a>. It too aims at removing bottlenecks.
You can achieve constant write throughput and read latency and linearly scalable read throughput (at the cost of write latency) with LCR, a criminally unknown uniform total order broadcast protocol.