Oh, it took us around 2 years to get somewhat reliable event dispatching with transactional outboxes. We ran into so many edge cases under high load:<p>- using the auto-increment ID as a pointer to the "last processed event", not knowing that MySQL reserves IDs non-atomically with inserts, so the relay can "see" an event with a bigger ID before an event with a smaller ID becomes visible, and thus skip some events<p>- implementing event handlers that assumed "only once" delivery instead of "at least once", so retries on error corrupted data<p>- event queue processing halting completely for the entire application when an event handler throws an error due to a bug and gets retried infinitely<p>- some other race conditions in the implementation where the order of events gets messed up<p>- too fine-grained events without batching overflowing the queue (it takes too long for an event to get processed)<p>- the relay getting OOM-killed due to excessive batching<p>- once we had a funny bug where the code that updates the last processed ID of the current DB shard (each client has their own shard) wrote to the wrong shards, so our relay started replaying thousands of events from years ago<p>- some event handlers always sending mail as part of processing an event, so when the event is retried on error or replayed (see the bug above), clients receive the same emails multiple times<p>And we still sometimes hit weird bugs: about once a month, for some reason, a random event gets replayed in a deleted account. We're still tracking that one down.
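<p>A minimal sketch of how two of the failure modes above could be guarded against: handlers that are not safe to retry or replay, and a single buggy handler halting the whole queue. This is not the relay described above. `Event`, `Relay`, `processed_ids`, `dead_letters` and `max_attempts` are hypothetical names, and the in-memory set stands in for what would probably be a table written in the same transaction as the handler's side effects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class Event:
    id: int
    payload: str


@dataclass
class Relay:
    """Hypothetical relay: at-least-once delivery, deduplication, bounded retries."""

    handler: Callable[[Event], None]
    max_attempts: int = 3
    # Assumption: in a real system this would be durable storage, not memory.
    processed_ids: Set[int] = field(default_factory=set)
    attempts: Dict[int, int] = field(default_factory=dict)
    dead_letters: List[Event] = field(default_factory=list)

    def deliver(self, event: Event) -> bool:
        """Return True once the event is settled (handled, duplicate, or parked)."""
        if event.id in self.processed_ids:
            # Retry or replay of an already handled event: skip side effects.
            return True
        try:
            self.handler(event)
        except Exception:
            count = self.attempts.get(event.id, 0) + 1
            self.attempts[event.id] = count
            if count >= self.max_attempts:
                # Park the poison event so it cannot block the queue forever.
                self.dead_letters.append(event)
                return True
            return False
        self.processed_ids.add(event.id)
        return True

    def drain(self, events: Iterable[Event]) -> None:
        # Retry each event in place so later events never overtake it.
        # A real relay would add a backoff between attempts.
        for event in events:
            while not self.deliver(event):
                pass


if __name__ == "__main__":
    mailed: List[int] = []

    def send_mail(event: Event) -> None:
        if event.payload == "boom":
            raise ValueError("handler bug")
        mailed.append(event.id)

    relay = Relay(handler=send_mail)
    # Event 1 appears twice to simulate a replay; event 2 always fails.
    relay.drain([Event(1, "a"), Event(2, "boom"), Event(3, "c"), Event(1, "a")])
    print(mailed, [e.id for e in relay.dead_letters])
```

The deduplication check only protects side effects like email if the "processed" marker is recorded atomically with (or before) the side effect. Parking an event after `max_attempts` trades strict ordering for liveness, which may or may not be acceptable depending on the event type.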