Since I maintain a pretty large ETL (batch) application for a living, I am genuinely curious about this: how do you handle failure in event-processing systems? In batch it's simple: if a record (event) causes an unexpected failure, or the program fails for some other reason (for example, it runs out of space), we just restart the batch.<p>But in event processing, unless you can afford to skip events, how do you deal with that sort of thing, especially if the processing needs to keep track of internal state between events?<p>I have read about event sourcing, which is kind of a solution to that, but add checkpoints and you pretty much have batch processing again.
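<p>To make the last point concrete, here is a rough sketch of what I understand by "event sourcing plus checkpoints". Everything in it is made up for illustration: the in-memory `log`, the `apply_event` logic (a running total per key), and the dict-based checkpoint are assumptions, not how any particular framework does it.

```python
import copy


def apply_event(state, event):
    """Hypothetical business logic: keep a running total per key."""
    key, amount = event
    if amount < 0:
        # Stand-in for the "record that causes an unexpected failure".
        raise ValueError(f"cannot process {event!r}")
    state[key] = state.get(key, 0) + amount


def process(log, checkpoint, every=2):
    """Resume from the checkpoint and save a new one every `every` events."""
    state = copy.deepcopy(checkpoint["state"])
    for offset in range(checkpoint["offset"], len(log)):
        # If this raises, the checkpoint still holds the last saved offset/state.
        apply_event(state, log[offset])
        if (offset + 1) % every == 0:
            checkpoint["offset"] = offset + 1
            checkpoint["state"] = copy.deepcopy(state)
    return state


# Assumed event log; a real one would be durable and append-only.
log = [("a", 1), ("b", 2), ("a", 3), ("b", -1), ("a", 5)]
checkpoint = {"offset": 0, "state": {}}

try:
    process(log, checkpoint)
except ValueError:
    # In-memory state is lost here, but the checkpoint survives.
    pass

# Stand-in for fixing the bad event (or deploying a fix for the bug).
log[3] = ("b", 4)

# Restart: only events after the last checkpoint are replayed.
final_state = process(log, checkpoint)
```

In this sketch the restart replays only the events after the last saved offset, which looks a lot like rerunning a small batch from the last known-good point.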