Kafka is really wonderful in its simplicity. You write messages to it, and one or more consumers read them back in order within each partition. My app only does around 1M messages/minute, but LinkedIn does 13M per second. Granted, LinkedIn's usage is across all of their services, but Kafka's log structure offers great performance and its replication offers durability.<p>If you're looking to process data streams in real time, Kafka is definitely worth a look, and the team at Confluent is awesome.
It's indeed a good idea to measure delta and lag along the workflow.
It helps both to alert you to a potential issue and to pinpoint where and why it occurs: a traffic spike, a stage without enough CPU/IO resources, a late consumer in a group...
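
To make the lag idea concrete, here is a rough sketch in plain Python, not tied to any Kafka client. It assumes you already have, per partition, the log end offset and the group's committed offset (for example from your client library or monitoring tool). The function names, the sample offsets, and the threshold are all made up for illustration.

```python
from typing import Dict, List


def partition_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log end offset minus committed offset.

    Assumption: a partition with no committed offset is treated as if the
    group had committed offset 0, which is a simplification.
    """
    return {
        partition: max(end - committed.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def lagging_partitions(lag: Dict[int, int], threshold: int) -> List[int]:
    """Return the partitions whose lag exceeds the (hypothetical) alert threshold."""
    return sorted(p for p, n in lag.items() if n > threshold)


# Hypothetical snapshot: partition 1 has a late consumer.
end_offsets = {0: 1500, 1: 1480, 2: 1510}
committed = {0: 1495, 1: 900, 2: 1510}

lag = partition_lag(end_offsets, committed)
print(lag)
print(lagging_partitions(lag, threshold=100))
```

Taking the same snapshot at each stage of the pipeline and comparing them over time is what tells a traffic spike (lag grows on every partition) apart from a single slow consumer (lag grows on one partition only).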