In this article by the creator of Airflow (https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a) it is mentioned that data should be partitioned by event processing time to land immutable blocks of data. How is this implemented?

For example, if I have a system where events enter some stream (e.g. Kafka or Kinesis), the data is periodically written to storage (e.g. S3 or other), and it is then batch processed on some schedule (e.g. Airflow), then there are multiple 'time' values to consider:

t1 -> time of the event occurring
t2 -> time of the event entering the stream
t3 -> time of persisting batch of events to storage
t4 -> time of the batch run (Airflow) for further processing

What is considered the "event processing" time in this case? How is a partition generated so that an immutable block of data can be landed predictably? Presumably there must be some deterministic pattern for generating batch runs so that the time partitions are immutable and so that backfill tasks can be generated.
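To make the question concrete, here is a rough sketch of how I currently imagine the "immutable block" part working (plain Python, with a dict standing in for S3; the daily window, the function names, and the partition path are my own assumptions, not taken from the article):

    # Sketch of my current understanding, not the article's implementation:
    # each scheduled run "owns" a fixed time window, and rerunning it overwrites
    # the same partition, so the landed block is effectively immutable and
    # backfills are deterministic (same logical date -> same window -> same block).
    from datetime import datetime, timedelta, timezone

    def partition_window(logical_date: datetime) -> tuple[datetime, datetime]:
        """A daily run owns events whose time falls in [logical_date, logical_date + 1 day)."""
        start = logical_date.replace(hour=0, minute=0, second=0, microsecond=0)
        return start, start + timedelta(days=1)

    def land_partition(logical_date: datetime, raw_events: list[dict], storage: dict) -> None:
        """Idempotent landing: filter by event time and overwrite the whole partition."""
        start, end = partition_window(logical_date)
        block = [e for e in raw_events if start <= e["event_time"] < end]
        storage[f"events/ds={start:%Y-%m-%d}"] = block  # full overwrite, never append

    if __name__ == "__main__":
        events = [
            {"id": 1, "event_time": datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc)},
            {"id": 2, "event_time": datetime(2024, 1, 2, 0, 1, tzinfo=timezone.utc)},
        ]
        storage: dict = {}
        # A backfill is just re-running land_partition for past logical dates;
        # each date always maps to the same window, so the same block is rebuilt.
        for day in (datetime(2024, 1, 1, tzinfo=timezone.utc),
                    datetime(2024, 1, 2, tzinfo=timezone.utc)):
            land_partition(day, events, storage)
        print(storage)  # each day's partition holds only that day's events

In this sketch I keyed the window to the event time (t1), but I'm not sure whether that is what the article means by "event processing time", or whether the partition should be keyed by one of t2-t4 instead.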