Flink is pretty neat, and it didn't feel like an "all or nothing" monolithic chunk when I played with it.<p>The page-based inner loop makes its data operations very predictable. They were already doing this early last year, and the mechanism made it very CPU-cache friendly and avoided the massive GC pauses.<p>I was playing with Flink a bit earlier because it can be integrated into Tez, so that Tez could handle DAG scheduling while Flink ran its inner loops on turbo.<p>That inner loop can edge out even the hand-written Java code I wrote for PageRank (delta iterations are nice).
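To make the page-based idea concrete, here is a minimal Java sketch under my own assumptions. The class `PagedRecordBuffer`, the 12-byte (int key, long value) record layout and the page size are all hypothetical; this is not Flink's actual `MemorySegment` API. Records are serialized into a few fixed-size pages instead of being kept as one Java object each, so the GC only tracks the page arrays and the inner loop scans bytes sequentially.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: records are serialized into a few large, fixed-size
// pages instead of being kept as one Java object per record. The GC only
// sees the handful of page arrays, and a scan walks memory sequentially.
final class PagedRecordBuffer {
    static final int RECORD_SIZE = Integer.BYTES + Long.BYTES; // int key + long value

    private final int pageSize;
    private final List<ByteBuffer> pages = new ArrayList<>();
    private ByteBuffer current;
    private int count;

    PagedRecordBuffer(int pageSize) {
        if (pageSize < RECORD_SIZE) throw new IllegalArgumentException("page too small");
        this.pageSize = pageSize;
    }

    // Append one (key, value) record, starting a new page when the current one is full.
    void append(int key, long value) {
        if (current == null || current.remaining() < RECORD_SIZE) {
            current = ByteBuffer.allocate(pageSize); // a real system would reuse pooled pages
            pages.add(current);
        }
        current.putInt(key).putLong(value);
        count++;
    }

    int size() { return count; }

    int pageCount() { return pages.size(); }

    // Example inner loop: sum the values for one key by reading raw bytes
    // page by page, without materializing a record object per entry.
    long sumValuesForKey(int key) {
        long sum = 0;
        for (ByteBuffer page : pages) {
            int limit = page.position(); // bytes written so far into this page
            for (int off = 0; off + RECORD_SIZE <= limit; off += RECORD_SIZE) {
                if (page.getInt(off) == key) {
                    sum += page.getLong(off + Integer.BYTES);
                }
            }
        }
        return sum;
    }
}
```

With 64-byte pages each page holds five records, so appending 12 records allocates 3 pages. A real engine would hand out pages from a pre-allocated pool rather than allocating them lazily as this sketch does.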
Great post detailing how Flink manages data within the JVM and implements internal operators (hashing, sorting, ...) that work on that serialized data.
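Along the same lines, here is a hedged sketch of sorting without deserializing, assuming a record layout I made up (length-prefixed UTF-8 key followed by a long value) and hypothetical names. The sort permutes a small index of (8-byte normalized key prefix, offset) entries and only reads the record bytes when two prefixes tie. This illustrates the general normalized-key idea, not Flink's actual sorter code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch (not Flink's API): records live serialized in one byte buffer;
// sorting permutes a small index of (key prefix, offset) entries, never the records.
final class SerializedSorter {
    private final ByteBuffer data = ByteBuffer.allocate(1 << 16); // one fixed buffer for brevity
    private long[] prefixes = new long[16];
    private int[] offsets = new int[16];
    private int n;

    // Layout per record: [int keyLength][key bytes][long value]
    void add(String key, long value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        if (n == offsets.length) {
            prefixes = Arrays.copyOf(prefixes, n * 2);
            offsets = Arrays.copyOf(offsets, n * 2);
        }
        offsets[n] = data.position();
        prefixes[n] = normalizedPrefix(k);
        data.putInt(k.length).put(k).putLong(value);
        n++;
    }

    // First 8 key bytes packed big-endian and zero-padded: comparing these as
    // unsigned longs orders keys the same way as comparing their byte prefixes.
    static long normalizedPrefix(byte[] k) {
        long p = 0;
        for (int i = 0; i < 8; i++) {
            p = (p << 8) | (i < k.length ? (k[i] & 0xFF) : 0);
        }
        return p;
    }

    private int compare(int i, int j) {
        int c = Long.compareUnsigned(prefixes[i], prefixes[j]);
        if (c != 0) return c;
        // Tie on the prefix: compare the full key bytes directly in the buffer.
        return compareKeyBytes(offsets[i], offsets[j]);
    }

    private int compareKeyBytes(int a, int b) {
        int la = data.getInt(a), lb = data.getInt(b);
        for (int i = 0; i < Math.min(la, lb); i++) {
            int c = Integer.compare(data.get(a + 4 + i) & 0xFF, data.get(b + 4 + i) & 0xFF);
            if (c != 0) return c;
        }
        return Integer.compare(la, lb);
    }

    private void swap(int i, int j) {
        long p = prefixes[i]; prefixes[i] = prefixes[j]; prefixes[j] = p;
        int o = offsets[i]; offsets[i] = offsets[j]; offsets[j] = o;
    }

    // Insertion sort over the index keeps the sketch short; a real sorter would use quicksort.
    void sort() {
        for (int i = 1; i < n; i++) {
            for (int j = i; j > 0 && compare(j - 1, j) > 0; j--) swap(j - 1, j);
        }
    }

    long valueAt(int rank) {
        int off = offsets[rank];
        return data.getLong(off + 4 + data.getInt(off));
    }

    String keyAt(int rank) {
        int off = offsets[rank];
        byte[] k = new byte[data.getInt(off)];
        for (int i = 0; i < k.length; i++) k[i] = data.get(off + 4 + i);
        return new String(k, StandardCharsets.UTF_8);
    }
}
```

Keys such as "pagerank0" and "pagerank1" share the same 8-byte prefix, so they exercise the fallback byte comparison; most other comparisons are settled by a single `long` compare on the index.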
I also like the performance analysis in the post.