Interesting read! I have a service that was originally in SAS, and we've been translating it to run in Spark, but one of the killer issues we identified was latency: without any obvious cause, there would be intermittent pauses of a few seconds, sometimes nearly a minute, in execution. This was on a single machine, and during the pauses we saw essentially no resource utilisation: no disk writes, CPU near 0.00, etc.<p>I keep coming back with every new Spark version to see if the problem has gone away (we wrote it against 2.0.0, so I mean every minor and patch release since). I read what I could find online about optimisation in Spark and applied it.<p>The business people got tired of us spending time trying to optimise, and pushed us down the path of SAP HANA and other proprietary marketing hoohah, because we need a product that's real-time.<p>I hope the upcoming version of Spark at least helps reduce latency, perhaps through improvements in whole-stage code-gen.
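<p>For anyone chasing similar stalls, a first diagnostic step (a sketch, not a known fix for this particular case) is to turn on JVM GC logging on both driver and executors and to record an event log so the Spark UI can show where the time went; long zero-CPU pauses can come from GC safepoints, broadcast waits, or scheduler delay. The conf keys below are real Spark settings; the JVM flags assume Java 8, and the jar name is a placeholder:

```shell
# Hypothetical spark-submit invocation: surface GC pauses in the logs and
# keep whole-stage codegen explicitly on (it is the default, but pinning it
# makes A/B comparisons easier). All --conf keys are genuine Spark settings.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps" \
  --conf spark.sql.codegen.wholeStage=true \
  --conf spark.eventLog.enabled=true \
  your-app.jar  # placeholder for your application jar
```

To check whether whole-stage codegen is actually engaged for a given query, `df.queryExecution.debug.codegen()` in the Scala shell (or `df.explain(mode="codegen")` in PySpark 3.x) prints the generated code per stage; if the pauses line up with codegen compilation, that narrows the search.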