The new Structured Streaming API looks pretty interesting. I have the impression that many Apache projects are trying to address the problems that arise with the lambda architecture. When implementing such a system, you have to maintain two separate systems: one for low-latency stream processing, and another for batch processing of large amounts of data.<p>Samza and Storm mostly focus on streaming, while Spark and MapReduce traditionally deal with batch. Spark leverages its core competency of batch processing and treats a stream as a sequence of mini-batches, effectively treating everything as batch.<p>And I imagine that in the following snippet, the author is referring to Apache Flink, among other projects:<p>> One school of thought is to treat everything like a stream; that is, adopt a single programming model integrating both batch and streaming data.<p>My understanding is that Structured Streaming also treats everything like batch, but it recognizes when the code is being applied to a stream and applies some optimizations for low-latency processing. Is this what's going on?
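To make the "everything as batch" idea concrete, here's a toy sketch in plain Python (not Spark's actual internals or API, just an illustration of the micro-batch model): the exact same batch function is reused unchanged on small chunks of an unbounded stream, with results merged into a running total.

```python
from itertools import islice

def batch_word_count(records):
    """A 'batch' computation: count words in a finite collection."""
    counts = {}
    for rec in records:
        for word in rec.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def micro_batches(stream, size):
    """Chop a (possibly unbounded) iterator into finite mini-batches."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def stream_word_count(stream, size=2):
    """'Streaming' here is just repeatedly running the batch job on
    mini-batches and folding each result into a running total --
    the micro-batch idea, in miniature."""
    totals = {}
    for chunk in micro_batches(stream, size):
        for word, n in batch_word_count(chunk).items():
            totals[word] = totals.get(word, 0) + n
        yield dict(totals)  # snapshot of the running result after each mini-batch

# The streaming result converges to what one big batch run would produce:
snapshots = list(stream_word_count(iter(["a b", "a", "b c", "c"]), size=2))
```

The point of the sketch is that `batch_word_count` never knows whether its input came from a file or a stream; the engine decides how to slice and re-run it, which (as I understand it) is roughly the contract Structured Streaming offers at the DataFrame level.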