Very excited to hear the plans for GraphFrames - finally GraphX getting some attention!<p><a href="https://spark-summit.org/east-2016/events/graphframes-graph-queries-in-spark-sql/" rel="nofollow">https://spark-summit.org/east-2016/events/graphframes-graph-...</a>
How advanced is the Structured Streaming functionality? Looking at the JIRA [1], I cannot find even a design prototype there, which is kind of strange if they want to have it ready by the end of April. But as there was a presentation on the topic at the summit [2], I hope it's just being developed without discussion on JIRA.<p>[1] <a href="https://issues.apache.org/jira/browse/SPARK-8360" rel="nofollow">https://issues.apache.org/jira/browse/SPARK-8360</a>
[2] <a href="https://spark-summit.org/east-2016/events/keynote-day-3/" rel="nofollow">https://spark-summit.org/east-2016/events/keynote-day-3/</a>
I think the more exciting announcement was Databricks community edition, which allows you to use 2.0:<p><a href="https://news.ycombinator.com/item?id=11126179" rel="nofollow">https://news.ycombinator.com/item?id=11126179</a>
Slide 10:<p>> CPU speeds have not kept up with I/O in the past 5 years.<p>I presume he means the other way around?<p>Also, what does he mean by native memory management? Does he mean off-heap allocation?<p>And what's he referring to regarding code generation?
Has it become easier to run ad hoc queries with Spark?
I remember a year ago that the only available solution was the job server by Ooyala.
That seems to be a missing feature of core Spark, and it isn't something I was willing to bet my product on.<p>DataStax evangelized using Spark to run queries over Cassandra, but it looks awkward and time-consuming to copy jars around to the master. Basically you need a DevOps team to do this, and even more scriptology for production.
Are Spark streams ever going to reach a point where you can just have a table sitting in memory aggregating data and then you run queries on the <i>whole</i> thing without having to worry about windowing or anything?
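To make the question concrete, here is a rough plain-Python sketch of the behaviour being asked for: a table held in memory that folds in each incoming batch and can be queried over everything seen so far, with no window. It is only an illustration of the idea and does not use or represent any Spark API; the `RunningTable` class and its `ingest` and `query` methods are hypothetical names made up for this example.

```python
from collections import defaultdict


class RunningTable:
    """Hypothetical in-memory table keeping one running aggregate per key."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.sums = defaultdict(float)

    def ingest(self, batch):
        # Fold one micro-batch of (key, value) records into the running totals.
        for key, value in batch:
            self.counts[key] += 1
            self.sums[key] += value

    def query(self, key):
        # Ad hoc query over all data ingested so far; no window is involved.
        n = self.counts.get(key, 0)
        mean = self.sums[key] / n if n else None
        return {"count": n, "mean": mean}


table = RunningTable()
table.ingest([("a", 1.0), ("b", 4.0)])  # first batch arrives
table.ingest([("a", 3.0)])              # a later batch arrives
result = table.query("a")               # covers both batches
```

The state here grows with the number of distinct keys rather than with the number of records, which is what makes an unwindowed aggregate feasible to keep in memory in this toy version.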