The detailed post on Datasets at <a href="https://databricks.com/blog/2016/01/04/introducing-spark-datasets.html" rel="nofollow">https://databricks.com/blog/2016/01/04/introducing-spark-dat...</a> uses groupBy.<p>Based on previous comments you've made, I'm pretty sure groupBy was one of the things you'd rather eliminate from the RDD API, because of its performance cost compared to reduceByKey (which is almost always what people should be using instead).<p>Are you at all worried about confusion if groupBy now performs well on Datasets but not on RDDs?
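<p>For anyone who hasn't hit this on RDDs, here is a rough plain-Python model of the difference (not Spark code; the function names and sample partitions are made up for illustration). It assumes the usual explanation: a group-then-aggregate sends every record through the shuffle, while a reduceByKey-style aggregation combines values inside each partition first.

```python
from collections import defaultdict


def shuffle_group_then_sum(partitions):
    """groupBy-style: every (key, value) record crosses the shuffle, then gets summed."""
    # No pre-aggregation: all records from all partitions are shuffled.
    shuffled = [record for part in partitions for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def shuffle_reduce_by_key(partitions):
    """reduceByKey-style: combine per partition, then shuffle one record per key per partition."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value  # combine before the shuffle
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value  # merge the partial sums after the shuffle
    return dict(totals), len(shuffled)


# Hypothetical input: two partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],
]

grouped_totals, grouped_records = shuffle_group_then_sum(partitions)
reduced_totals, reduced_records = shuffle_reduce_by_key(partitions)

# Same answer either way; only the number of shuffled records differs.
print(grouped_totals == reduced_totals, grouped_records, reduced_records)
```

Both paths produce the same totals, but the second one shuffles fewer records, and the gap grows with the number of repeated keys per partition. Whether groupBy on Datasets gets an equivalent optimization is exactly what the question above is asking, so this sketch makes no claim about it.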