If you are interested in the code behind this, I wrote an overview last month of the functionality, with links to the code that backs the improvements they talk about: <a href="http://hydronitrogen.com/spark-220-cost-based-optimizer-explained.html" rel="nofollow">http://hydronitrogen.com/spark-220-cost-based-optimizer-expl...</a><p>There's a fair amount of overlap, but where the Databricks article explains the techniques with charts and high-level explanations, I go over the code instead.