Scikit-Learn is getting a major upgrade in its upcoming 0.20 release. If you use Pandas for data exploration and preparation before turning to Scikit-Learn for machine learning, you are probably familiar with the pain points of handling string columns and applying transformations to only a subset of the columns.<p>This process is becoming much more robust and standardized thanks to the new ColumnTransformer, which applies transformations separately (in parallel) to different subsets of the columns. It is built to accommodate Pandas DataFrames, so you can select columns by name. The OneHotEncoder has also been upgraded to handle string columns.<p>I am very excited about this release, as handling string columns was easily the worst part of Scikit-Learn and there was no canonical way of going from a Pandas DataFrame to a Scikit-Learn estimator. The article also covers KBinsDiscretizer, which bins numeric columns and can replace the Pandas cut and qcut functions in your workflows.<p>I would appreciate any feedback on the article.
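<p>For anyone who wants a quick feel for the workflow before reading, here is a minimal sketch of how the three pieces might fit together. It assumes scikit-learn 0.20 or later plus pandas; the toy DataFrame and its column names (city, age) are made up for illustration and are not taken from the article.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

# Hypothetical toy data: one string column and one numeric column.
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Austin", "Denver"],
    "age": [22.0, 35.0, 47.0, 58.0],
})

ct = ColumnTransformer(
    [
        # One-hot encode the string column directly, no label encoding first.
        ("cat", OneHotEncoder(), ["city"]),
        # Bin the numeric column into 2 equal-width bins (similar in spirit
        # to pandas.cut; strategy="quantile" is the rough analogue of qcut).
        ("num", KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform"), ["age"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Columns 0-2 are the one-hot city indicators, column 3 is the age bin index.
X = ct.fit_transform(df)
print(X)
```

The columns are selected by name, so the transformer can be fit on the DataFrame as-is and then placed in front of an estimator in a Pipeline.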