TechEcho
From Pandas to Scikit-Learn – A new exciting workflow

2 points by TedPetrou over 6 years ago

1 comment

TedPetrou over 6 years ago
Scikit-Learn is making a huge upgrade with its upcoming version 0.20 release. If you used Pandas for data exploration and preparation before turning to Scikit-Learn for machine learning, you probably know the pain points: handling string columns and applying transformations to only a subset of the columns.

This process is becoming much more robust and standardized thanks to the new ColumnTransformer, which applies transformations separately (in parallel) to different subsets of the data. It is built to accommodate Pandas DataFrames, so you can give it column names. The OneHotEncoder has been upgraded to handle string columns.

I am very excited about this release, as handling string columns was easily the worst part of Scikit-Learn and there was no canonical way of going from a Pandas DataFrame to a Scikit-Learn estimator. I also cover KBinsDiscretizer, which bins numeric columns and can replace the Pandas cut and qcut functions in your workflows.

I'd appreciate any feedback on the article.
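For readers who want to see the shape of that workflow, here is a minimal sketch, not taken from the article. It assumes scikit-learn 0.20 or later; the DataFrame, its column names (`city`, `age`, `income`), the labels, and the choice of `LogisticRegression` are all made up for illustration. It one-hot encodes a string column and bins two numeric columns inside a single `ColumnTransformer`, then feeds the result to an estimator.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

# Hypothetical toy data: one string column and two numeric columns.
df = pd.DataFrame({
    "city": ["Austin", "Boston", "Austin", "Denver", "Boston", "Denver"],
    "age": [23, 35, 41, 52, 29, 64],
    "income": [40.0, 85.0, 62.0, 120.0, 55.0, 98.0],
})
y = [0, 1, 0, 1, 0, 1]  # made-up labels

preprocess = ColumnTransformer(
    transformers=[
        # String column: one-hot encode directly, no manual label encoding.
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
        # Numeric columns: equal-width bins, roughly what pd.cut does;
        # strategy="quantile" would be the rough analogue of pd.qcut.
        ("bins",
         KBinsDiscretizer(n_bins=3, encode="onehot-dense", strategy="uniform"),
         ["age", "income"]),
    ],
    sparse_threshold=0,  # always return a dense array, for easy inspection
)

# Inspect the transformed matrix on its own.
X = preprocess.fit_transform(df)

# Chain preprocessing and estimator so a DataFrame goes in directly.
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(df, y)

# A new row with a city unseen during fit: handle_unknown="ignore"
# encodes it as all zeros instead of raising an error.
new_rows = pd.DataFrame({"city": ["Chicago"], "age": [30], "income": [70.0]})
pred = model.predict(new_rows)
```

Columns are selected by name, so the DataFrame goes straight into `fit` and `predict` with no manual conversion step. Columns not listed in `transformers` are dropped by default; passing `remainder="passthrough"` keeps them.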