From Pandas to Scikit-Learn – A new exciting workflow

2 points by TedPetrou over 6 years ago

1 comment

TedPetrou over 6 years ago
Scikit-Learn is making a huge upgrade with its upcoming version 0.20 release. If you used Pandas as your data exploration and preparation tool before turning to Scikit-Learn for machine learning, you were probably aware of the pain points of handling string columns and doing transformations to only a subset of the data.

This process is becoming much more robust and standardized thanks to the new ColumnTransformer which allows for applying transformations separately (in parallel) to different subsets of the data. It is built to accommodate Pandas DataFrames, so you can give it column names. The OneHotEncoder has been upgraded to handle string columns.

I am very excited about this release as handling string columns was easily the worst part of Scikit-Learn and there was no canonical way of going from a Pandas DataFrame to a Scikit-Learn estimator. I also cover KBinsDiscretizer which bins numeric columns and will replace Pandas cut and qcut functions in your workflows.

Appreciate any feedback on the article.
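
For readers who want to see the pieces together, here is a minimal sketch of the workflow the comment describes, assuming scikit-learn 0.20 or later and Pandas are installed. The DataFrame, its column names, the step names, and the bin settings are made up for illustration and are not taken from the article.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

# Hypothetical toy data: one string column and one numeric column.
df = pd.DataFrame({
    "neighborhood": ["north", "south", "north", "east"],
    "sqft": [850.0, 1200.0, 1500.0, 2100.0],
})

# Each entry is (name, transformer, columns). Columns are selected by
# their DataFrame names, and each transformer sees only its own subset.
ct = ColumnTransformer(
    transformers=[
        # One-hot encode the string column directly, with no manual
        # integer-encoding step beforehand.
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["neighborhood"]),
        # Bin the numeric column into two equal-width bins and emit the
        # bin index as a single ordinal column.
        ("bin", KBinsDiscretizer(n_bins=2, encode="ordinal", strategy="uniform"), ["sqft"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

# Output columns: three one-hot columns, then the bin index.
X = ct.fit_transform(df)
print(X)
```

The resulting array can be passed to any estimator, or the `ColumnTransformer` can be placed as the first step of a `Pipeline` so the same preprocessing is applied at fit and predict time. Roughly speaking, `strategy="uniform"` plays the role of `pd.cut` and `strategy="quantile"` the role of `pd.qcut`.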