
An open-source framework for data-centric AI

4 points by gaussdiditfirst, over 2 years ago

1 comment

gaussdiditfirst, over 2 years ago
Senior data scientists know that great ROI in real-world ML projects comes from finding/fixing issues in the dataset rather than tinkering too much with models. But this is done manually today via ad hoc scripts (Jupyter notebooks). In data-centric AI, we also use software that can automatically detect data issues (mislabeled examples, outliers, etc.) to make all this more systematic (better coverage, reproducibility, efficiency, etc.). While some companies are starting to offer commercial platforms for data-centric AI, cleanlab is fully open-source, a complete software framework that can be used for many data types and ML tasks, and I've published all of the novel algorithms cleanlab uses to help you improve messy real-world ML datasets.

In one line of Python, cleanlab can automatically:

(1) find mislabeled data + train robust models
(2) detect outliers
(3) estimate consensus + annotator quality for datasets labeled by multiple annotators
(4) suggest which data is best to label or re-label next (active learning)

It has quick 5-minute tutorials for many types of data (image, text, tabular, audio, etc.) and ML tasks (classification, entity recognition, image/document tagging, etc.).

Engineers have used cleanlab at Google to clean and train robust models on speech data, at Amazon to estimate how often the Alexa device doesn't wake, at Wells Fargo to train reliable financial prediction models, and at Microsoft, Tesla, Facebook, etc. Hopefully you'll find cleanlab useful in your ML applications. It's super easy to try out!
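To make item (1) concrete, here is a rough pure-Python sketch of the general idea behind flagging mislabeled examples from a model's predicted probabilities. It is not cleanlab's implementation: the function name `find_likely_label_issues`, the thresholding rule, and the toy data are all assumptions made for illustration, and the real library is considerably more careful. In cleanlab itself the corresponding entry point should be `find_label_issues` in `cleanlab.filter`, which takes the given labels and out-of-sample predicted probabilities.

```python
def find_likely_label_issues(labels, pred_probs):
    """Return indices of examples whose given label looks wrong,
    most suspicious first (hypothetical helper, simplified heuristic)."""
    num_classes = len(pred_probs[0])

    # Per-class threshold: mean predicted probability of class k
    # over the examples that were labeled k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else 1.0)

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model is confident about for this example.
        confident = [k for k in range(num_classes) if p[k] >= thresholds[k]]
        # Flag the example if the model is confident about some class
        # but the given label is not among those classes.
        if confident and y not in confident:
            issues.append((p[y], i))

    # Lowest probability for the given label comes first.
    return [i for _, i in sorted(issues)]


# Toy data (made up): example 2 is labeled 0 but the model strongly predicts 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
print(find_likely_label_issues(labels, pred_probs))  # prints [2]
```

The per-class thresholds are what keep this from being a plain "argmax disagrees with label" check: a class the model is systematically less sure about gets a lower bar, so uncertain but plausible labels are not flagged.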