Show HN: Continuous Machine Learning – CI/CD for Machine Learning Projects

171 points by rhythmvertigo, almost 5 years ago

9 comments

rhythmvertigo, almost 5 years ago
Hi, I'm one of the project creators. Continuous Machine Learning (CML) is an open source project to help ML projects use CI/CD with GitHub Actions and GitLab CI (https://github.com/iterative/cml).

CML automatically generates human-readable reports with metrics and data viz in every pull/merge request, and helps you use storage and GPU/CPU resources from cloud services. CML addresses three hurdles for making ML compatible with CI:

1. In ML, pass/fail tests aren't enough. Understanding model performance might require data visualizations and detailed metric reports. CML automatically generates custom reports after every CI run with visual elements like tables and graphs. You can even get a Tensorboard.dev link as part of your report.

2. Dataset changes need to trigger feedback just like source code. CML works with DVC so dataset changes trigger automatic training and testing.

3. Hardware for ML is an ecosystem in itself. We've developed use cases with CML and Docker Machine to automatically provision and deploy cloud compute instances (CPU & GPU) for model training.

Our philosophy is that ML projects, and MLOps practices, should be built on top of traditional software tools and CI systems, and not as a separate platform. Our goal is to extend DevOps' wins from software development to ML. Check out our project site (https://cml.dev) and repo, and please let us know what you think!
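
To illustrate point 1 above, here is a minimal Python sketch of how a CI step might turn metrics into a human-readable Markdown table for a pull request comment. This is not CML's implementation or API; the function name `render_report` and the metric names are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: render metrics as a Markdown table for a CI report.
# Nothing here is part of CML; all names are illustrative assumptions.

def render_report(metrics: dict[str, float], baseline: dict[str, float]) -> str:
    """Return a Markdown table comparing current metrics to a baseline."""
    lines = [
        "| metric | baseline | current | change |",
        "|---|---|---|---|",
    ]
    for name in sorted(metrics):
        current = metrics[name]
        base = baseline.get(name)
        if base is None:
            # Metric is new in this run, so there is nothing to compare against.
            lines.append(f"| {name} | n/a | {current:.4f} | n/a |")
        else:
            lines.append(
                f"| {name} | {base:.4f} | {current:.4f} | {current - base:+.4f} |"
            )
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values are made up for the demonstration.
    print(render_report({"accuracy": 0.91, "loss": 0.30}, {"accuracy": 0.90}))
```

A CI job could write this string to a file and post it on the pull request, so reviewers see how the metrics moved instead of a bare pass/fail status.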

ishcheklein, almost 5 years ago
Hey! Disclaimer - I'm one of the DVC maintainers :) Super excited for the team on this release!

For the last two years we have seen over and over again how our users take DVC and use it inside GitLab, GitHub, etc. This product was born partly as a result of these discussions, partly as an initial vision for the ML tools ecosystem - HashiCorp-like.

Having a software engineering background, I really hope that integrating ML workflows into engineering tools will be the future of this space. And with CML and other tools (e.g. https://github.blog/2020-06-17-using-github-actions-for-mlops-data-science/) we see this happening.

toisanji, almost 5 years ago
Very interesting, I've been looking for something like this to add to our ML pipeline. A few questions:

1) Can we see examples of generated reports?

2) What happens if training fails?

3) What kind of metrics can it graph? Can we have it track our custom metrics?

4) Can we connect it with external services, such as webhooks, Slack, and other integrations?

5) Is this a Docker technology, or how does it deal with images and dependencies?

Great work!

bigfoot675, almost 5 years ago
I think taking the approach of "help[ing] your team make informed, data-driven decisions" through generating reports is valuable here. In my opinion, it goes too far if we start continuously deploying ML code like it's a SWE project. To take an example from autonomous vehicles, pushing continuous updates to perception modules without thoroughly exploring the ramifications of an update could be catastrophic.

Obviously we can't predict every error by thinking hard, but datasets will never serve as a full representation of what models might experience in the real world. Continuous deployment of an ML model could affect undefined behavior in unpredictable ways.

tknaup, almost 5 years ago
ML is a relatively young field, and decades behind software engineering in terms of best practices for running production systems. CI/CD massively improved the innovation cycle time and quality of production software, and I believe it is key for building robust production ML systems as well. CML looks like a really easy-to-use product for bringing CI/CD to ML projects.

tnachen, almost 5 years ago
Awesome to see a GitHub-native workflow for CI/CD in the ML space! This team is the closest I've seen to a HashiCorp for ML.

ayanb9440, almost 5 years ago
Very exciting! I used this team's previous product (DVC) at a research lab at Caltech. This looks like a very useful tool.

rkaplan, almost 5 years ago
Longtime DVC user here - this is going to be so helpful. We use DVC for all of our model and data versioning, but what's been missing is the ability to cleanly integrate that into our CI workflow. Looks like that's solved now! The cml.yaml syntax also looks quite nice, very easy to follow. Looking forward to trying this out.

m0sth8, almost 5 years ago
CML looks really awesome. I've been to one of your online meetups. Are you planning to host more in the future? It would be great to learn about real production use cases!