Hey HN, creator of DVC here!<p>DVC (https://dvc.org/) is known as Git for data projects. Technically, DVC codifies your data and machine learning pipelines as text metafiles (with pointers to the actual data in S3/GCP/Azure/SSH), while you use Git for the actual versioning. DevOps folks call this approach GitOps or, more specifically in this case, DataOps or MLOps.<p>We’ve been working towards 1.0 since we started 3 years ago. What began as my pet project now has 100+ code contributors, 100+ documentation contributors, and thousands of users.<p>Our community has taught us a lot. Here are some of the biggest lessons:<p>1. Users say the serverless and distributed nature of DVC (inherited from the underlying Git) is one of its "killer features".<p>2. To share ML projects within and between teams, it’s not enough to track only files and pipelines. You also need metrics, plots, and hyperparameter tracking. In DVC 1.0 we implemented hyperparameter, metrics, and plot diffs right from Git history.<p>3. In DataOps, data transfer optimization is huge: think large deep learning models and datasets with millions of images. We doubled down on optimizing data transfer in 1.0.<p>4. ML pipelines evolve faster than data engineering pipelines and need to be easy to change. In 1.0, we’ve simplified the pipeline metafile format.<p>5. More and more teams use DVC as part of CI/CD for ML and other MLOps tools. DVC is used under the hood in the CD4ML tool that was described in the canonical post on Martin Fowler’s blog: https://martinfowler.com/articles/cd4ml.html. We built 1.0 with CI/CD users in mind.<p>More details at https://dvc.org/blog/dvc-1-0-release.<p>Happy to answer any questions here or in the DVC Discord chat: https://dvc.org/chat.
Is it possible to use DVC with the new implementation of GitHub Actions? I checked the website and it looks like it is supported, but I'd like to know more about how you guys are getting ready for this new CI/CD feature.