Hey HN, Creator of DAGsHub here!<p>I wanted to share something cool that we've been working on. You might've heard about DVC (dvc.org) and MLflow (mlflow.org). They are open-source projects for ML that are widely adopted, each for its own specialty.<p>DVC is great for versioning your data.
MLflow is used for a bunch of stuff (it's actually multiple tools combined into one) – but mostly for experiment tracking. Since both are open source and you run them yourself, setting up storage for DVC and a central tracking server for MLflow can be a pain – you have to create cloud accounts, add permissions, and more.<p>DAGsHub is already integrated with DVC, in the sense that whenever you create a project, it comes with a free, built-in DVC remote. Now you also get a free MLflow server, which means you can log experiments directly to DAGsHub and share them with your team or colleagues.<p>Why I think this is awesome:
- Zero setup – add your MLflow remote server URI and just log experiments
- Access control built-in – if some people on your team should only be able to view experiments, not log new ones, you can easily control that
- Better UI for comparison – one complaint MLflow users have had is the inability to compare runs across experiments. With DAGsHub that is easy: all runs appear in a single list, which you can then filter down to a single experiment.
- Integrating MLflow and DVC – a lot of people work with both tools and build ad-hoc glue to connect them. With this integration, you can create a project that was built for this type of work, integrated with all the tools you need.<p>Here's a more detailed blog post from the engineer who built this:
https://dagshub.com/blog/launching-dagshub-integration-with-databricks-mlflow/<p>Or in video form:
https://www.youtube.com/watch?v=yKxOG6qdjvg&t=16s<p>I'd love to hear your thoughts about it.
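<p>For the "zero setup" point, here's a rough sketch of what pointing MLflow at a remote tracking server looks like in Python. Treat the specifics as assumptions: the URI pattern (https://dagshub.com/OWNER/REPO.mlflow), the helper names, and the logged values are illustrative only, so check your repo page for the actual URI and credentials.

```python
import os


def dagshub_tracking_uri(owner: str, repo: str) -> str:
    """Build the remote tracking URI.

    Assumption: the server lives at https://dagshub.com/OWNER/REPO.mlflow.
    Copy the real URI from your repo page if it differs.
    """
    return f"https://dagshub.com/{owner}/{repo}.mlflow"


def log_example_run(owner: str, repo: str, token: str) -> None:
    """Log one made-up run to the remote server (hypothetical helper)."""
    # Imported here so the URI helper above works without mlflow installed.
    import mlflow

    # MLflow reads these variables for HTTP basic auth against a remote server.
    os.environ["MLFLOW_TRACKING_USERNAME"] = owner
    os.environ["MLFLOW_TRACKING_PASSWORD"] = token

    # Point the client at the remote server instead of the local ./mlruns dir.
    mlflow.set_tracking_uri(dagshub_tracking_uri(owner, repo))

    with mlflow.start_run():
        # Placeholder values, not real results.
        mlflow.log_param("learning_rate", 0.01)
        mlflow.log_metric("accuracy", 0.93)
```

Calling log_example_run("your-user", "your-repo", "your-token") should then make a run show up in the project's experiment list. The only change compared to local MLflow usage is the set_tracking_uri call plus the two credential variables.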