Hi HN! Ever since my time as a PhD student more than ten years ago, I have dreamed of tracking experiments as freely as I want, with reproducibility and collaboration as core principles, and the ability to resume the computation state anywhere to add more metrics or results.

Today, I'm unveiling MLtraq (https://mltraq.com), an open-source Python library for AI developers to design, execute, and share experiments. It lets you track anything, reproduce results, collaborate, and resume the computation state anywhere.

KEY BENEFITS

- Extreme Tracking and Interoperability: Native database types, native serialization for NumPy and PyArrow, and a safe subset of opcodes for Python pickles give you unusually broad (and safe) tracking capabilities.

- Promoting Collaboration: Work seamlessly with your team by creating, storing, reloading, mixing, resuming, and sharing experiments using any local or cloud SQL database.

- Flexible: Interact with your experiments using Python, Pandas, and SQL from Python scripts, Jupyter notebooks, and dashboards, without vendor lock-in.

DOCUMENTATION AND CODE

- Documentation: https://www.mltraq.com

- Source code: https://github.com/elehcimd/mltraq

Thoughts? Looking forward to your feedback. I hope you enjoy it! Thank you!
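To give a feel for the workflow, here is a simplified sketch; it condenses the quickstart, so refer to the docs at https://mltraq.com for the exact, complete API:

    import mltraq

    # Illustrative sketch; see https://mltraq.com for the exact API.
    # Bind a session to any local or cloud SQL database (SQLite here).
    session = mltraq.create_session("sqlite:///mltraq.db")

    # A step is just a function that receives a run and updates its state.
    def train(run):
        run.fields.accuracy = 0.95  # track whatever you need

    # Create an experiment, execute its steps, and persist it to the database.
    experiment = session.create_experiment("quickstart")
    experiment.execute(train).persist()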
(You can also reach out to me directly; my email is on my profile page.)

Cheers,
Michele

PS. Sharing a few questions I have addressed so far:

1) “How does it differ from MLflow?”

There’s an overlap in features, but the scope is different:

With MLtraq, tracking the state is so transparent that it feels like checkpointing the experiment for later analysis and continuation. Thanks to robust, flexible serialization, experiments can easily be copied and loaded into new databases. In MLflow, tracking is designed for metrics and little more; the setup is less flexible, but more integrations are readily available.

With MLflow, the emphasis is on covering the complete lifecycle, including model versioning and artifact storage. MLtraq emphasizes experimentation, with an excellent model for experiments inspired by state monads from functional programming that encourages encapsulation/composition, plus parameter grids to simplify exploration (see the sketch below).

In summary, MLflow is a better fit if you prioritize MLOps. MLtraq is a good candidate for experimentation.
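Continuing the earlier sketch, parameter grids look roughly like this (again simplified; the docs have the exact API):

    def train(run):
        # run.params holds this run's point in the parameter grid;
        # run.fields is the tracked state that gets persisted.
        run.fields.score = len(run.params.model) * run.params.seed  # placeholder metric

    # add_runs expands the keyword arguments into their cartesian product:
    # 2 models x 3 seeds = 6 runs, all executed with the same steps.
    experiment.add_runs(model=["svm", "tree"], seed=[1, 2, 3])
    experiment.execute(train).persist()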
--

2) “Does it work for Torch models, too?”

Yes. Let’s start with the “IRIS Flowers Classification” example at https://mltraq.com/#example-3-iris-flowers-classification. Using skorch (https://skorch.readthedocs.io/en/stable/classifier.html), we can add one more scikit-learn-compatible model, as sketched below. Alternatively, one can redesign the train_predict step without using scikit-learn. The results will then include the accuracy score for the newly added model.
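Here is a rough sketch of such a drop-in model. The IrisNet module and its layer sizes are illustrative; NeuralNetClassifier is skorch's scikit-learn-compatible wrapper:

    import torch.nn as nn
    from skorch import NeuralNetClassifier

    # A small illustrative network for the 4 IRIS features and 3 classes.
    class IrisNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.layers = nn.Sequential(
                nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3)
            )

        def forward(self, X):
            return self.layers(X)  # raw logits; CrossEntropyLoss expects them

    # NeuralNetClassifier exposes fit/predict/score, so it slots into the
    # IRIS example next to the scikit-learn estimators (inputs must be
    # float32 and labels int64, as usual with torch).
    model = NeuralNetClassifier(
        IrisNet,
        criterion=nn.CrossEntropyLoss,
        max_epochs=20,
        lr=0.1,
        verbose=0,
    )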