TechEcho

Google employee here. XManager is one of the main ML experiment tracking/orchestration tools we use internally, and I'm pretty excited that it is now available for others to use!

In a nutshell, XManager allows you to:
- define an experiment, which is a collection of one or more work units (think combinations of hyperparameters)
- manage the different jobs/executables required to run this experiment (TPU workers, TensorBoard job, etc.)
- collect and display measurements from work units (loss, other metrics)
- keep a reproducible artifact which allows you to re-run the same experiment at any point in the future

See e.g. https://github.com/deepmind/xmanager/blob/main/examples/ for a few concrete examples of launcher scripts.

I wish they had included screenshots of the tool itself in the repo. I'll make that suggestion :).
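To make the "experiment = collection of work units" idea concrete, here is a minimal plain-Python sketch of expanding a hyperparameter sweep into one parameter set per work unit. It does not use the XManager API. The `sweep` dictionary, its values, and the `work_units` helper are hypothetical names chosen for illustration only. The real launcher scripts are in the examples directory linked above.

```python
import itertools

# Hypothetical sweep: names and values are illustrative, not taken from XManager.
sweep = {
    "learning_rate": [0.1, 0.01],
    "batch_size": [32, 64],
}


def work_units(sweep):
    """Yield one hyperparameter dict per combination in the sweep."""
    names = list(sweep)
    for values in itertools.product(*(sweep[name] for name in names)):
        yield dict(zip(names, values))


# Each dict stands in for the arguments of a single work unit.
units = list(work_units(sweep))
for index, params in enumerate(units, start=1):
    print(f"work unit {index}: {params}")
```

With two values for each of two hyperparameters, this yields four work units. In a real launcher, each parameter set would presumably be passed as arguments to a job rather than printed.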

It's great that this is open sourced. This technology was key to enabling ML folks to scale up computation without having to deal with Borg and a bunch of other low-level systems. It's one of the few systems in ML that I've used and thought, "huh, this was well-designed and properly architected from the start."

XManager: A framework for managing machine learning experiments

2 comments
