Hugging Face has been integrating this into their library, and it makes a pretty big difference to the size of models you can train on a single Colab GPU.<p><a href="https://huggingface.co/blog/zero-deepspeed-fairscale" rel="nofollow">https://huggingface.co/blog/zero-deepspeed-fairscale</a>
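For a rough sense of where the savings come from, here is a back-of-the-envelope sketch of per-GPU memory for model states under each ZeRO stage. It uses the mixed-precision Adam accounting from the ZeRO paper: 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and K = 12 bytes of fp32 optimizer state per parameter. The function name `model_state_gb` is made up for illustration. It ignores activations, temporary buffers, and fragmentation. Partitioning only divides memory across several GPUs, so with `num_gpus=1` every stage gives the same number. On a single Colab GPU the gains come from DeepSpeed's CPU offload (ZeRO-Offload), which this sketch does not model.

```python
# Hypothetical helper: estimated per-GPU memory for model states (weights,
# gradients, Adam state) under ZeRO stages 0-3. Activations are NOT included.

K_ADAM = 12  # fp32 master weights (4) + momentum (4) + variance (4), bytes/param


def model_state_gb(num_params: float, num_gpus: int = 1, stage: int = 0) -> float:
    """Model-state memory per GPU in decimal GB (1 GB = 1e9 bytes)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2 or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    params, grads, optim = 2.0, 2.0, float(K_ADAM)  # bytes per parameter
    if stage >= 1:
        optim /= num_gpus   # stage 1: partition optimizer states
    if stage >= 2:
        grads /= num_gpus   # stage 2: also partition gradients
    if stage >= 3:
        params /= num_gpus  # stage 3: also partition the fp16 weights
    return num_params * (params + grads + optim) / 1e9


if __name__ == "__main__":
    psi = 7.5e9  # the 7.5B-parameter, 64-GPU example from the ZeRO paper
    for s in range(4):
        print(f"stage {s}: {model_state_gb(psi, num_gpus=64, stage=s):.1f} GB")
```

With those inputs it prints about 120.0, 31.4, 16.6 and 1.9 GB for stages 0 through 3, which matches the figures in the ZeRO paper.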