Mesh-TensorFlow (<a href="https://github.com/tensorflow/mesh" rel="nofollow">https://github.com/tensorflow/mesh</a>) solves the problem of networks being too large to fit into a single GPU's memory. I just trained a network with > 1 billion parameters across 4 GPUs. Beside size of data set and compute power, representational capacity (size of embeddings) is a key blocker to learning, and so this library opens up new possibilities for many of us.