This is another good result of applying transfer learning to NLP.

Transfer learning works great for vision problems (just reuse one of the big SoTA networks trained on ImageNet - I like resnet50). This was enabled by the large amount of structure shared across vision problems. There was nothing similar for NLP, besides pre-trained first layers like word2vec. If you want to learn more, check out the fast.ai DL course, which features transfer learning heavily.

But this model and ULMFiT (nlp.fast.ai) show that deeper nets can be pretrained for NLP too, and achieve good results when transferred to other datasets and problems.

This enables not just the obvious use case of "I don't have N GPUs to train a deep net from scratch, but I can now finetune a pre-trained model", but also more subtle and interesting cases, like fine-tuning on a very small dataset (compared to ImageNet or 100,000-sample NLP datasets) and cheap training on demand. Training a new model for every user was way too expensive when training from scratch, but if fine-tuning a pre-trained net takes just a few minutes, why not?
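For the vision case, here is a minimal sketch of that fine-tuning workflow in PyTorch/torchvision, assuming the ImageNet-pretrained resnet50 mentioned above; the data loader and the number of classes are placeholders, not anything from the article:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)   # reuse the big ImageNet-pretrained net
for param in model.parameters():
    param.requires_grad = False            # freeze the pretrained backbone

num_classes = 5                            # e.g. your small custom dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new, trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in my_loader:           # placeholder DataLoader over your data
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

Because the backbone is frozen, only the tiny new head gets trained, which is why fine-tuning on a small dataset can take minutes instead of the days a from-scratch run would need.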
*a very specific implementation of NLP.

Not that this library isn't promising, but the name and presentation make it seem far more general than it really is.
In that spirit, and most likely much more general, for PyTorch: https://pytoune.org/ (Keras-like interface for PyTorch) and https://github.com/dnouri/skorch (scikit-learn interface for PyTorch).

As a side note, a project of mine: super-simple Jupyter Notebook training plots for Keras and PyToune: https://github.com/stared/livelossplot (with a bare API, so you can connect it to anything you wish).
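To give a feel for the scikit-learn-style interface skorch provides, here is a small sketch along the lines of its README; the module architecture, data shapes, and hyperparameters below are illustrative only:

```python
import numpy as np
import torch.nn as nn
from skorch import NeuralNetClassifier

class SimpleNet(nn.Module):
    def __init__(self, num_units=64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(20, num_units),
            nn.ReLU(),
            nn.Linear(num_units, 2),
            nn.Softmax(dim=-1),
        )

    def forward(self, X):
        return self.layers(X)

# Wrap the PyTorch module as a scikit-learn-compatible estimator.
net = NeuralNetClassifier(SimpleNet, max_epochs=10, lr=0.05)

# Plain numpy arrays work, just like any other sklearn estimator.
X = np.random.randn(100, 20).astype(np.float32)
y = np.random.randint(0, 2, size=100).astype(np.int64)

net.fit(X, y)
proba = net.predict_proba(X)
```

The payoff is that the wrapped net then slots into the usual scikit-learn machinery (pipelines, grid search, cross-validation) without extra glue code.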