For anyone looking to fine-tune transformers with less work, there is the FARM project (https://github.com/deepset-ai/FARM), which has some more or less ready-to-go configurations (classification, question answering, NER, and a couple of others). It's really almost "plug in a CSV and run".

By the way, a pet peeve is sentiment detection. It's a useful method, but please be aware that it does not measure "sentiment" in the way one would normally think, and that what it measures varies strongly across methods (https://www.tandfonline.com/doi/abs/10.1080/19312458.2020.1869198).
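To make the "plug in a CSV and run" flow above concrete without guessing at FARM's exact class names, here is a minimal sketch of the same CSV-in, classifier-out workflow using the plain Hugging Face datasets/transformers Trainer API instead; the file names, the "text"/"label" column names, the base model, and the hyperparameters are placeholder assumptions, not anything FARM prescribes.

```python
# Sketch of a "CSV in, fine-tuned classifier out" flow with plain Hugging Face
# APIs (not FARM's own classes). File paths and the "text"/"label" column
# names are assumptions for illustration.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    # Tokenize the raw text column; padding/truncation settings are illustrative.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", num_train_epochs=2,
                         per_device_train_batch_size=16, learning_rate=2e-5)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["test"])
trainer.train()
```

FARM wraps essentially this pipeline (processor, data silo, language model plus prediction head) behind its own configuration objects, so check the examples in its repo for the native equivalents.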
Hm. I read this expecting a more in-depth discussion of best practices for fine-tuning massive transformers while avoiding catastrophic forgetting, i.e.:

* How should you select the learning rate?

* What tasks are best for fine-tuning on small amounts of data?
* etc.

Instead, this seems mostly to just run through ML/DL 101: the loss function for binary classification, helper functions to load data, and so on.
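For the questions raised above, one common recipe (not something the article covers) is a small base learning rate with layer-wise decay, so the lower, more general layers move less than the freshly initialized head; that helps against catastrophic forgetting on small datasets. Below is a minimal sketch with PyTorch and Hugging Face; the base model, the 2e-5 base rate, and the 0.9 decay factor are illustrative assumptions.

```python
# Layer-wise learning-rate decay: lower (more general) layers get smaller
# learning rates than the task head. Model name, base rate, and decay factor
# are illustrative choices, not recommendations from the article.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

base_lr, decay = 2e-5, 0.9
num_layers = model.config.num_hidden_layers  # 12 for bert-base

def lr_for(name):
    # Embeddings get the most decayed rate, the classifier head the full rate.
    if "embeddings" in name:
        return base_lr * decay ** (num_layers + 1)
    for i in range(num_layers):
        if f"encoder.layer.{i}." in name:
            return base_lr * decay ** (num_layers - i)
    return base_lr  # pooler / classifier head

param_groups = [{"params": [p], "lr": lr_for(n)}
                for n, p in model.named_parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(param_groups, weight_decay=0.01)
```

The resulting optimizer can be handed to the Hugging Face Trainer via its optimizers=(optimizer, None) argument, or used in a plain PyTorch training loop.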
The same transformer diagram from the original paper, replicated everywhere. Nobody's got time for redrawing it.

BTW, take a look at the "sentence-transformers" library, a nice interface on top of Hugging Face for this kind of operation (reusing, fine-tuning): https://www.sbert.net/
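As a quick illustration of both uses, here is a hedged sketch with sentence-transformers: encoding sentences with a pretrained model, then a toy fine-tuning run with a cosine-similarity loss. The model name and the two labelled pairs are arbitrary examples, not anything from the article.

```python
# Reuse: encode sentences with a pretrained model.
# Fine-tune: a toy run with CosineSimilarityLoss on two labelled pairs.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Reuse: sentence embeddings and cosine similarity out of the box.
emb = model.encode(["Fine-tuning a transformer", "Adapting a pretrained model"])
print(util.cos_sim(emb[0], emb[1]))

# Fine-tune: a couple of labelled pairs with similarity scores in [0, 1].
train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a meal."], label=0.9),
    InputExample(texts=["A man is eating food.", "The stock market crashed."], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```

In practice you would feed it a real pair or triplet dataset; the point is that the library hides the pooling, loss, and training-loop boilerplate behind a few lines.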