As someone who maintains a package that makes it easy to fine-tune GPT-2 or train your own model from scratch (https://github.com/minimaxir/aitextgen), this submission is a good run-through of the technical considerations involved in building a GPT-2 model.

It's both substantially easier and faster than it was when OpenAI released their paper in 2019, thanks to Huggingface Transformers and Tokenizers making the architectures more efficient, and to other companies streamlining every part of the training pipeline.

You don't need a TPU cluster to train a working GPT-2 model, although it helps (unfortunately, TPU support for PyTorch-based training like aitextgen is fussier). A free GPU on Colab gets you most of the way there, especially since you can now get a T4 or a V100, which lets you use FP16.
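For reference, a minimal fine-tuning sketch with aitextgen on a Colab GPU looks roughly like this (the input file and hyperparameters are placeholders, and exact flag names like fp16 can vary between versions):

    from aitextgen import aitextgen

    # Load the 124M "small" GPT-2 and move it to the Colab GPU
    ai = aitextgen(tf_gpt2="124M", to_gpu=True)

    # Fine-tune on your own plain-text file; FP16 assumes a T4/V100
    ai.train("input.txt", batch_size=1, num_steps=5000, fp16=True)

    ai.generate(n=3, prompt="The meaning of life is", max_length=100)

On the free tier, batch size and number of steps are the main knobs you'll turn down; FP16 on a T4 gives you some of that headroom back.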