I wanted to share Tricycle (<a href="https://github.com/bclarkson-code/Tricycle">https://github.com/bclarkson-code/Tricycle</a>), a deep learning framework I built completely from scratch, from the autograd engine up to a GPT.<p>I wanted a library that is fast and feature-rich enough to train real models, while being simple enough that anyone with a bit of Python experience can understand what is going on.<p>The biggest milestone so far is training GPT-2 (124M) on 2.3B tokens in just under 3 days on my GPU (an RTX 3090).<p>So far, I've added the following to Tricycle:<p>- An automatic differentiation engine<p>- General matrix operations with einsum<p>- Standard network layers (Dense, ReLU, GeLU, etc.)<p>- Transformer blocks (MultiHeadSelfAttention and MLP blocks)<p>- Optimisers (SGD, AdamW)<p>- GPT-2<p>- etc.<p>The project is still under active development: I'm in the process of adding mixed precision and multi-GPU support, with the goal of scaling up to larger models.<p>To see it in action, the best place to start is train_smol_gpt.py, which will train GPT-2 from scratch.<p>Let me know what you think!
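<p>For anyone curious what "an automatic differentiation engine" means at its core, here is a minimal scalar sketch of reverse-mode autodiff. This is purely illustrative and is not Tricycle's actual API (the class name, methods, and operator set here are my own invention): each operation records its inputs and a local backward rule, and calling backward() walks the recorded graph in reverse topological order applying the chain rule.

```python
# Minimal sketch of reverse-mode autodiff (illustrative only;
# not Tricycle's actual API).

class Value:
    """A scalar that records the ops applied to it so gradients
    can be back-propagated through the resulting graph."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # product rule: each input's gradient is scaled by the
            # other input's value
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule
        # from the output back to the leaves.
        order, seen = [], set()

        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()


x = Value(3.0)
y = Value(4.0)
z = x * y + x  # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

A tensor-valued engine like the one in a real framework is the same idea, just with array-valued nodes and backward rules for operations like einsum, matmul, and the activation functions.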