I wanted to share Tricycle (<a href="https://github.com/bclarkson-code/Tricycle">https://github.com/bclarkson-code/Tricycle</a>), a deep learning framework I built completely from scratch, from the autograd engine up to a GPT.<p>I wanted a library that is fast and feature-rich enough to train real models, while being simple enough that anyone with a bit of Python experience can understand what is going on.<p>The biggest milestone so far is training GPT-2 (124M) on 2.3B tokens in just under 3 days on my GPU (an RTX 3090).<p>So far, I've added the following to Tricycle:<p>- An automatic differentiation engine<p>- General matrix operations with einsum<p>- Standard network layers (Dense, ReLU, GeLU, etc.)<p>- Transformer blocks (MultiHeadSelfAttention and MLP blocks)<p>- Optimisers (SGD, AdamW)<p>- GPT-2<p>- etc.<p>The project is still under active development: I'm in the process of adding mixed precision and multi-GPU support, with the goal of scaling up to larger models.<p>To see it in action, the best place to start is train_smol_gpt.py, which will train GPT-2 from scratch.<p>Let me know what you think!
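<p>For anyone curious what "an automatic differentiation engine" means at its core, here is a minimal scalar sketch of reverse-mode autodiff. This is purely illustrative and is not Tricycle's actual API (the class name, methods, and operator set here are my own invention): each operation records its inputs and a local backward rule, and calling backward() walks the recorded graph in reverse topological order applying the chain rule.

```python
# Minimal sketch of reverse-mode autodiff (illustrative only;
# not Tricycle's actual API).

class Value:
    """A scalar that records the ops applied to it so gradients
    can be back-propagated through the resulting graph."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            # product rule: each input's gradient is scaled by the
            # other input's value
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = backward_fn
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule
        # from the output back to the leaves.
        order, seen = [], set()

        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()


x = Value(3.0)
y = Value(4.0)
z = x * y + x  # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

A tensor-valued engine like the one in a real framework is the same idea, just with array-valued nodes and backward rules for operations like einsum, matmul, and the activation functions.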