Show HN: Deep learning framework from scratch, trains GPT-2 in 3 days

2 points by netwrt, 10 months ago
I wanted to share Tricycle (https://github.com/bclarkson-code/Tricycle), a deep learning framework I built completely from scratch, from autograd to a GPT.

I wanted a library that is fast and feature-rich enough to train actual models, while being simple enough that anyone with a bit of Python experience can understand what is going on.

The biggest milestone so far is training GPT-2 (124M) on 2.3B tokens in just under 3 days on my GPU (an RTX 3090).

So far, I've added the following to Tricycle:

- An automatic differentiation engine
- General matrix operations with einsum
- Standard network layers (Dense, ReLU, GeLU, etc.)
- Transformer blocks (MultiHeadSelfAttention and MLP blocks)
- Optimisers (SGD, AdamW)
- GPT-2
- etc.

The project is still under active development: I'm in the process of adding mixed precision and multi-GPU support, with the goal of scaling up to larger models.

To see it in action, the best place to start is train_smol_gpt.py, which will train GPT-2 from scratch.

Let me know what you think!
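For readers curious what "an automatic differentiation engine built from scratch" means in practice, here is a minimal reverse-mode autodiff sketch. This is purely illustrative and is not Tricycle's actual API; the `Value` class and its methods are hypothetical names chosen for the example.

```python
# Minimal reverse-mode autodiff sketch (illustrative; NOT Tricycle's API --
# the Value class and method names here are hypothetical).
class Value:
    """Scalar node in a computation graph that tracks its gradient."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = lambda: None  # filled in by each operation

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1, d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b, d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward_fn()


# z = x*y + x, so dz/dx = y + 1 and dz/dy = x
x, y = Value(3.0), Value(4.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

A full framework generalises this idea from scalars to tensors (Tricycle uses einsum for its general matrix operations) and builds layers and optimisers on top, but the chain-rule bookkeeping above is the core mechanism.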

no comments