SmolGPT: A minimal PyTorch implementation for training a small LLM from scratch

434 points by amrrs 4 months ago

18 comments

attentionmech 4 months ago
This is cool, and timely (I wanted a neat repo like that).

I have also been working for the last two weeks on a GPT implementation in C. It eventually turned out to be really slow (without CUDA), but it taught me how much memory management and data management there is when implementing these systems. You are running a loop billions of times, so you need to preallocate the computational graph and so on. If anyone wants to check it out, it's ~1500 LOC in a single file:

https://github.com/attentionmech/gpt.c/blob/main/gpt.c
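The preallocation point deserves a tiny illustration. A minimal sketch of the idea in numpy (the buffer names and shapes here are invented, and the "backward" line is a stand-in, not real backprop):

```python
import numpy as np

BATCH, D_MODEL = 32, 256

# Allocate every buffer once, before the hot loop.
acts = np.empty((BATCH, D_MODEL), dtype=np.float32)
grads = np.empty((BATCH, D_MODEL), dtype=np.float32)
W = (np.random.randn(D_MODEL, D_MODEL) * 0.02).astype(np.float32)
x = np.random.randn(BATCH, D_MODEL).astype(np.float32)

for step in range(10_000):           # the loop that runs "billions" of times
    np.matmul(x, W, out=acts)        # forward: writes into the existing buffer
    np.matmul(acts, W.T, out=grads)  # placeholder backward: no fresh allocations
```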
sitkack 4 months ago
Neat, I love projects like these.

The next level down is to do it directly in numpy.

And then from there, write a minimal numpy work-alike to support the model above.

You start with a working system using the most powerful abstractions. Then you iteratively remove abstractions, lowering your solution; when you get low enough but are still riding on an external abstraction, you rewrite that, but ONLY to support the layers above you.

Following the above pattern, you can bootstrap yourself to full system understanding. This is not unlike the RL+distillation process humans use to learn complex topics.
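To make the "directly in numpy" level concrete, here is a rough sketch of one causal self-attention head, the core block such a port has to reproduce (shapes and initialization are illustrative only):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(x, Wq, Wk, Wv):
    # x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Causal mask: a position may only attend to itself and earlier positions.
    scores[np.triu(np.ones_like(scores), k=1).astype(bool)] = -1e9
    return softmax(scores) @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32))
Wq, Wk, Wv = (rng.normal(size=(32, 16)) * 0.1 for _ in range(3))
print(attention_head(x, Wq, Wk, Wv).shape)  # (8, 16)
```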
c0wb0yc0d3r 4 months ago
Can someone help me understand what I'm looking at here? This repository allows me to train a specific model on a specific dataset, and finally test the result? Is that correct?

I am interested in how large and small language models are trained, but as someone who has little knowledge of this world I find it hard to cut through the noise to find useful information.

Really I'm looking for an open source project that helps a person gain this knowledge. Something like a Docker container that encapsulates all the dependencies. When training, it would use any available GPU, or tell me why my GPU can't be used and then fall back to CPU. Then it would have a simple interface to test the training results. Finally, you could easily pull back the curtain to understand the process in better detail, and maybe even adapt it to a different model to experiment.

Does something like that exist?
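The GPU-or-tell-me-why behaviour asked for above is usually just a few lines in PyTorch. A generic sketch of such a device-selection shim (not code from SmolGPT):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    print("CUDA unavailable: no NVIDIA GPU, or driver/build mismatch.")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    print("MPS unavailable: needs Apple Silicon and a recent PyTorch build.")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(16, 16).to(device)  # stand-in for the real model
```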
numba888 4 months ago
GitHub has had a bunch of these for years; the best known is from Andrej Karpathy:

https://github.com/karpathy/nanoGPT

Some others have MoE implemented.
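For anyone unfamiliar with the term: in an MoE (mixture-of-experts) layer, a small router sends each token to its top-k expert networks and mixes their outputs. A bare-bones sketch of the routing idea (illustrative only, not taken from any of the repos above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy top-k mixture-of-experts layer."""
    def __init__(self, d_model=64, n_experts=4, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.k = k

    def forward(self, x):                          # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)  # (tokens, n_experts)
        weights, idx = gates.topk(self.k, dim=-1)  # route each token to k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e           # tokens whose slot-th pick is e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(1) * expert(x[mask])
        return out

print(TinyMoE()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```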
febin 4 months ago
Here's a Google Colab notebook built from this. It takes ~2 hours on an A100 GPU if you have Colab Pro. Might work on a free account as well.

https://colab.research.google.com/drive/1dklqzK8TDPfbPbyHrk3llXFOOiOhFUeJ?usp=sharing#scrollTo=BEgEJhqeLAgg
Lerc 4 months ago
The example story is interesting.

I have made my own implementation from scratch with my own multi-channel tokeniser; each channel gets its own embedding table (sizes 32768, 256, 256, 64, and 4), which are summed along with the position encoding.

Yet with all of those differences, my stories have Lily as a protagonist often enough that I thought I had a bug somewhere.

Might have to check TinyStories for name distribution.

Most questionable output from mine so far:

"one day, a naughty man and a little boy went to the park place to find some new things."
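A sketch of what that multi-channel embedding scheme could look like in PyTorch; the channel sizes are the ones mentioned above, but everything else (d_model, learned position embedding, max length) is guesswork:

```python
import torch
import torch.nn as nn

class MultiChannelEmbedding(nn.Module):
    """Sum one embedding per channel, plus a learned position embedding."""
    def __init__(self, channel_sizes=(32768, 256, 256, 64, 4), d_model=256, max_len=512):
        super().__init__()
        self.tables = nn.ModuleList(nn.Embedding(n, d_model) for n in channel_sizes)
        self.pos = nn.Embedding(max_len, d_model)

    def forward(self, ids):  # ids: (batch, seq_len, n_channels)
        x = sum(table(ids[..., c]) for c, table in enumerate(self.tables))
        positions = torch.arange(ids.shape[1], device=ids.device)
        return x + self.pos(positions)

emb = MultiChannelEmbedding()
ids = torch.zeros(2, 16, 5, dtype=torch.long)  # one id per channel per token
print(emb(ids).shape)  # torch.Size([2, 16, 256])
```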
brap 4 months ago
It’s interesting that technology so transformative is only a few hundred lines of code (excluding underlying frameworks and such).<p>How big would you guess state of the art models are, in terms of lines of code?
ks2048 4 months ago
So, this has nothing to do with "SmolLM" - a set of models (with data, training recipes, etc.) released by HuggingFace? https://huggingface.co/blog/smollm
OmAlve 4 months ago
Thanks a lot for posting this here! I can't believe it went viral; it makes all the effort feel worth it now! - Om Alve
quantadev 4 months ago
I noticed several people mentioned Karpathy already, but I wanted to add that his tiny "Micrograd" project (see the YouTube video and GitHub) is a great introduction to neural nets (the multilayer perceptron), which is at the core of [most] machine learning, of course.
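The heart of a micrograd-style engine is small enough to show inline: a scalar Value that records its parents and local derivatives, plus a reverse topological sweep for the chain rule. This is a condensed sketch in the spirit of Karpathy's repo, not his exact code:

```python
class Value:
    """A scalar that remembers how it was computed, for backprop."""
    def __init__(self, data, children=()):
        self.data, self.grad = data, 0.0
        self._children, self._backward = children, lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    visit(c)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 4.0 2.0
```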
mkagenius 4 months ago
Looks like a rip-off of https://github.com/PraveenRaja42/Tiny-Stories-GPT

without any credit to the above or to the TinyStories paper.
ideashower 4 months ago
Can anyone share what a training dataset would look like for something like this? What are some use cases?
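For a TinyStories-style model, the "dataset" is typically just a large plain-text file of short stories that gets tokenized and chopped into fixed-length blocks. A generic sketch of a character-level loader (not SmolGPT's actual code):

```python
import torch

# Toy stand-in for a corpus like TinyStories: lots of short, simple stories.
text = "Once upon a time, there was a little girl named Lily. " * 100

# Character-level tokenizer: one integer id per distinct character.
stoi = {ch: i for i, ch in enumerate(sorted(set(text)))}
data = torch.tensor([stoi[ch] for ch in text], dtype=torch.long)

def get_batch(block_size=64, batch_size=8):
    # Inputs are block_size tokens; targets are the same window shifted by one.
    ix = torch.randint(len(data) - block_size - 1, (batch_size,))
    x = torch.stack([data[i:i + block_size] for i in ix])
    y = torch.stack([data[i + 1:i + 1 + block_size] for i in ix])
    return x, y

x, y = get_batch()
print(x.shape, y.shape)  # torch.Size([8, 64]) torch.Size([8, 64])
```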
the_real_cher 4 months ago
Anybody have any good readings they've liked for understanding how this works?
imdsm 4 months ago
Is there a corresponding article for this? I'd love to read through it!
antirez 4 months ago
No CPU / MPS support to train on Macs, apparently.
Diffused_asi 4 months ago
How many parameters does this model have?
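The repo would answer that directly; for any PyTorch model, the standard one-liner is:

```python
import torch

model = torch.nn.Linear(128, 128)  # stand-in for the actual SmolGPT model
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # 16,512 for this stand-in
```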
nostradumbasp 4 months ago
Cute! Keep making fun things.
spidermonkey23 4 months ago
Is there anything that can run locally on mobile in Termux?