MinGPT: Minimal PyTorch re-implementation of GPT

223 points by memorable over 2 years ago

9 comments

karpathy over 2 years ago
Hah funny to see this on HN, it is a relatively old project but one that I continue to love and still work on. I was trying to train a GPT one day and discovered that available implementations were quite complex, spread across many files, and took way too many kwargs switches for esoteric/rare options that just bloated and complexified the code. But in my head a GPT was a super simple, neat, isotropic model, so I got all worked up and wrote minGPT.

The project went on to have more impact than I originally imagined and made its way into a number of projects and papers. One of those I found only a few days ago here: https://twitter.com/karpathy/status/1566100736076697600 . What I love about these projects is that the authors often "hack up" minGPT in code directly. They don't configure a comprehensive kwarg monster. I think there's a beauty in that. Very often I wish we had more gists and fewer frameworks - to look at code chunks, understand them completely, tune them to our needs, and re-use them in projects, similar to how bacteria trade little DNA plasmids. minGPT is written for those who want that for their GPT projects. There are plenty of cons to this approach too; ultimately I think there's value in both approaches.

Coming up, the theme of future minGPT development: more examples, and more teeth - it should be possible to demonstrate the training of relatively serious (~few B) models with minGPT on one n-gpu node and reproduce some benchmarks around that scale, but never sacrifice its readability.
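For readers who haven't opened the repo, the sketch below shows the kind of compact, isotropic transformer block a GPT reduces to when the kwarg switches are stripped away. It is illustrative only: the class names and hyperparameters are made up for this example and are not copied from the minGPT source.

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalSelfAttention(nn.Module):
        def __init__(self, n_embd=128, n_head=4, block_size=64):
            super().__init__()
            assert n_embd % n_head == 0
            self.n_head = n_head
            self.qkv = nn.Linear(n_embd, 3 * n_embd)   # query, key, value projections
            self.proj = nn.Linear(n_embd, n_embd)      # output projection
            # causal mask so each position only attends to earlier positions
            mask = torch.tril(torch.ones(block_size, block_size))
            self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

        def forward(self, x):
            B, T, C = x.size()
            q, k, v = self.qkv(x).split(C, dim=2)
            # reshape to (batch, heads, time, head_dim)
            q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
            att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
            att = F.softmax(att, dim=-1)
            y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
            return self.proj(y)

    class Block(nn.Module):
        """One transformer block: attention + MLP, each with a residual connection."""
        def __init__(self, n_embd=128, n_head=4, block_size=64):
            super().__init__()
            self.ln1 = nn.LayerNorm(n_embd)
            self.attn = CausalSelfAttention(n_embd, n_head, block_size)
            self.ln2 = nn.LayerNorm(n_embd)
            self.mlp = nn.Sequential(
                nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd)
            )

        def forward(self, x):
            x = x + self.attn(self.ln1(x))
            x = x + self.mlp(self.ln2(x))
            return x

A full GPT is then little more than token and position embeddings, a stack of identical blocks like this, and a final linear head over the vocabulary, which is what makes the "isotropic" framing apt.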
polygamous_bat over 2 years ago
This is actually a pretty neat, self-contained implementation that can be super easily extended beyond stereotypical natural language models, for example to create world models for video games [1] or to create robot models that can learn to imitate from large, chaotic human demonstration data [2] (disclaimer: I'm an author on the second one). Basically, GPT (or minGPT) models are EXCELLENT sequence modelers, almost to the point where you can throw any sensible sequence data at them and hope to get interesting results, as long as you don't overfit. A minimal illustration of that idea follows below.

Even though I have only been working on machine learning for around six years, it's crazy to see how fast the landscape has changed recently, including diffusion models and transformers. It's not too much to say that we might expect more major breakthroughs by the end of this decade, and end up in a place we can't even imagine right now!

[1] https://github.com/eloialonso/iris [2] https://github.com/notmahi/bet
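As a rough sketch of the "any sensible sequence data" point: discretize a non-text signal into tokens and train a small causal transformer on next-token prediction. The toy data, model sizes, and training loop here are invented for illustration and are not taken from the iris or bet repos, nor from minGPT itself.

    import torch
    import torch.nn as nn

    vocab_size, block_size, n_embd = 32, 16, 64

    # toy "demonstration" data: a continuous trace discretized into token bins
    readings = torch.rand(1000)                       # pretend sensor/action stream
    tokens = (readings * vocab_size).long().clamp(max=vocab_size - 1)

    class TinySeqModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.tok_emb = nn.Embedding(vocab_size, n_embd)
            self.pos_emb = nn.Embedding(block_size, n_embd)
            layer = nn.TransformerEncoderLayer(
                n_embd, nhead=4, dim_feedforward=4 * n_embd, batch_first=True
            )
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(n_embd, vocab_size)

        def forward(self, idx):
            B, T = idx.shape
            pos = torch.arange(T, device=idx.device)
            x = self.tok_emb(idx) + self.pos_emb(pos)
            # additive causal mask: positions may not attend to the future
            causal = torch.triu(
                torch.full((T, T), float("-inf"), device=idx.device), diagonal=1
            )
            return self.head(self.encoder(x, mask=causal))

    model = TinySeqModel()
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()

    # next-token prediction: each chunk predicts itself shifted by one step
    for step in range(10):
        ix = torch.randint(0, len(tokens) - block_size - 1, (8,)).tolist()
        x = torch.stack([tokens[j:j + block_size] for j in ix])
        y = torch.stack([tokens[j + 1:j + block_size + 1] for j in ix])
        logits = model(x)
        loss = loss_fn(logits.reshape(-1, vocab_size), y.reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()

Swapping the toy trace for game frames or robot actions (suitably tokenized) is essentially all the linked projects change about this recipe; the sequence-modeling core stays the same.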
s_Hogg over 2 years ago
Karpathy really seems to have discovered there are a lot of hours in the day now that he doesn't work for Tesla.
derac over 2 years ago
I love your approach and philosophy around programming. If anyone is unaware, Karpathy has a relatively small YouTube channel he started a few weeks ago: https://youtu.be/VMj-3S1tku0
dang over 2 years ago
Related:

Karpathy's MinGPT - https://news.ycombinator.com/item?id=24189497 - Aug 2020 (102 comments)
mark_l_watson over 2 years ago
Nice! I remember studying Karpathy's character RNN code way back when; it was a great study resource. Looking forward to understanding this example too!
HeckFeck over 2 years ago
Here I was thinking someone had recreated the GUID Partition Table in some form of MicroPython. Perhaps someday.
frozencell over 2 years ago
Is there a Colab available yet?
rexreed over 2 years ago
With enough training data and enough GPUs to do the model training, you'll be there! Goes to show that for AI, the code really isn't the important part. AI is and always has been about data and compute.