This is cool, and timely (I wanted a neat repo like that).

I have also been working for the last two weeks on a GPT implementation in C. It eventually turned out to be really slow (without CUDA), but it taught me how much memory management and data management there is when implementing these systems. You are running a loop billions of times, so you need to preallocate the computational graph and related buffers. If anyone wants to check it out, it's a ~1500 LOC single file:

https://github.com/attentionmech/gpt.c/blob/main/gpt.c
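To make the preallocation point concrete, here is a rough sketch of the pattern in numpy (not the gpt.c code itself; the sizes and the single weight matrix are made up for illustration): every buffer the loop touches is allocated once, up front, and reused in place on each step.

```python
import numpy as np

# Toy sizes, purely illustrative.
BATCH, SEQ, D_MODEL = 8, 64, 128

# Allocate everything once; the training loop itself never allocates.
W     = (np.random.randn(D_MODEL, D_MODEL) * 0.02).astype(np.float32)
x     = np.empty((BATCH, SEQ, D_MODEL), dtype=np.float32)
acts  = np.empty((BATCH, SEQ, D_MODEL), dtype=np.float32)
gradW = np.zeros_like(W)

for step in range(100):
    x[:] = np.random.randn(BATCH, SEQ, D_MODEL)  # stand-in for loading a batch
    np.matmul(x, W, out=acts)                    # forward pass writes into a reused buffer
    # a real backward pass would accumulate into gradW in place here
    gradW[:] = 0.0
```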
Neat, I love projects like these.

The next level down is to do it directly in numpy.

And then from there, write a minimal numpy work-alike to support the model above.

You start with a working system built on the most powerful abstractions. Then you iteratively remove abstractions, lowering your solution, and when you get low enough but are still riding on an external abstraction, you rewrite that too, but ONLY to the extent needed to support the layers above you.

Following this pattern, you can bootstrap yourself to full system understanding. It's not unlike the RL+distillation process people go through to learn complex topics.
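To make "do it directly in numpy" concrete, here is a hedged sketch of one of the pieces you would end up writing, a single causal self-attention head, using nothing but numpy (the shapes and names are my own, not taken from the repo):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    T = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (T, T) attention logits
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True strictly above the diagonal
    scores = np.where(mask, -1e9, scores)             # block attention to future positions
    return softmax(scores) @ v                        # (T, d_head)

# tiny usage example
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
Wq, Wk, Wv = (0.1 * rng.standard_normal((16, 16)) for _ in range(3))
print(causal_self_attention(x, Wq, Wk, Wv).shape)  # (8, 16)
```

Writing the minimal numpy work-alike underneath is then mostly a matter of reimplementing the handful of operations this actually uses: matmul, exp, max, sum, and broadcasting.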
Can someone help me understand what I’m looking at here? This repository lets me train a specific model on a specific data set and then test the result? Is that correct?

I am interested in how large and small language models are trained, but as someone with little knowledge of this world I find it hard to cut through the noise to find useful information.

Really I’m looking for an open source project that helps a person gain this knowledge. Something like a Docker container that encapsulates all the dependencies. When training, it would use any available GPU (or tell me why my GPU can't be used and fall back to CPU). Then it would have a simple interface to test the training results. Finally, you could easily pull back the curtain to understand the process in more detail and maybe even adapt it to a different model to experiment.

Does something like that exist?
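On the "use any available GPU, or tell me why it can't be used and fall back to CPU" part: that behaviour is only a few lines in PyTorch. A hedged sketch (not from this repo, and the messages are my own):

```python
import torch

def pick_device() -> torch.device:
    """Prefer a GPU when one is usable; otherwise say why and fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple-silicon GPU backend
        return torch.device("mps")
    if torch.backends.cuda.is_built():
        print("PyTorch has CUDA support, but no usable GPU/driver was found; using CPU.")
    else:
        print("This PyTorch build was compiled without CUDA; using CPU.")
    return torch.device("cpu")

device = pick_device()
model = torch.nn.Linear(8, 1).to(device)  # stand-in for the real model
```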
GitHub has had a bunch of these for years; the best known is from Andrej Karpathy:

https://github.com/karpathy/nanoGPT

Some others have MoE implemented.
Here's a google collab notebook built from this. It takes ~2 hours on A100 GPU if you have collab pro. Might work on free account as well.<p><a href="https://colab.research.google.com/drive/1dklqzK8TDPfbPbyHrk3llXFOOiOhFUeJ?usp=sharing#scrollTo=BEgEJhqeLAgg" rel="nofollow">https://colab.research.google.com/drive/1dklqzK8TDPfbPbyHrk3...</a>
The example story is interesting.

I have made my own implementation from scratch with my own multi-channel tokeniser: each channel gets its own embedding table (sizes 32768, 256, 256, 64, and 4), and the per-channel embeddings are summed together along with the position encoding.

Yet with all of those differences, my stories have Lily as a protagonist often enough that I thought I had a bug somewhere.

Might have to check TinyStories for name distribution.

Most questionable output from mine so far:

"one day, a naughty man and a little boy went to the park place to find some new things."
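For anyone curious what "each channel gets its own embedding table, summed along with the position encoding" might look like in code, here is a hedged PyTorch sketch; the table sizes come from the comment above, but d_model, max_len, and everything else are assumptions of mine:

```python
import torch
import torch.nn as nn

class MultiChannelEmbedding(nn.Module):
    """One embedding table per token channel, summed with a learned position embedding."""

    def __init__(self, d_model=256, max_len=512,
                 channel_sizes=(32768, 256, 256, 64, 4)):
        super().__init__()
        self.channels = nn.ModuleList(nn.Embedding(n, d_model) for n in channel_sizes)
        self.pos = nn.Embedding(max_len, d_model)

    def forward(self, tokens):
        # tokens: (batch, seq_len, num_channels) integer ids, one id per channel
        _, t, _ = tokens.shape
        x = sum(emb(tokens[..., i]) for i, emb in enumerate(self.channels))
        return x + self.pos(torch.arange(t, device=tokens.device))

# tiny usage example with random ids
emb = MultiChannelEmbedding()
ids = torch.stack([torch.randint(0, n, (2, 16)) for n in (32768, 256, 256, 64, 4)], dim=-1)
print(emb(ids).shape)  # torch.Size([2, 16, 256])
```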
It’s interesting that technology so transformative is only a few hundred lines of code (excluding underlying frameworks and such).

How big would you guess state-of-the-art models are, in terms of lines of code?
So, this has nothing to do with "SmolLM" - a set of models (with data, training recipes, etc.) released by HuggingFace?

https://huggingface.co/blog/smollm
I noticed several people mentioned Karpathy already, but I wanted to add that his tiny "Micrograd" project (see the YouTube video and GitHub repo) is a great introduction to neural nets (the multilayer perceptron), which is at the core of [most] machine learning, of course.
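For a flavour of what micrograd boils down to, here is a stripped-down, micrograd-style scalar autograd sketch (my own toy version, not Karpathy's actual API): each Value remembers its parents and a closure that applies the chain rule, and backward() replays those closures in reverse topological order.

```python
import math

class Value:
    """A scalar that records how it was computed so gradients can flow backwards."""

    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = _parents
        self._backward = lambda: None  # filled in by the op that created this Value

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1.0 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort, then apply the chain rule from the output backwards.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# one "neuron": y = tanh(w*x + b), then ask how y changes with respect to w
x, w, b = Value(2.0), Value(-0.5), Value(0.1)
y = (w * x + b).tanh()
y.backward()
print(y.data, w.grad)  # gradient of y with respect to w
```

The real micrograd adds the remaining operators plus a small nn module (Neuron/Layer/MLP) on top of essentially this idea.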
Looks like a rip-off of https://github.com/PraveenRaja42/Tiny-Stories-GPT

without any credit to the above or to the TinyStories paper.