55x Speedup of Andrej Karpathy's Minbpe LLM Tokenizer with PyTorch/CUDA

19 points by kuprel about 1 year ago

3 comments

kuprel about 1 year ago
This adds PyTorch/CUDA training support to Andrej Karpathy's minbpe. It takes 2min 28sec (148 seconds) on an RTX4090 to train the BasicTokenizer with a vocab_size of 512 on 307MB of Enron emails. The original code takes 2hrs 15min (8076 seconds) on an M2 Air with Python 3.11 to do this. That is a 55x speedup.
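For context, the BasicTokenizer being trained here implements byte-pair encoding: it repeatedly counts adjacent token pairs and merges the most frequent pair into a new token until the vocabulary reaches the target size. Below is a minimal pure-Python sketch of that training loop (function names are illustrative, not necessarily minbpe's exact API); each merge requires a full pass over the data, which is the work the PyTorch/CUDA port moves to the GPU:

```python
def get_pair_counts(ids):
    """Count occurrences of each adjacent token pair."""
    counts = {}
    for pair in zip(ids, ids[1:]):
        counts[pair] = counts.get(pair, 0) + 1
    return counts

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with the token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_basic(text, vocab_size):
    """Learn BPE merges on raw UTF-8 bytes (token ids 0-255 are bytes)."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for new_id in range(256, vocab_size):
        counts = get_pair_counts(ids)
        if not counts:
            break
        best = max(counts, key=counts.get)  # most frequent adjacent pair
        merges[best] = new_id
        ids = merge(ids, best, new_id)
    return merges
```

On 307MB of text this loop is dominated by the Python-level pair counting and merging, which is why vectorizing those passes on a GPU yields such a large speedup.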
Comment #39481658 not loaded
Havoc about 1 year ago
> 307MB of Enron emails

Wait what?

Is that some sort of inside joke?
Comment #39475783 not loaded
erichocean about 1 year ago
Now someone needs to do a Mojo version, and write up the blog post.