
Replicating GPT-2 at Home

254 points by bkkaggle over 4 years ago

8 comments

minimaxir over 4 years ago
As someone who maintains a package that makes it easy to both fine-tune GPT-2 and create your own model from scratch (https://github.com/minimaxir/aitextgen), this submission is a good run-through of the technical considerations involved in building a GPT-2 model.

It's both substantially easier and faster than it was when OpenAI released their paper in 2019, thanks to Huggingface Transformers and Tokenizers making the architectures more efficient, and to other companies streamlining every part of the training pipeline.

You don't need a TPU cluster to train a working GPT-2 model, although it helps (unfortunately, TPU support for PyTorch-based training like aitextgen's is more fussy). A free GPU on Colab gets you most of the way, especially since you can now get a T4 or a V100, which lets you use FP16.
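For concreteness, here is a minimal sketch of that Colab-style workflow using Huggingface Transformers directly (not aitextgen's own API); the corpus path and the hyperparameters are illustrative assumptions, not values from the post:

```python
# Minimal sketch: fine-tune the 124M GPT-2 on one Colab-class GPU with FP16.
# "corpus.txt" and all hyperparameters below are illustrative assumptions.
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Load a plain-text file and tokenize it into model-sized chunks.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="gpt2-finetuned",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # effective batch size of 32
    num_train_epochs=1,
    fp16=True,                      # mixed precision on a T4/V100
    logging_steps=50,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```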
zirkonit over 4 years ago
First off -- the author has done an amazing tutorial, it's very enjoyable, so I am by no means throwing shade.

But a week of TPUv3-128 is anywhere between $10k and $20k in TPU costs alone; saying that this is an "at home" kind of experiment is cheeky at best, clickbait at worst.
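A rough sanity check of that figure. The hourly rates below are assumptions, scaling the commonly cited ~$8/hr on-demand price of a v3-8 linearly to a v3-128 slice, with preemptible capacity assumed at roughly 30% of on-demand:

```python
# Back-of-envelope TPU cost for one week of TPUv3-128.
# Both rates are assumptions, not quoted Google Cloud prices.
hours_per_week = 7 * 24                    # 168 hours
on_demand_rate = 128.0                     # USD/hr, assumed (~$1/core-hr * 128)
preemptible_rate = on_demand_rate * 0.30   # USD/hr, assumed

print(f"on-demand:   ${hours_per_week * on_demand_rate:,.0f}")    # ~$21,500
print(f"preemptible: ${hours_per_week * preemptible_rate:,.0f}")  # ~$6,500
```

Under those assumptions the on-demand and preemptible extremes bracket the $10k-$20k range the comment cites.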
polytronic over 4 years ago
The author, at 17 years of age, can understand academic papers and research, and has the skills and dedication to go through the exercise of reconstructing the state of the art.

I can't help but feel pride and hope for the future, both the author's and the world's.
alexpeattie over 4 years ago
The article has moved here: https://bilal2vec.github.io/blog/algpt2/2020/07/17/ALGPT2-part-2
kyberias over 4 years ago
How many off-the-shelf GPUs are needed to replicate GPT-2 in a year?
deeviant over 4 years ago
At home, in the cloud, for tens of thousands of $$$.
soohamr over 4 years ago
UWaterloo has such precocious students
amelius over 4 years ago
TL;DR:

&gt; Unfortunately, ALGPT-2 doesn't perform as well as GPT-2 (ALGPT-2 gets 31 ppl on OpenWebText compared to 21 ppl for my pretrained GPT-2 model), but I'm writing this series of blog posts to go through everything I've learned over the last few months.
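For reference, the perplexity ("ppl") being compared here is just the exponential of the mean per-token cross-entropy loss. A minimal sketch with Huggingface Transformers, where the model checkpoint and evaluation text are placeholders:

```python
# Perplexity = exp(mean per-token cross-entropy loss).
# The checkpoint and evaluation text below are placeholders.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

text = "Some held-out evaluation text."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.1f}")
```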