47 points by lawrencechen, about 2 years ago

2 comments

thatcherc, about 2 years ago

Author seems to be using billion == 10^12 instead of the common billion == 10^9. A lot of the math still works out since there's a multiply and a divide by a billion, but it is a little confusing to see passages like this:

> Given the parameter count, we can multiply by two to get bytes. So to calculate the size of the weights for a 52B model.

> 52e12⋅2 = 104e12 bytes ≈ 104GB
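For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes 2 bytes per parameter (as in the quoted passage) and decimal units (1 GB = 1e9 bytes); the function name `weight_bytes` is made up for illustration and is not from the article.

```python
def weight_bytes(n_params: float, bytes_per_param: int = 2) -> float:
    """Size of the model weights in bytes, assuming a fixed width per parameter."""
    return n_params * bytes_per_param


# 52B parameters with the usual billion == 10^9
size_common = weight_bytes(52e9)
print(f"52e9 params  -> {size_common / 1e9:.0f} GB")

# The same formula with the exponent as written in the quoted passage
size_as_written = weight_bytes(52e12)
print(f"52e12 params -> {size_as_written / 1e12:.0f} TB")
```

With 52e9 the result is 104e9 bytes, which matches the 104GB figure; with 52e12 as written, the same formula gives 104e12 bytes, which would be 104 TB rather than 104 GB.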


gxh8N, about 2 years ago

Very nicely written. I also like how it changes color every time I reload the article.

Transformer Inference Arithmetic (2022)
