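The author seems to be using billion == 10^12 instead of the usual billion == 10^9. Much of the math still works out, because the extra factor of 1000 cancels when a count is multiplied by a billion and the result is later divided by a billion, but it is confusing to see passages like this:<p>> Given the parameter count, we can multiply by two to get bytes. So to calculate the size of the weights for a 52B model.<p>> 52e12⋅2 = 104e12 bytes ≈ 104GB<p>With billion == 10^9 and GB == 10^9 bytes, that line should read 52e9⋅2 = 104e9 bytes ≈ 104GB; as written, 104e12 bytes is 104TB.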
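For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes 2 bytes per parameter (as in the quoted passage) and decimal gigabytes (1 GB = 10^9 bytes); the function name `weight_size_gb` is made up for illustration.

```python
BYTES_PER_PARAM = 2   # assumption: 16-bit weights, as in the quoted passage
GB = 10**9            # assumption: decimal gigabyte, not GiB (2**30)

def weight_size_gb(n_params, bytes_per_param=BYTES_PER_PARAM):
    """Return the size of the weights in decimal GB."""
    return n_params * bytes_per_param / GB

# 52B parameters with billion == 10^9
print(weight_size_gb(52 * 10**9))    # 104.0

# The same count written as 52e12 would be 1000x larger (104 TB)
print(weight_size_gb(52 * 10**12))   # 104000.0
```

The final answer in the article (104GB) is right; only the intermediate exponents are off by a factor of 1000.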