This is really an awesome introduction to quantization! One small comment about the GPTQ section:<p><i>It uses asymmetric quantization and does so layer by layer such that each layer is processed independently before continuing to the next</i><p>GPTQ also supports symmetric quantization, and almost everyone uses it. The problem with GPTQ asymmetric quantization is that all popular implementations have a bug [1] where every zero point (bias) of 0 is reset to 1 during packing (out of the 16 possible zero points in 4-bit quantization), leading to quite a large loss in quality. Interestingly, people initially observed that symmetric quantization worked better than asymmetric quantization (which is very counter-intuitive, but it made GPTQ symmetric quantization far more popular) and only later discovered that this was due to the bug.<p>[1] <a href="https://notes.danieldk.eu/ML/Formats/GPTQ#Packing+integers" rel="nofollow">https://notes.danieldk.eu/ML/Formats/GPTQ#Packing+integers</a>
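To make the effect of that bug concrete, here is a minimal, hypothetical sketch (not taken from any real GPTQ implementation) of asymmetric 4-bit dequantization, where a zero point of 0 being replaced by 1 shifts every reconstructed weight in the group by one full scale step:

```python
import numpy as np

def dequantize(q, scale, zero):
    # asymmetric dequantization: w ≈ scale * (q - zero)
    return scale * (q.astype(np.float32) - zero)

# weights with a one-sided range, so the ideal 4-bit zero point is 0
w = np.array([0.0, 0.5, 1.0, 1.5], dtype=np.float32)
scale = w.max() / 15          # 4-bit: quantized values lie in [0, 15]
zero = 0                      # correct zero point for this range

q = np.clip(np.round(w / scale + zero), 0, 15).astype(np.int32)

# illustration of the packing bug described above:
# a stored zero point of 0 gets turned into 1
buggy_zero = 1 if zero == 0 else zero

w_ok = dequantize(q, scale, zero)
w_bad = dequantize(q, scale, buggy_zero)

# every reconstructed weight is shifted down by exactly one scale step
print(np.abs(w_bad - w_ok).max())  # equals `scale`
```

Because one-sided weight distributions (where a zero point of 0 is optimal) are common, this constant per-group offset adds up across layers, which is consistent with the observed quality loss.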