
A Visual Guide to LLM Quantization

310 points by raymond_goo, 10 months ago

8 comments

danieldk, 10 months ago
This is really an awesome introduction to quantization! One small comment about the GPTQ section:

"It uses asymmetric quantization and does so layer by layer such that each layer is processed independently before continuing to the next"

GPTQ also supports symmetric quantization, and almost everyone uses it. The problem with GPTQ asymmetric quantization is that all popular implementations have a bug [1] where all zero/bias values of 0 are reset to 1 during packing (out of 16 possible biases in 4-bit quantization), leading to quite a large loss in quality. Interestingly, it seems that people initially observed that symmetric quantization worked better than asymmetric quantization (which is very counter-intuitive, but made GPTQ symmetric quantization far more popular) and only discovered later that this is due to the bug.

[1] https://notes.danieldk.eu/ML/Formats/GPTQ#Packing+integers
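To make the symmetric/asymmetric distinction concrete, here is a minimal NumPy sketch of both schemes for 4-bit weights. It is a toy per-tensor illustration with made-up names, not GPTQ's actual layer-by-layer algorithm.

    import numpy as np

    def quantize_symmetric(w, bits=4):
        # Symmetric: zero-point fixed at 0; grid covers [-2^(b-1), 2^(b-1) - 1].
        qmax = 2 ** (bits - 1) - 1
        scale = np.abs(w).max() / qmax
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        return q, scale  # dequantize with q * scale

    def quantize_asymmetric(w, bits=4):
        # Asymmetric: a zero-point shifts the grid to [0, 2^b - 1],
        # so it covers [w.min(), w.max()] exactly.
        qmax = 2 ** bits - 1
        scale = (w.max() - w.min()) / qmax
        zero_point = np.round(-w.min() / scale)
        q = np.clip(np.round(w / scale) + zero_point, 0, qmax)
        return q, scale, zero_point  # dequantize with (q - zero_point) * scale

    w = np.random.randn(8).astype(np.float32)
    q_s, s = quantize_symmetric(w)
    q_a, s_a, zp = quantize_asymmetric(w)
    print(np.round(q_s * s, 3))
    print(np.round((q_a - zp) * s_a, 3))

In this toy setting, the packing bug described above would amount to silently turning a zero_point of 0 into 1, shifting every dequantized weight in that group by one full quantization step.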
jillesvangurp, 10 months ago
Fairly helpful overview. One thing that probably has a good answer: why use floats at all, even at 32 bits? Is there an advantage relative to using just 32-bit ints? It seems integer math is a lot easier to do in hardware. Back when I was young, you had to pay extra to get floating-point hardware support in your PC; it required a co-processor. I'm assuming that is still somewhat true in terms of the number of transistors needed on chips.

Intuitively, I like the idea of asymmetric scales as well. Treating all values as equal seems like it's probably wasteful in terms of memory. It would be interesting to see where typical values fall statistically in an LLM. I bet it's nowhere near a random distribution of values.
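On the "where do typical values fall" question, here is a small sketch (not from the article) that loads a Hugging Face checkpoint and summarizes one weight matrix; gpt2 is just a convenient example. LLM weights tend to be roughly bell-shaped around zero with a small number of large outliers, which is part of what makes non-uniform and asymmetric schemes attractive.

    import torch
    from transformers import AutoModelForCausalLM

    # Any causal LM checkpoint works; gpt2 is small enough to download quickly.
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Grab the first 2-D weight tensor found; the exact layer doesn't matter here.
    w = next(p for n, p in model.named_parameters() if p.dim() == 2).detach().flatten()

    print("mean:", w.mean().item(), "std:", w.std().item())
    print("min:", w.min().item(), "max:", w.max().item())
    # Fraction of weights within two standard deviations of zero:
    print("within 2*std:", (w.abs() < 2 * w.std()).float().mean().item())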
hazrmard, 10 months ago
I've read the huggingface blog on quantization, and a plethora of papers such as `bitsandbytes`. This was an approachable agglomeration of a lot of activity in this space, with just the right references at the end. Bookmarked!
woodson, 10 months ago
It’s a shame that the article didn’t mention AWQ 4-bit quantization, which is quite widely supported in libraries and deployment tools (e.g. vLLM).
torginus, 10 months ago
I've long held the assumption that neurons in networks are just logic functions, where you can just write out their truth tables by taking all the combinations of their input activations and design a logic network that matches that 100% - thus 1-bit 'quantization' should be enough to perfectly recreate any neural network for inference.
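As a toy illustration of the truth-table idea (under the strong assumption that inputs and activations are already binarized, which real network inference does not satisfy), this sketch enumerates a single sign-activation neuron as a Boolean function:

    import itertools
    import numpy as np

    # A single neuron with a sign activation over {-1, +1} inputs is exactly
    # a Boolean function, so its truth table can be enumerated.
    w = np.array([0.7, -1.2, 0.4])  # arbitrary real-valued weights
    b = 0.1

    for x in itertools.product([-1, 1], repeat=len(w)):
        out = 1 if np.dot(w, x) + b >= 0 else -1
        print(x, "->", out)

The catch for real networks is that hidden activations are continuous, so each neuron's inputs are not drawn from a finite set; that is why binarized networks are usually retrained rather than obtained by a lossless truth-table rewrite.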
llm_trw, 10 months ago
This is a very misleading article.

Floats are not distributed evenly across the number line. The number of floats between 0 and 1 is the same as the number of floats between 1 and 3, then between 3 and 7 and so on. Quantising well to integers means that you take this sensitivity into account, since the spacing between integers is always the same.
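The uneven spacing is easy to check directly. Here is a small sketch (not from the article) that prints the gap to the next representable float32 value at different magnitudes and counts the representable values in two intervals of equal width:

    import numpy as np

    # Gap to the next representable float32 value at various magnitudes.
    for x in [0.001, 0.1, 1.0, 10.0, 1000.0, 1e6]:
        print(f"spacing near {x:>9}: {np.spacing(np.float32(x))}")

    # Reinterpreting the bits as integers counts the representable float32
    # values between two positive floats.
    def count_floats(lo, hi):
        lo_bits = np.array(lo, dtype=np.float32).view(np.int32)
        hi_bits = np.array(hi, dtype=np.float32).view(np.int32)
        return int(hi_bits - lo_bits)

    print("float32 values in [1, 2):", count_floats(1.0, 2.0))
    print("float32 values in [2, 3):", count_floats(2.0, 3.0))

Within each power-of-two interval the spacing is constant, and it doubles each time the exponent increases; that constant relative precision is the sensitivity integer quantization has to account for.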
dleeftink, 10 months ago
What an awesome collection of visual mappings between process and output: immediately gripping, visually striking, and thoughtfully laid out. I'd love to hear more about the process behind them, a hallmark of exploratory visualisation.
cheptsov, 10 months ago
I wonder why AWQ is not mentioned. It's pretty popular, and I was always curious how it differs from GPTQ.