> Aggressively pruning LLMs via quantization can significantly reduce their accuracy and you might be better off using a smaller model in the first place.

Not sure that is correct. Quantization charts suggest it's a fairly continuous spectrum, i.e. an aggressively quantized 13B ends up about the same as an unquantized 7B:

https://www.researchgate.net/figure/Performance-degradation-of-quantized-models-Chart-available-at_fig1_377817624
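
For what it's worth, this is roughly how I'd sanity-check that kind of comparison myself — a rough Python sketch using transformers + bitsandbytes 4-bit loading and a crude single-passage perplexity. The Llama-2 7B/13B pair and the eval text are just placeholders, not the setup the linked chart used:

    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    def perplexity(model, tokenizer, text, device="cuda"):
        # Crude token-level perplexity over one passage; assumes a single GPU.
        enc = tokenizer(text, return_tensors="pt").to(device)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        return math.exp(out.loss.item())

    # Placeholder model pair: 13B quantized to 4-bit vs 7B left at fp16.
    q4 = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
    m13_q4 = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-13b-hf", quantization_config=q4, device_map="auto")
    m7_fp16 = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto")

    # Llama-2 7B and 13B share a tokenizer, so their perplexities are comparable.
    tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

    sample = open("eval_text.txt").read()  # any held-out text you care about
    print("13B @ 4-bit:", perplexity(m13_q4, tok, sample))
    print("7B  @ fp16 :", perplexity(m7_fp16, tok, sample))

Obviously a proper eval would use a real benchmark rather than one passage, but even this kind of quick check tends to show the quantized larger model landing near (or below) the smaller unquantized one rather than falling off a cliff.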