The OP looks like good work, but it's definitely <i>not</i> a quick read. The authors claim theoretical breakthroughs that enable:<p>* a data-free LLM quantization method that they claim outperforms all prior data-free approaches, including NF4; and<p>* a method, which they claim is optimal, for finding non-uniform per-layer quantization levels that match a given compression constraint in the "medium bitwidth" regime.<p>They demonstrate improved accuracy-compression trade-offs on popular LLMs.<p>Thank you for sharing this on HN.