With the ParetoQ paper from Meta (<a href="https://arxiv.org/abs/2502.02631" rel="nofollow">https://arxiv.org/abs/2502.02631</a>), low-bit LLMs seem to be gaining traction. Has anyone experimented with them in production? I'm aware of a few pre-transformer-era companies that focused on applying this to CNNs.