My colleagues and I wrote a paper on a quantization method, HIGGS, and integrated it into transformers.<p>It is both more accurate and faster than NF4.<p>We have published compressed Hugging Face models for everyone to try: <a href="https://huggingface.co/collections/ISTA-DASLab/higgs-675308e432fd56b7f6dab94e" rel="nofollow">https://huggingface.co/collections/ISTA-DASLab/higgs-675308e...</a>