We are releasing new 2-bit Mixtral models. These use a mixed HQQ 4-bit/2-bit configuration (4-bit attention layers, 2-bit MoE experts, as the model names indicate), resulting in a significantly improved model (ppl 4.69 vs. 5.90) with a negligible 0.20 GB VRAM increase.

Base: https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ

Instruct: https://huggingface.co/mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ

Shout-out to Artem Eliseev and Denis Mazur for suggesting this idea (https://github.com/mobiusml/hqq/issues/2).
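For anyone curious what a mixed per-layer config looks like, here is a minimal sketch using the hqq library. The layer tags, group sizes, and loader calls are my reconstruction from the model names and the library's usual API, not copied from the model cards, so treat it as illustrative and check the repos for the published config:

    # Minimal sketch (assumptions flagged below): mixed 4-bit/2-bit HQQ
    # quantization of Mixtral, 4-bit attention projections + 2-bit MoE experts.
    from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
    from hqq.core.quantize import BaseQuantizeConfig

    model_id = 'mistralai/Mixtral-8x7B-Instruct-v0.1'

    # Group sizes here are assumed, not taken from the released models.
    q4 = BaseQuantizeConfig(nbits=4, group_size=64)
    q2 = BaseQuantizeConfig(nbits=2, group_size=16)

    # Map linear-layer tags to quant configs: attention at 4-bit,
    # expert weights (the bulk of the parameters) at 2-bit.
    quant_config = {
        'self_attn.q_proj': q4,
        'self_attn.k_proj': q4,
        'self_attn.v_proj': q4,
        'self_attn.o_proj': q4,
        'block_sparse_moe.experts.w1': q2,
        'block_sparse_moe.experts.w2': q2,
        'block_sparse_moe.experts.w3': q2,
    }

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = HQQModelForCausalLM.from_pretrained(model_id)
    model.quantize_model(quant_config=quant_config)

To skip quantizing yourself, the pre-quantized weights should load directly with the library's from_quantized helper, e.g.:

    model = HQQModelForCausalLM.from_quantized(
        'mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ')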