We are releasing 2-bit and 4-bit quantized versions of Mixtral using the Half-Quadratic Quantization (HQQ) method we just published: blog post at https://mobiusml.github.io/hqq_blog/ and code at https://github.com/mobiusml/hqq.

The 2-bit version can run on a single 24GB Titan RTX.

Perplexity scores on the wikitext2 dataset (memory footprint / perplexity, lower perplexity is better):
Mixtral: 26 GB / 3.79
Llama2-70B: 26.37 GB / 4.13
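If you want to try one of the pre-quantized models, loading it is a few lines with the hqq library. A minimal sketch, assuming the `hqq.engine.hf` wrapper described in the repo's README and a Hugging Face repo id for the mixed 4-bit/2-bit Mixtral release (substitute the id of whichever variant you want; exact API details may differ across hqq versions):

```python
from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer

# Assumed repo id for illustration; swap in the actual quantized-model id from the Hub.
model_id = "mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-attn-4bit-moe-2bit-HQQ"

# Download the pre-quantized weights and the matching tokenizer.
model = HQQModelForCausalLM.from_quantized(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quick sanity check: generate a short continuation on the GPU.
inputs = tokenizer("Mixtral is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```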