
2-bit and 4-bit versions of Mixtral

4 points by ibuildthings over 1 year ago

1 comment

ibuildthings over 1 year ago
We are releasing 2-bit and 4-bit quantized versions of Mixtral using the HQQ method we just published: https://mobiusml.github.io/hqq_blog/ and https://github.com/mobiusml/hqq.

The 2-bit version can run on a 24GB Titan RTX.

Perplexity scores on the wikitext2 dataset (memory footprint / perplexity):

Mixtral: 26GB / 3.79
Llama2-70B: 26.37GB / 4.13
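
For anyone who wants to try one of these checkpoints, here is a minimal loading sketch. It assumes the hqq package from the linked repo exposes an HF-style HQQModelForCausalLM wrapper with a from_quantized() loader; the import path and the Hub repo id below are placeholders rather than verbatim names, so check the repo's README for the exact identifiers.

    # Sketch: load a pre-quantized 2-bit Mixtral and run a short generation.
    # The 2-bit variant should fit on a single 24GB GPU per the post above.
    from transformers import AutoTokenizer
    from hqq.engine.hf import HQQModelForCausalLM  # assumed import path

    model_id = "mobiuslabsgmbh/Mixtral-8x7B-v0.1-hf-2bit-HQQ"  # placeholder repo id

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = HQQModelForCausalLM.from_quantized(model_id)  # assumed loader for pre-quantized weights

    prompt = "Half-Quadratic Quantization (HQQ) works by"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))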