
Run DeepSeek R1 Dynamic 1.58-bit

19 points by amrrs 4 months ago

3 comments

danielhanchen 4 months ago
Oh thanks for sharing this! The fork of llama.cpp showing how to do the dynamic quant is here: https://github.com/unslothai/llama.cpp. I also found that min_p = 0.05 helps reduce the chance of bad tokens coming up at 1.58-bit (I saw it happen roughly once per 8,000 tokens).
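
A minimal sketch of applying that sampling setting, assuming the llama-cpp-python bindings (the GGUF filename and prompt are placeholders, not details given in the thread):

    # Sketch: sampling with min_p = 0.05 to suppress rare low-probability
    # tokens at 1.58-bit, via llama-cpp-python (assumed bindings).
    from llama_cpp import Llama

    # Placeholder path to a 1.58-bit dynamic-quant GGUF shard.
    llm = Llama(model_path="DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf")

    out = llm(
        "Explain dynamic quantization in one paragraph.",
        max_tokens=256,
        min_p=0.05,  # drop tokens below 5% of the top token's probability
    )
    print(out["choices"][0]["text"])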
homarp 4 months ago
Discussed here: https://news.ycombinator.com/item?id=42850222
homarp 4 months ago
"The 1.58bit quantization should fit in 160GB of VRAM for fast inference"

Instructions for running it in llama.cpp: https://huggingface.co/unsloth/DeepSeek-R1-GGUF#instructions-to-run-this-model-in-llamacpp
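
As a rough sketch of the download step those instructions cover, using huggingface_hub's snapshot_download (the "*UD-IQ1_S*" pattern is my assumption for the 1.58-bit shard names, and the local directory is a placeholder):

    # Sketch: fetch only the 1.58-bit dynamic-quant GGUF files from the
    # repo linked above, using the standard huggingface_hub API.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="unsloth/DeepSeek-R1-GGUF",
        local_dir="DeepSeek-R1-GGUF",    # placeholder output directory
        allow_patterns=["*UD-IQ1_S*"],   # assumed 1.58-bit quant file pattern
    )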