
Bringing K/V context quantisation to Ollama

220 points by mchiang, 6 months ago

5 comments

smcleod, 6 months ago
Shout out to everyone from Ollama and the wider community that helped with the reviews, feedback and assistance along the way. It's great to contribute to such a fantastic project.
smcleod, 6 months ago
Today I ran some perplexity benchmarks comparing F16 and Q8_0 for the K/V cache. I used Qwen 2.5 Coder 7B, as I've heard people say things to the effect of Qwen being more sensitive to quantisation than some other models.

Well, it turns out there's barely any increase in perplexity at all - an increase of just 0.0043.

Added to the post: https://smcleod.net/2024/12/bringing-k/v-context-quantisation-to-ollama/#perplexity-measurements
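For a sense of scale on that 0.0043: perplexity is just the exponential of the average negative log-likelihood per token, so two K/V cache types can be compared by scoring the same text under each and comparing the resulting numbers. Below is a minimal, self-contained Go sketch of that arithmetic; the per-token log-probabilities are made up for illustration, and this is not the benchmark harness behind the figures above.

    package main

    import (
        "fmt"
        "math"
    )

    // perplexity computes exp(-mean(log p)) over per-token log-probabilities
    // (natural log), the quantity reported by llama.cpp-style perplexity runs.
    func perplexity(tokenLogProbs []float64) float64 {
        sum := 0.0
        for _, lp := range tokenLogProbs {
            sum += lp
        }
        return math.Exp(-sum / float64(len(tokenLogProbs)))
    }

    func main() {
        // Hypothetical per-token log-probs for the same text scored twice:
        // once with an F16 K/V cache and once with a Q8_0 K/V cache.
        f16 := []float64{-1.92, -0.31, -2.47, -0.88, -1.05}
        q80 := []float64{-1.93, -0.31, -2.48, -0.89, -1.05}

        fmt.Printf("F16:   %.4f\n", perplexity(f16))
        fmt.Printf("Q8_0:  %.4f\n", perplexity(q80))
        fmt.Printf("delta: %.4f\n", perplexity(q80)-perplexity(f16))
    }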
satvikpendem, 6 months ago
What's the best way to use Ollama with a GUI, just OpenWebUI? Any options as well for mobile platforms like Android? (Or, I don't even know if we can run LLMs on a phone in the first place.)
lastdong, 6 months ago
Great project! Do you think there might be some advantages to bringing this over to LLaMA-BitNet?
wokwokwok, 6 months ago
Nice.

That said... I mean...

> The journey to integrate K/V context cache quantisation into Ollama took around 5 months.

??

They incorrectly tagged #7926, which is a two-line change, instead of #6279 where it was implemented. That made me dig a bit deeper, and reading the actual change it seems:

The commit [1] is:

    params := C.llama_context_default_params()
    ...
    params.type_k = kvCacheTypeFromStr(strings.ToLower(kvCacheType))   <--- adds this
    params.type_v = kvCacheTypeFromStr(strings.ToLower(kvCacheType))   <--- adds this

which has been part of llama.cpp since Dec 7, 2023 [2].

So... mmmm... while this is great, somehow I'm left feeling kind of vaguely put off by the comms around what is really "we finally support some config flag from llama.cpp that's been there for really quite a long time".

> It took 5 months, but we got there in the end.

... I guess... yay? The challenges don't seem like they were technical, but I guess, good job getting it across the line in the end?

[1] https://github.com/ollama/ollama/commit/1bdab9fdb19f8a8c73ed85291f9acea5bc1c7075#diff-7c8fcee9a6ef35252c34bdc9910b1e605c5d480ea80d9f2fe1c67dc069e9888cR144

[2] https://github.com/ggerganov/llama.cpp/commit/bcc0eb4591bec5ec02fad3f2bdcb1b265052ea56#diff-201cbc8fd17750764ed4a0862232e81503550c201995e16dc2e2766754eaa57aR907
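For a sense of what that two-line change plumbs through: the helper it calls maps a user-facing cache-type string such as "f16", "q8_0" or "q4_0" onto a quantisation type, which then lands in params.type_k / params.type_v. Below is a rough, self-contained Go sketch of that mapping, not Ollama's actual implementation; the cacheType constants here are illustrative stand-ins rather than the real ggml enum values, which the real code passes through cgo into llama_context_default_params(). In released Ollama builds the user-facing knob is the OLLAMA_KV_CACHE_TYPE environment variable, used together with flash attention.

    package main

    import (
        "fmt"
        "strings"
    )

    // cacheType is a stand-in for the llama.cpp/ggml cache type enum
    // (e.g. GGML_TYPE_F16, GGML_TYPE_Q8_0); values here are illustrative only.
    type cacheType int

    const (
        cacheTypeF16 cacheType = iota
        cacheTypeQ8_0
        cacheTypeQ4_0
    )

    // kvCacheTypeFromStr sketches the idea of the helper referenced in the
    // commit above: map a user-facing string to a K/V cache quantisation
    // type, defaulting to F16 for anything unrecognised.
    func kvCacheTypeFromStr(s string) cacheType {
        switch strings.ToLower(s) {
        case "q8_0":
            return cacheTypeQ8_0
        case "q4_0":
            return cacheTypeQ4_0
        default:
            return cacheTypeF16
        }
    }

    func main() {
        for _, s := range []string{"f16", "Q8_0", "q4_0", "bogus"} {
            fmt.Printf("%-5s -> %v\n", s, kvCacheTypeFromStr(s))
        }
    }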