Shout out to everyone from Ollama and the wider community who helped with reviews, feedback, and assistance along the way. It's great to contribute to such a fantastic project.
Today I ran some perplexity benchmarks comparing F16 and Q8_0 for the K/V cache. I used Qwen 2.5 Coder 7B, as I've heard people say that Qwen is more sensitive to quantisation than some other models.

Well, it turns out there's barely any increase in perplexity at all: just 0.0043.

Added to the post: https://smcleod.net/2024/12/bringing-k/v-context-quantisation-to-ollama/#perplexity-measurements
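For anyone who wants a feel for what that 0.0043 means: perplexity is just the exponentiated average negative log-likelihood of the evaluation tokens, so the comparison boils down to a calculation like the sketch below (toy per-token probabilities for illustration, not the actual benchmark harness or its data):

    package main

    import (
        "fmt"
        "math"
    )

    // perplexity computes exp(-(1/N) * sum(log p_i)) over the model's
    // per-token probabilities on the evaluation text.
    func perplexity(tokenProbs []float64) float64 {
        var sumLog float64
        for _, p := range tokenProbs {
            sumLog += math.Log(p)
        }
        return math.Exp(-sumLog / float64(len(tokenProbs)))
    }

    func main() {
        // Hypothetical probabilities, just to show the shape of the metric.
        probs := []float64{0.25, 0.4, 0.1, 0.6}
        fmt.Printf("perplexity = %.4f\n", perplexity(probs))
    }

Run the same evaluation text through the F16 and Q8_0 K/V cache configurations and the difference between the two resulting numbers is the figure quoted above.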
What's the best way to use Ollama with a GUI? Just OpenWebUI? Are there any options for mobile platforms like Android as well? (I don't even know whether we can run LLMs on a phone in the first place.)
Nice.

That said... I mean...

> The journey to integrate K/V context cache quantisation into Ollama took around 5 months.

??

They incorrectly tagged #7926, which is a two-line change, instead of #6279, where it was actually implemented. That made me dig a bit deeper, and reading the actual change, the commit [1] is:

    > params := C.llama_context_default_params()
    > ...
    > params.type_k = kvCacheTypeFromStr(strings.ToLower(kvCacheType)) // <-- adds this
    > params.type_v = kvCacheTypeFromStr(strings.ToLower(kvCacheType)) // <-- adds this
Which has been part of llama.cpp since Dec 7, 2023 [2].

So... mmmm... while this is great, somehow I'm left feeling vaguely put off by the comms around what is really 'we finally support a config flag from llama.cpp that has been there for really *quite a long time*'.

> It took 5 months, but we got there in the end.

... I guess... yay? The challenges don't seem to have been technical, but good job getting it across the line in the end, I suppose.

[1] https://github.com/ollama/ollama/commit/1bdab9fdb19f8a8c73ed85291f9acea5bc1c7075#diff-7c8fcee9a6ef35252c34bdc9910b1e605c5d480ea80d9f2fe1c67dc069e9888cR144

[2] https://github.com/ggerganov/llama.cpp/commit/bcc0eb4591bec5ec02fad3f2bdcb1b265052ea56#diff-201cbc8fd17750764ed4a0862232e81503550c201995e16dc2e2766754eaa57aR907
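For anyone curious what that helper in the quoted snippet does: I haven't pulled apart the rest of the commit, but presumably kvCacheTypeFromStr just maps the configured string onto one of ggml's quantisation types. A rough sketch of that mapping (my reconstruction, not the Ollama source; the real helper would return the C enum values such as GGML_TYPE_Q8_0 via cgo):

    package main

    import "fmt"

    // Hypothetical stand-ins for llama.cpp's ggml_type enum values; the real
    // helper would return the C enums (e.g. C.GGML_TYPE_Q8_0) via cgo.
    const (
        ggmlTypeF16 = iota
        ggmlTypeQ8_0
        ggmlTypeQ4_0
    )

    // kvCacheTypeFromStr sketches how an already-lowercased K/V cache type
    // string might be mapped onto a ggml quantisation type, falling back to
    // F16 for anything unrecognised.
    func kvCacheTypeFromStr(s string) int {
        switch s {
        case "q8_0":
            return ggmlTypeQ8_0
        case "q4_0":
            return ggmlTypeQ4_0
        default:
            return ggmlTypeF16
        }
    }

    func main() {
        fmt.Println(kvCacheTypeFromStr("q8_0")) // prints 1 (ggmlTypeQ8_0)
    }

Which is to say: the Go-side change really is just plumbing a string setting through to fields llama.cpp already exposes.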