Shout out to everyone from Ollama and the wider community who helped with reviews, feedback, and assistance along the way. It's great to contribute to such a fantastic project.
Today I ran some perplexity benchmarks comparing F16 and Q8_0 for the K/V cache. I used Qwen 2.5 Coder 7B, as I've heard people say that Qwen is more sensitive to quantisation than some other models.

Well, it turns out there's barely any increase in perplexity at all: just 0.0043.

Added to the post: https://smcleod.net/2024/12/bringing-k/v-context-quantisation-to-ollama/#perplexity-measurements
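For anyone who wants a feel for what that 0.0043 means: perplexity is just the exponentiated average negative log-likelihood of the evaluation tokens, so the comparison boils down to a calculation like the sketch below (toy per-token probabilities for illustration, not the actual benchmark harness or its data):

    package main

    import (
        "fmt"
        "math"
    )

    // perplexity computes exp(-(1/N) * sum(log p_i)) over the model's
    // per-token probabilities on the evaluation text.
    func perplexity(tokenProbs []float64) float64 {
        var sumLog float64
        for _, p := range tokenProbs {
            sumLog += math.Log(p)
        }
        return math.Exp(-sumLog / float64(len(tokenProbs)))
    }

    func main() {
        // Hypothetical probabilities, just to show the shape of the metric.
        probs := []float64{0.25, 0.4, 0.1, 0.6}
        fmt.Printf("perplexity = %.4f\n", perplexity(probs))
    }

Run the same evaluation text through the F16 and Q8_0 K/V cache configurations and the difference between the two resulting numbers is the figure quoted above.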
What's the best way to use Ollama with a GUI? Just OpenWebUI? Are there any options for mobile platforms like Android as well? (I don't even know whether we can run LLMs on a phone in the first place.)
Nice.

That said... I mean...

> The journey to integrate K/V context cache quantisation into Ollama took around 5 months.

??

They incorrectly tagged #7926, which is a two-line change, instead of #6279, where it was actually implemented. That made me dig a bit deeper, and reading the actual change, the commit [1] is:

    > params := C.llama_context_default_params()
    > ...
    > params.type_k = kvCacheTypeFromStr(strings.ToLower(kvCacheType)) // <-- adds this
    > params.type_v = kvCacheTypeFromStr(strings.ToLower(kvCacheType)) // <-- adds this
Which has been part of llama.cpp since Dec 7, 2023 [2].

So... mmmm... while this is great, somehow I'm left feeling vaguely put off by the comms around what is really 'we finally support a config flag from llama.cpp that has been there for really *quite a long time*'.

> It took 5 months, but we got there in the end.

... I guess... yay? The challenges don't seem to have been technical, but good job getting it across the line in the end, I suppose.

[1] https://github.com/ollama/ollama/commit/1bdab9fdb19f8a8c73ed85291f9acea5bc1c7075#diff-7c8fcee9a6ef35252c34bdc9910b1e605c5d480ea80d9f2fe1c67dc069e9888cR144

[2] https://github.com/ggerganov/llama.cpp/commit/bcc0eb4591bec5ec02fad3f2bdcb1b265052ea56#diff-201cbc8fd17750764ed4a0862232e81503550c201995e16dc2e2766754eaa57aR907
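For anyone curious what that helper in the quoted snippet does: I haven't pulled apart the rest of the commit, but presumably kvCacheTypeFromStr just maps the configured string onto one of ggml's quantisation types. A rough sketch of that mapping (my reconstruction, not the Ollama source; the real helper would return the C enum values such as GGML_TYPE_Q8_0 via cgo):

    package main

    import "fmt"

    // Hypothetical stand-ins for llama.cpp's ggml_type enum values; the real
    // helper would return the C enums (e.g. C.GGML_TYPE_Q8_0) via cgo.
    const (
        ggmlTypeF16 = iota
        ggmlTypeQ8_0
        ggmlTypeQ4_0
    )

    // kvCacheTypeFromStr sketches how an already-lowercased K/V cache type
    // string might be mapped onto a ggml quantisation type, falling back to
    // F16 for anything unrecognised.
    func kvCacheTypeFromStr(s string) int {
        switch s {
        case "q8_0":
            return ggmlTypeQ8_0
        case "q4_0":
            return ggmlTypeQ4_0
        default:
            return ggmlTypeF16
        }
    }

    func main() {
        fmt.Println(kvCacheTypeFromStr("q8_0")) // prints 1 (ggmlTypeQ8_0)
    }

Which is to say: the Go-side change really is just plumbing a string setting through to fields llama.cpp already exposes.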