I see a couple of comments comparing llama.cpp and Ollama, and I think both have utility for different purposes. Having used both llama.cpp (which is fantastic) and Ollama, a couple of things I find valuable about Ollama out of the box:

- Automatic loading/unloading of models from memory: just running the Ollama server has a relatively small footprint; each time a particular model is called it is loaded into memory, then unloaded after 5 minutes of no further use. That makes it very convenient to spin up different models for different use cases without worrying about memory management or manually shutting things down when not in use.

- OpenAI API compatibility: I run Ollama on a headless machine with better hardware and connect via SSH port forwarding from my laptop, and with a one-line change I can reroute any script on my laptop from GPT to Llama-3 (or anything else); see the sketch below.

Overall, at least for tinkering with multiple local models and building small, personal tools, I've found the utility:maintenance ratio of Ollama to be very positive -- thanks to the team for building something so valuable! :)
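To illustrate the rerouting point, here's a minimal sketch of what that one-line change looks like with the official `openai` Python client. Ollama exposes an OpenAI-compatible endpoint under `/v1` on its default port; the model name and host here are assumptions, not anything from the comment above:

```python
from openai import OpenAI

# Point the standard OpenAI client at an Ollama server instead of api.openai.com.
# With SSH port forwarding (e.g. `ssh -L 11434:localhost:11434 user@gpu-box`),
# a remote Ollama instance looks like a local one.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # the one-line change
    api_key="ollama",  # required by the client, ignored by Ollama
)

reply = client.chat.completions.create(
    model="llama3",  # any model you've pulled with `ollama pull`
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(reply.choices[0].message.content)
```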
I think this is a neat project, and use it a lot. My only complaint is the lack of grammar support: llama.cpp, which Ollama wraps, will happily take a grammar, and the dumbest patch to enable this is like two lines. They seem to be willfully ignoring the (pretty trivial) feature for some reason. I'd rather not maintain a -but-with-grammars fork, so here we are (a rough sketch of what grammars buy you is below).

https://github.com/ollama/ollama/pull/4525#issuecomment-2157586947
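For context, this is the sort of thing the feature would enable. A rough sketch of grammar-constrained sampling against llama.cpp's own example server, whose `/completion` endpoint accepts a `grammar` field; the port, prompt, and GBNF snippet are assumptions for illustration:

```python
import requests

# GBNF grammar constraining the model's output to a strict yes/no answer.
GRAMMAR = r'''
root ::= "yes" | "no"
'''

# Sketch: llama.cpp's example server (started with e.g. `./llama-server -m model.gguf`)
# accepts a "grammar" field on /completion; Ollama currently doesn't pass one through.
resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "Is the sky blue? Answer yes or no: ",
        "n_predict": 4,
        "grammar": GRAMMAR,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["content"])  # guaranteed to be "yes" or "no"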
Big kudos to the Ollama team, echoing others: it just works. I fiddled with llama.cpp for ages trying to get it to run on my GPU, and Ollama was set up and done in literally 3 minutes. The memory management of model loading and unloading is great, and now I can hack around and play with different LLMs from a simple API. Highly recommend that folks try it out; I thought local LLMs would be a pain to set up and use, and Ollama made it super easy.
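For anyone curious what "a simple API" means in practice, here's a minimal sketch against Ollama's native REST endpoint, assuming the default port and an already-pulled llama3 model (both assumptions; any pulled model works):

```python
import requests

# One-shot generation against Ollama's native REST API.
# Assumes the server is running on its default port (11434) and that
# a model named "llama3" has been pulled (`ollama pull llama3`).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Explain GBNF grammars in one sentence.",
        "stream": False,      # return a single JSON object instead of a stream
        "keep_alive": "5m",   # default: model is unloaded after 5 idle minutes
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```

Swapping the "model" field is all it takes to play with a different LLM, which is what makes the automatic load/unload behavior mentioned above so convenient.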