The first time I heard about llama.cpp, I got it to run on my computer. Now, my computer: a Dell laptop from 2013 with 8 GB of RAM and an i5 processor, no dedicated graphics card. Since I wasn't using an MGLRU-enabled kernel, it took a looong time to start, but it wasn't OOM-killed. Considering my amount of RAM was just the minimum required, I tried one of the smallest available models.

Impressively, it worked. It was slow to spit out tokens, at a rate of around one word every 1 to 5 seconds, and it was able to correctly answer "What is the biggest planet in the solar system?", but it quickly hallucinated, talking about moons that it called "Jupterians" when I expected it to talk about the Galilean moons.

Nevertheless, LLMs really impressed me, and as soon as I get my hands on better hardware I'll try to run bigger models locally, in the hope that I'll finally have a personal "oracle" able to quickly answer most questions I throw at it and help me write code and other fun things. Of course, I'll have to check its answers before using them, but the current state of things seems impressive enough for me, especially QwQ.

Is anyone running smaller experiments who can talk about their results? Is it already possible to have something like an open-source Copilot running locally?
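For anyone who wants to try the same kind of CPU-only experiment, here's a minimal sketch using the llama-cpp-python bindings (not necessarily how I ran it myself; the model path and parameters below are illustrative placeholders, and any small 4-bit-quantized GGUF model should behave similarly):

    # Minimal CPU-only completion with llama-cpp-python.
    # Assumes: pip install llama-cpp-python, and a small GGUF model on disk.
    # The model path is a placeholder; a ~1B-parameter Q4 quantization
    # fits comfortably in 8 GB of RAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/tinyllama-1.1b.Q4_K_M.gguf",  # placeholder model
        n_ctx=512,    # small context window keeps memory use low
        n_threads=4,  # match your physical core count
    )

    out = llm(
        "Q: What is the biggest planet in the solar system? A:",
        max_tokens=64,
        stop=["Q:"],  # stop before the model invents a follow-up question
    )
    print(out["choices"][0]["text"])

On hardware like mine, expect throughput in the same word-every-few-seconds range I saw; the 4-bit quantization is what makes the model fit in RAM at all.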