
Llama.vim – Local LLM-assisted text completion

530 points by kgwgk, 4 months ago

20 comments

ggerganov, 4 months ago
Hi HN, happy to see this here!

I highly recommend taking a look at the technical details of the server implementation that enables large context usage with this plugin - I think it is interesting and has some cool ideas [0].

Also, the same plugin is available for VS Code [1].

Let me know if you have any questions about the plugin - happy to explain. Btw, the performance has improved compared to what is seen in the README videos thanks to client-side caching.

[0] https://github.com/ggerganov/llama.cpp/pull/9787
[1] https://github.com/ggml-org/llama.vscode
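For a sense of how an editor client talks to that server, here is a minimal Python sketch of a fill-in-the-middle request. The endpoint path and the input_prefix/input_suffix field names are assumptions based on the infill work in the linked PR, and port 8012 is assumed from the plugin defaults; check the server docs for the authoritative schema.

    import json
    import urllib.request

    # Hypothetical infill request against a locally running llama-server.
    payload = {
        "input_prefix": "def fib(n):\n    ",      # code before the cursor
        "input_suffix": "\n\nprint(fib(10))\n",   # code after the cursor
        "n_predict": 64,                          # cap on generated tokens
    }

    req = urllib.request.Request(
        "http://127.0.0.1:8012/infill",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read())
        # The generated completion is typically returned in a "content" field.
        print(result.get("content", ""))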
eigenvalue, 4 months ago
This guy is a national treasure and has contributed so much value to the open source AI ecosystem. I hope he’s able to attract enough funding to continue making software like this and releasing it as true “no strings attached” open source.
estreeper, 4 months ago
Very exciting - I'm a long-time vim user but most of my coworkers use VSCode, and I've been wanting to try out in-editor completion tools like this.

After using it for a couple of hours (on Elixir code) with Qwen2.5-Coder-3B and no attempts to customize it, this checks a lot of boxes for me:

- I pretty much want fancy autocomplete: filling in obvious things and saving my fingers the work, and these suggestions are pretty good
- the default keybindings work for me, and I like that I can keep current-line or multi-line suggestions
- no concerns around sending code off to a third party
- works offline when I'm traveling
- it's fast!

So that I don't need to remember how to run the server, I'll probably set up a script that checks whether it's running, starts it in the background if not, and then launches vim, and alias vim to use that. I looked in the help documents but didn't see a way to disable the "stats" text after the suggestions, though I'm not sure it will bother me that much.
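A minimal wrapper along those lines could look something like this in Python. It is purely a sketch: port 8012 is assumed to be llama.vim's default endpoint, and the llama-server command line is a placeholder you would fill in with your own model flags.

    import os
    import socket
    import subprocess
    import sys

    HOST, PORT = "127.0.0.1", 8012                       # assumed llama.vim default
    SERVER_CMD = ["llama-server", "--port", str(PORT)]   # add your model flags here

    def server_running() -> bool:
        # Cheap liveness check: can we open a TCP connection to the server port?
        try:
            with socket.create_connection((HOST, PORT), timeout=0.5):
                return True
        except OSError:
            return False

    if not server_running():
        # Start the server detached in the background and keep going.
        subprocess.Popen(
            SERVER_CMD,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            start_new_session=True,
        )

    # Replace the current process with vim, passing through any arguments.
    os.execvp("vim", ["vim"] + sys.argv[1:])

Aliasing vim to that script in your shell rc (the script name and path are of course up to you) would complete the setup.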
msoloviev, 4 months ago
I wonder how the "ring context" works under the hood. I have previously had (and recently messed around with again) a somewhat similar project designed for a more toy/exploratory setting (https://github.com/blackhole89/autopen - demo video at https://www.youtube.com/watch?v=1O1T2q2t7i4), and one of the main problems to address definitively is how to manage your KV cache cleverly so you don't have to constantly perform expensive recomputation whenever the buffer undergoes local changes.

The solution I came up with involved maintaining a tree of tokens that branches whenever an alternative next token is explored, with full LLM state snapshots at fixed depth intervals so that the buffer only has to be "replayed" for a few tokens when something changes. I wonder if there are some mathematical properties of how the important parts of the state work (really, the KV cache, which can be thought of as a partial precomputation of the operation that one LLM iteration performs on the context) that could have made this more efficient - for example, avoiding full snapshots, or being able to prune the "oldest" tokens out of a state efficiently.

(edit: Georgi's comment that beat me by 3 minutes appears to be pointing at information that would go some way to answer my questions!)
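For what it's worth, the snapshot bookkeeping can be sketched in a few lines of Python if the model state is treated as an opaque blob. The model object and its eval_token/save_state/load_state/reset methods below are hypothetical stand-ins for whatever binding is used (this is not the autopen implementation); the point is just: snapshot every N tokens, and on an edit rewind to the nearest snapshot at or before the edit point and replay only the tokens after it.

    from dataclasses import dataclass, field

    SNAPSHOT_INTERVAL = 16  # take a full state snapshot every 16 tokens

    @dataclass
    class TokenHistory:
        tokens: list = field(default_factory=list)     # token ids seen so far
        snapshots: dict = field(default_factory=dict)  # position -> opaque state snapshot

        def append(self, token, model):
            model.eval_token(token)                    # advance the model by one token
            self.tokens.append(token)
            if len(self.tokens) % SNAPSHOT_INTERVAL == 0:
                self.snapshots[len(self.tokens)] = model.save_state()

        def edit_at(self, pos, new_tokens, model):
            """Replace everything from position `pos` onward with `new_tokens`."""
            # Find the latest snapshot taken at or before the edit position.
            base = max((p for p in self.snapshots if p <= pos), default=0)
            if base:
                model.load_state(self.snapshots[base])
            else:
                model.reset()
            # Drop snapshots and tokens past the restore point.
            self.snapshots = {p: s for p, s in self.snapshots.items() if p <= base}
            kept, tail = self.tokens[:base], self.tokens[base:pos]
            self.tokens = kept
            # Replay only the few tokens between the snapshot and the edit point,
            # then feed the new tokens; everything before `base` is never recomputed.
            for t in tail + list(new_tokens):
                self.append(t, model)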
h14h, 4 months ago
A little bit of a tangent, but I'm really curious what benefits could come from integrating these LLM tools more closely with data from LSPs, compilers, and other static analysis tools.

Intuitively, it seems like you could provide much more context and get better output as a result. Even better would be if you could fine-tune LLMs on a per-language basis and ship them alongside typical editor tooling.

A problem I see with these AI tools is that they work much better with old, popular languages, and I worry that this will grow into a significant factor when choosing a language. Anecdotally, I see far better results when using TypeScript than Gleam, for example.

It would be very cool to be able to install a Gleam-specific model that could be fed data from the LSP and compiler, and wouldn't constantly hallucinate invalid syntax. I also wonder if, with additional context and fine-tuning, you could make these models smaller and more feasible to run locally on modest hardware.
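As a toy illustration of that idea (not something llama.vim does), the integration could be as simple as prepending LSP-derived facts to the fill-in-the-middle prompt. Everything below is made up for the sketch: the <PRE>/<SUF>/<MID> markers stand in for whatever FIM template a given model expects, and the example facts mimic hover and diagnostic output.

    def build_fim_prompt(prefix: str, suffix: str, lsp_facts: list[str]) -> str:
        # Surface static-analysis facts (types, signatures, diagnostics) to the
        # model as a comment block ahead of the usual prefix/suffix context.
        header = "\n".join(f"# LSP: {fact}" for fact in lsp_facts)
        return f"{header}\n<PRE>{prefix}<SUF>{suffix}<MID>"

    prompt = build_fim_prompt(
        prefix="def total(prices):\n    return ",
        suffix="\n",
        lsp_facts=[
            "prices: list[Decimal]",                    # hover/type info
            "warning: unused import 'math' on line 1",  # diagnostic
        ],
    )
    print(prompt)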
mijoharas, 4 months ago
Can anyone compare this to TabbyML? [0] I just set that up yesterday for emacs to check it out.

The context gathering here seems very interesting [1], and very vim-integrated, so I'm guessing there isn't anything very similar for Tabby. I skimmed the docs and saw some stuff about context for the Tabby chat feature [2], which I'm not super interested in using even if adding docs as context sounds nice, but nothing obvious for the autocompletion [3].

Does anyone have more insight or info to compare the two?

As a note, I quite like that the LLM context here "follows" what you're doing. It seems like a nice idea. Does anyone know if anyone else does something similar?

[0] https://www.tabbyml.com/
[1] https://github.com/ggerganov/llama.cpp/pull/9787#issue-2572915687 ("global context onwards")
[2] https://tabby.tabbyml.com/docs/administration/context/
[3] https://tabby.tabbyml.com/docs/administration/code-completion/
dingnuts, 4 months ago
Is anyone actually getting value out of these models? I wired one up to Emacs and the local models all produce a huge volume of garbage output.

Occasionally I find a hosted LLM useful, but I haven't found any output from the models I can run in Ollama on my gaming PC to be useful.

It's all plausible-looking but incorrect. I feel like I'm taking crazy pills when I read about others' experiences. Surely I am not alone?
frankfrank13, 4 months ago
Is this more or less the same as your VSCode version? (https://github.com/ggml-org/llama.vscode)
binary132, 4 months ago
I am curious to see what will be possible with consumer-grade hardware and more improvements to quantization over the next decade. Right now, even a 24GB GPU with the best models isn't able to match the barely acceptable performance of hosted services that I'm not willing to pay even $20 a month for.
mohsen1, 4 months ago
Terminal coding FTW!

And when you're really stuck, you can use DeepSeek R1 for a deeper analysis in your terminal using `askds`:

https://github.com/bodo-run/askds
opk, 4 months ago
Has anyone actually got this llama stuff to be usable on even moderate hardware? I find it just crashes because it doesn't find enough RAM. I've got 2G of VRAM on an AMD graphics card and 16G of system RAM, and that doesn't seem to be enough. The impression I got from reading up was that it worked for most Apple stuff because the memory is unified, and other than that you need very expensive Nvidia GPUs with lots of VRAM. Are there any affordable options?
mrinterweb, 4 months ago
Been using this for a couple of hours, and this is really nice. It is a great alternative to something like GitHub Copilot. Appreciate how simple and fast this is.
colordrops, 4 months ago
I've seen several posts and projects like this. Is there a summary/comparison somewhere of the various ways of running local completion/copilot?
jerpint, 4 months ago
It's funny because I actually use vim mostly when I don't want LLM-assisted code. Sometimes it just gets in the way.

If I do, I load up Cursor with vim bindings.
s-skl, 4 months ago
Really awesome work! Does anyone know what tool/terminal configuration he's using in the video demo to embed CPU/GPU usage in the terminal like that? Much appreciated :)
morcus, 4 months ago
Looking for advice from someone who knows about the space: suppose I'm willing to go out and buy a card for this purpose, what's a modestly priced graphics card with which I can get somewhat usable results running a local LLM?
cfiggers, 4 months ago
Do people with "Copilot+ PCs" get benefits running stuff like this from the much-vaunted AI coprocessors in, e.g., Snapdragon X Elite chips?
awwaiid, 4 months ago
The blinking cursor in demo videos is giving me heart palpitations! But this is super cool. It makes me wonder how Linux is doing on M* hardware.
amelius, 4 months ago
This looks very interesting. Can this be trained on the user's codebase, or is the idea that everything must fit inside the context buffer?
entelechy0, 4 months ago
I use this on-and-off again. It is nice that I can flip between this and Copilot by commenting out one line in my init.lua