I’m self-hosting with TabbyML, running StarCoder 3B on my Nvidia RTX 2080 Super, and I can’t imagine coding without it anymore. It consistently gives me great completions across all the languages I work in.<p>To the people in this thread not having good success: I’d suggest trying again.
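If you want to poke at it outside the editor, here's a minimal sketch of querying a local Tabby server directly. The request shape follows Tabby's documented /v1/completions endpoint at the time of writing, but check your version's API docs; the prefix/suffix payload is just illustrative:

    # Ask a locally running Tabby server (default port 8080) for a completion.
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/completions",
        json={
            "language": "python",
            "segments": {
                "prefix": "def fibonacci(n: int) -> int:\n    ",  # code before the cursor
                "suffix": "\n",                                   # code after the cursor
            },
        },
        timeout=30,
    )
    resp.raise_for_status()
    for choice in resp.json().get("choices", []):
        print(choice["text"])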
Went through this same exercise this week and came to the same conclusion.<p>After trying multiple open models, reconfiguring my setup to use GPT-4o and seeing the speed and quality of its output was illuminating.
I also wanted to try some local LLMs, but gave up and came to the same conclusion:<p>"While the idea of having a personal and private instance of a code assistant is interesting (and can also be the only available option in certain environments), the reality is that achieving the same level of performance as GitHub Copilot is quite challenging."<p>But considering the pace at which AI and the ecosystem advance, things might change soon.
I believe we'll need a purpose-built ASIC with access to 100 GB of good old [G]DDR5 before this becomes viable. Something like what Hailo offers, but without the "product inquiry" barrier.<p>I say that because a single user doesn't need datacenter speeds, but there is no getting around the memory requirements.<p>I don't think it will happen. The market is too niche; people are happy to fork over $5/mo.
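To put rough numbers on the memory point, here's a back-of-envelope sketch; the ~20% overhead factor for KV cache and activations is an assumption, not a measurement:

    # Rough VRAM estimate: weights * bytes-per-weight, plus assumed ~20% overhead.
    def vram_gb(params_billions: float, bytes_per_weight: float, overhead: float = 1.2) -> float:
        # params_billions * bytes cancels out to GB directly (1e9 params / 1e9 bytes-per-GB)
        return params_billions * bytes_per_weight * overhead

    for name, params, width in [
        ("3B model @ fp16", 3, 2.0),     # ~7.2 GB: fits an 8 GB consumer card
        ("3B model @ 4-bit", 3, 0.5),    # ~1.8 GB
        ("70B model @ 4-bit", 70, 0.5),  # ~42 GB: already past any consumer GPU
    ]:
        print(f"{name}: ~{vram_gb(params, width):.1f} GB")

Which is why a 3B completion model runs fine on an 8 GB card, while anything approaching hosted-model quality is where a figure like 100 GB comes in.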
It really depends on the use case, and right now using Ollama for coding just isn’t that useful. I can use gemma2 and phi3 just fine for general summarization and keyword extraction (including most of the stuff I need to do home automation with a “better Siri”—low bar, I know), but generating or autocompleting code is just another level entirely.
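For the summarization side, here's a minimal sketch against Ollama's local REST API (default port 11434); it assumes you've already pulled gemma2, and the model name and prompt are illustrative:

    # Summarize text with a local Ollama model via its REST API.
    import requests

    def summarize(text: str, model: str = "gemma2") -> str:
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": model,
                "prompt": f"Summarize this in two sentences:\n\n{text}",
                "stream": False,  # return a single JSON object, not a token stream
            },
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    print(summarize("Paste the long article text here..."))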