It keeps saying the phrase “model you can run locally”, but despite days of trying, I failed to compile any of the GitHub repos associated with these models.<p>None of the Python dependencies are pinned to specific versions, and “something” happened to the CUDA compatibility of one of them about a month ago. The original developers “got lucky,” but now nobody else can compile this stuff.<p>After years of using only C# and Rust, both of which have sane package managers with semantic versioning, lock files, reproducible builds, and even SHA checksums, the Python package ecosystem looks ridiculously immature, even childish.<p>Seriously, can anyone here build a Docker image for running these models on CUDA? I think right now it’s borderline impossible, but I’d be happy to be corrected…
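For what it’s worth, a minimal sanity check along these lines (a rough sketch, assuming PyTorch is the dependency whose CUDA build broke) at least tells you whether the installed wheel matches your driver before you burn another day rebuilding:<p><pre><code>  # Rough sketch: does the installed PyTorch wheel actually see the GPU?
  # Assumes PyTorch is the dependency in question; adjust for your setup.
  import torch

  print("torch version:", torch.__version__)
  print("built against CUDA:", torch.version.cuda)  # None => CPU-only wheel
  if torch.cuda.is_available():
      print("driver sees:", torch.cuda.get_device_name(0))
  else:
      print("CUDA not available: wheel/driver mismatch or CPU-only build")
</code></pre>
It doesn’t fix the missing lock files, but combined with hash-pinned requirements (pip-tools’ pip-compile --generate-hashes, installed with pip install --require-hashes) it gets you part of the way toward reproducibility.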
<p><pre><code> > Our system thinks you might be a robot!
We're really sorry about this, but it's getting harder and harder to tell the difference between humans and bots these days.
</code></pre>
Yeah, fuck you too. Come on, really, why put this in front of a _blog post_? Is it that hard to keep up with the bot requests when serving a static page?
Most places that recommend llama.cpp for Mac fail to mention <a href="https://github.com/jankais3r/LLaMA_MPS">https://github.com/jankais3r/LLaMA_MPS</a>, which runs unquantized 7B and 13B models on the M1/M2 GPU directly. It's slightly slower (not by a lot), and uses significantly less energy. To me, the win of not having to quantize while also not melting a hole in my lap is huge; I wish more people knew about it.
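For anyone wondering what “on the GPU directly” means here: as I understand it, the repo runs the stock unquantized weights through PyTorch’s Metal (MPS) backend. A minimal sketch of that mechanism (not the repo’s actual code):<p><pre><code>  # Minimal sketch of PyTorch's MPS backend, the mechanism LLaMA_MPS relies on.
  # Not the repo's actual code.
  import torch

  device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
  x = torch.randn(1, 4096, device=device)  # tensor lives on the M1/M2 GPU
  w = torch.randn(4096, 4096, device=device)
  y = x @ w                                # matmul runs on the Metal backend
  print(y.device)                          # -> mps:0 on Apple Silicon
</code></pre>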
I'm running Vicuna (a LLaMA variant) on my iPhone right now. <a href="https://twitter.com/simonw/status/1652358994214928384" rel="nofollow">https://twitter.com/simonw/status/1652358994214928384</a><p>The same team that built that iPhone app - MLC - also got Vicuna running directly in a web browser using Web GPU: <a href="https://simonwillison.net/2023/Apr/16/web-llm/" rel="nofollow">https://simonwillison.net/2023/Apr/16/web-llm/</a>
There is also CodyCapybara (a 7B model fine-tuned on code competitions), the “uncensored” Vicuna, OpenAssistant 13B (which is said to be very good), various non-English tunes, medalpaca... the release pace is maddening.
I'll never understand why everyone is spending so much time on a model you cannot use commercially (at all).<p>And beyond that, given the license, most of us can't even use the model for research or personal use.