Works nearly out of the box with llama.cpp, which makes it easy to try locally: <a href="https://github.com/ggerganov/llama.cpp/issues/2766">https://github.com/ggerganov/llama.cpp/issues/2766</a><p>Here's some output from q4_0 quantization of CodeLlama-7b-Python (first four lines are the prompt):<p><pre><code> # prints the first ten prime numbers
 def print_primes():
     i = 2
     num_printed = 0 # end of prompt
     while num_printed < 10:
         if is_prime(i):
             print(i)
             num_printed += 1
         i += 1

 def is_prime(n):
     i = 2
     while i * i <= n:
         if n % i == 0:
             return False
         i += 1
     return True

 def main():
     print_primes()

 if __name__ == '__main__':
     main()
</code></pre>
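If you'd rather drive it from Python than the CLI, here's a minimal sketch using the llama-cpp-python bindings; the model filename is just an assumption, substitute whatever your conversion/quantization step produced:<p><pre><code> # pip install llama-cpp-python
 from llama_cpp import Llama

 # assumed filename for a q4_0-quantized CodeLlama-7b-Python
 llm = Llama(model_path="codellama-7b-python.q4_0.bin", n_ctx=2048)

 prompt = (
     "# prints the first ten prime numbers\n"
     "def print_primes():\n"
     "    i = 2\n"
     "    num_printed = 0\n"
 )
 completion = llm(prompt, max_tokens=256, temperature=0.1)
 print(prompt + completion["choices"][0]["text"])
</code></pre>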
It will be interesting to see how the larger models perform, especially after community tuning and with better context/prompting.
The highlight IMO<p>> The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.<p>Edit: Reading the paper, key retrieval accuracy really deteriorates after 16k tokens, so it remains to be seen how useful the 100k context is.
Even the 7B Code Llama model seems to be competitive with Codex, the model behind Copilot<p><a href="https://ai.meta.com/blog/code-llama-large-language-model-coding/" rel="nofollow noreferrer">https://ai.meta.com/blog/code-llama-large-language-model-cod...</a>
Code Llama Python is very interesting: specifically tuned for Python.<p>I wonder if we could make such specific LLMs (one proficient in all things Rust, another in all things Linux, all things genomics, all things physics modeling, etc.) and have them talk to each other to collaboratively solve problems.<p>That would be a crazy future thing! Putting machines truly to work...
The best model, Unnatural Code Llama, is not released. Likely because it's trained on GPT4 based data, and might violate OpenAI TOS, because as per the "Unnatural" paper [1], the "unnatural" data is generated with the help of some LLM -- and you would want to use as good of an LLM as possible.<p>[1] <a href="https://arxiv.org/pdf/2212.09689.pdf" rel="nofollow noreferrer">https://arxiv.org/pdf/2212.09689.pdf</a>
TheBloke doesn’t joke around [1]. I’m guessing we’ll have the quantized ones by the end of the day. I’m super excited to use the 34B Python 4 bit quantized one that should just fit on a 3090.<p>[1] <a href="https://huggingface.co/TheBloke/CodeLlama-13B-Python-fp16" rel="nofollow noreferrer">https://huggingface.co/TheBloke/CodeLlama-13B-Python-fp16</a>
To run Code Llama locally, the 7B parameter quantized version can be downloaded and run with the open-source tool Ollama: <a href="https://github.com/jmorganca/ollama">https://github.com/jmorganca/ollama</a><p><pre><code> ollama run codellama "write a python function to add two numbers"
</code></pre>
More models coming soon (completion, Python, and more parameter counts)
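It's also scriptable over the local HTTP API; a rough sketch in Python (assumes the default port and the current newline-delimited JSON streaming format):<p><pre><code> import json
 import requests  # pip install requests

 resp = requests.post(
     "http://localhost:11434/api/generate",
     json={"model": "codellama",
           "prompt": "write a python function to add two numbers"},
     stream=True,
 )
 for line in resp.iter_lines():
     if line:
         # each line is a JSON object carrying one chunk of the reply
         print(json.loads(line).get("response", ""), end="")
</code></pre>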
>The Code Llama models provide stable generations with up to 100,000 tokens of context.<p>Not a bad context window, but it makes me wonder how embedded code models would pick that context when dealing with a codebase larger than 100K tokens.<p>And that makes me further wonder: when coding with such a tool (or at least knowing that they're becoming more widely used and leaned on), are there new considerations we should be applying, or at least starting to think about? Perhaps more or fewer comments, more terse and less readable code that consumes fewer tokens, different file structures, or even more deliberate naming conventions (like Hungarian notation, but for code models) to facilitate searching or token pattern matching of some kind. Ultimately, in what ways could (or should) we adapt to make the most of these tools?
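To make the "picking context" question concrete, here's a toy sketch of the retrieval step these tools presumably do: chunk the codebase, score each chunk against the query, and pack the best ones into the window. (Real tools use learned embeddings; the lexical-overlap score here is just a stand-in.)<p><pre><code> import os
 import re
 from collections import Counter

 def chunks(root, size=40):
     # split each source file into fixed-size windows of lines
     for dirpath, _, files in os.walk(root):
         for name in files:
             if not name.endswith(".py"):
                 continue
             path = os.path.join(dirpath, name)
             with open(path, encoding="utf-8", errors="ignore") as f:
                 lines = f.readlines()
             for i in range(0, len(lines), size):
                 yield path, "".join(lines[i:i + size])

 def score(query, text):
     # crude lexical overlap; a real tool would use vector embeddings
     q = Counter(re.findall(r"\w+", query.lower()))
     t = Counter(re.findall(r"\w+", text.lower()))
     return sum((q & t).values())

 def pick_context(root, query, budget_chars=8000):
     # greedily pack the highest-scoring chunks into the token budget
     ranked = sorted(chunks(root), key=lambda c: score(query, c[1]),
                     reverse=True)
     picked, used = [], 0
     for path, text in ranked:
         if used + len(text) > budget_chars:
             break
         picked.append("# " + path + "\n" + text)
         used += len(text)
     return "\n".join(picked)
</code></pre>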
Copilot has been working great for me thus far, but it's limited by its interface. It seems like it only knows how to make predictions for the next bit of text.<p>Is anyone working on a code AI that can suggest refactorings?<p>"You should pull these lines into a function, it's repetitive"<p>"You should change this structure so it is easier to use"<p>Etc
As a complete noob at actually running these models, what kind of hardware are we talking here? Couldn't pick that up from the README.<p>I absolutely love the idea of using one of these models without having to upload my source code to a tech giant.
How are people using these local code models? I would much prefer using these in-context in an editor, but most of them seem to be deployed just in an instruction context. There's a lot of value in not having to context switch, or have a conversation.<p>I see the GitHub Copilot extension gets a new release every few days, so is it just that integrating them that way is more complicated and not worth the effort?
Interesting that there's a 34B model. That was missing from the original Llama 2 release. I wonder if it's still usable for general non-code chat tasks or if the code fine tuning destroyed that. It should be the best model that would still fit on 24GB gaming GPUs with quantization, because 70B doesn't fit.
Between this, ideogram.ai (image generator which can spell, from former Google Imagen team member and others), and ChatGPT fine-tuning, this has been a truly epic week.<p>I would argue that many teams will have to reevaluate their LLM strategy _again_ for the second time in a week.
How much am I missing out on with tools like this or Copilot, compared to using GPT-4?<p>I guess since Xcode doesn’t have a good plug-in architecture for this, I began experimenting more with a chat interface.<p>So far GPT-4 has seemed quite useful for generating code, reviewing code for certain problems, etc.
I can't wait for some models fine tuned on other languages. I'm not a Python developer, so I downloaded the 13B-instruct variant (4 bit quantized Q4_K_M) and it's pretty bad at doing javascript. I asked it to write me a basic React Native component that has a name prop and displays that name. Once it returned a regular React component, and when I asked it to make sure it uses React Native components, it said sure and outputted a bunch of random CSS and an HTML file that was initializing a React project.<p>It might be the quantization or my lacklustre prompting skills affecting it, though. To be fair I did get it to output a little bit of useful code after trying a few times.
Anyone know of a docker image that provides an HTTP API interface to Llama? I'm looking for a super simple sort of 'drop-in' solution like that which I can add to my web stack, to enable LLM in my web app.
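The fallback I've been considering is wrapping llama-cpp-python in a tiny Flask app, which gives a drop-in HTTP endpoint, and from there a Dockerfile is trivial. A rough sketch (the model filename and route are my own assumptions):<p><pre><code> # pip install flask llama-cpp-python
 from flask import Flask, jsonify, request
 from llama_cpp import Llama

 app = Flask(__name__)
 # assumed filename; point at whatever quantized model you have
 llm = Llama(model_path="codellama-7b.q4_0.bin", n_ctx=2048)

 @app.route("/generate", methods=["POST"])
 def generate():
     prompt = request.json["prompt"]
     out = llm(prompt, max_tokens=256)
     return jsonify({"text": out["choices"][0]["text"]})

 if __name__ == "__main__":
     app.run(host="0.0.0.0", port=8080)
</code></pre>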
This is great for asking questions like "how do I do x with y" and "this code <<some code>> isn't working, what's wrong?" Much faster than googling, or a great basis for forming a more accurate google search.<p>Where it's a bit shit is when it's used to provide auto-suggest. It hallucinates plausible-sounding functions/names, which for me personally are hard to spot if they are wrong (I suspect that's a function of the plugin)
Why wouldn’t they provide a hosted version? Seems like a no-brainer… they have the money, the hardware, the bandwidth, the people to build support for it, and they could design the experience and gather more learning data about usage in the initial stages, while putting a dent in ChatGPT's commercial prospects, and all while still letting others host and use it elsewhere. I don’t get it. Maybe it was just the fastest option?
What I found interesting in Meta's paper is the mention of HumanEval[1] and MBPP[2] as benchmarks for code quality. (Admittedly maybe they're well-known to those working in the field.)<p>I haven't yet read the whole paper (nor have I looked at the benchmark docs which might very well cover this) but curious how these are designed to avoid issues with overfitting. My thinking here is that canned algorithm type problems common in software engineering interviews are probably over represented in the training data used for these models. Which might point to artificially better performance by LLMs versus their performance on more domain-specific type tasks they might be used for in day-to-day work.<p>[1] <a href="https://github.com/openai/human-eval">https://github.com/openai/human-eval</a><p>[2] <a href="https://github.com/google-research/google-research/tree/master/mbpp">https://github.com/google-research/google-research/tree/mast...</a>
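For reference, the headline metric behind both benchmarks is pass@k: generate n samples per problem, count the c that pass the unit tests, and apply the unbiased estimator from the HumanEval paper. In Python:<p><pre><code> import numpy as np

 def pass_at_k(n, c, k):
     # unbiased estimator of pass@k: 1 - C(n-c, k) / C(n, k),
     # computed as a running product for numerical stability
     if n - c < k:
         return 1.0
     return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

 # e.g. 200 samples per problem, 62 passing -> pass@1 = 0.31
 print(pass_at_k(200, 62, 1))
</code></pre>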
It's really sad how everyone here is fawning over tech that will destroy your own livelihoods. "AI won't take your job, those who use AI will" is purely short term, myopic thinking. These tools are not aimed to help workers, the end goal is to make it so you don't need to be an engineer to build software, just let the project manager or director describe the system they want and boom there it is.<p>You can scream that this is progress all you want, and I'll grant you that these tools will greatly speed up the generation of code. But more code won't make any of these businesses provide better services to people, lower their prices, or pay workers more. They are just a means to keep money from flowing out of the hands of the C-Suite and investor classes.<p>If software engineering becomes a solved problem then fine, we probably shouldn't continue to get paid huge salaries to write it anymore, but please stop acting like this is a better future for any of us normal folks.
Curious if there are projects to enable working with these things self-hosted, tuned to a git repo as context on the cli, like a Unix filter - or with editors like vim? (I'd love to use this with Helix)<p>I see both vscode and netbeans have a concept of "inference URL" - are there any efforts like language server (lsp) - but for inference?
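The closest I've gotten to the Unix-filter feel is a few lines of Python around llama-cpp-python, reading code on stdin and writing the completion to stdout (model path assumed), which also works from vim via a range filter:<p><pre><code> #!/usr/bin/env python3
 # usage from vim: visually select lines, then :'<,'>!llm_filter.py
 import sys
 from llama_cpp import Llama

 llm = Llama(model_path="codellama-13b.q4_K_M.bin", n_ctx=4096)  # assumed
 code = sys.stdin.read()
 out = llm(code, max_tokens=256)  # plain continuation of the buffer
 sys.stdout.write(code + out["choices"][0]["text"])
</code></pre>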
I want "safety" to be opt-in due to the inaccuracy it introduces. I don't want to pay that tax just because someone is afraid I can ask it how to make a bomb when I can just Google that and get pretty close to the same answer already, and I certainly don't care about being offended by its answers.
If you want to try out Code Llama, you can query it on Anyscale Endpoints (this is an LLM inference API we're working on here at Anyscale).<p><a href="https://app.endpoints.anyscale.com/" rel="nofollow noreferrer">https://app.endpoints.anyscale.com/</a>
Here is the paper:<p><a href="https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/" rel="nofollow noreferrer">https://ai.meta.com/research/publications/code-llama-open-fo...</a>
Feels like we're like a year away from local LLMs that can debug code reliably (via being hooked into console error output as well) which will be quite the exciting day.
The 34b Python model is quite close to GPT4 on HumanEval pass@1. Small specialised models are catching up to GPT4 slowly. Why not train a 70b model though?
Given this can produce code when prompted, could it also be used to interpret html from a crawler and then be used to scrape arbitrary URLs and extract structured attributes? Basically like MarkupLM but with massively more token context?
This is probably a stupid question, but would it be possible to use these models to rate existing code and point to possible problems, rather than generating new code? That would be extremely useful to some use cases I'm working on.
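The naive thing I plan to try is just prompting the instruct variant to review rather than generate. A rough sketch with llama-cpp-python, using the Llama 2 [INST] chat format (the model filename and prompt wording are my own assumptions):<p><pre><code> from llama_cpp import Llama

 llm = Llama(model_path="codellama-7b-instruct.q4_0.bin", n_ctx=4096)

 def review(code):
     # ask for problems only, not a rewrite
     prompt = ("[INST] Review the following code. List likely bugs, "
               "edge cases, and style problems. Do not rewrite it.\n\n"
               + code + " [/INST]")
     return llm(prompt, max_tokens=512)["choices"][0]["text"]

 print(review(open("my_module.py").read()))
</code></pre>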
Random tangential question given this is about llama, but how do you get llama.cpp or kobold (or whatever tool you use) to make use of multiple GPUs if you don't have NVlink in place?<p>I got a bridge, but it was the wrong size.<p>Thanks, in advance.
Are there docs somewhere that show how to run this on your local machine? And can you make it port a script between languages? GPT-4 can do that pretty well, but its context is too small for advanced purposes.
this is cool, <a href="https://labs.perplexity.ai/" rel="nofollow noreferrer">https://labs.perplexity.ai/</a> has been my favorite way to play w these models so far
Those charts remind me just how insanely good GPT-4 is. It's almost 5 months since its release and I am still in awe of its capabilities. The way it helps with coding is just crazy.
it looks like <a href="https://news.ycombinator.com/item?id=37248844">https://news.ycombinator.com/item?id=37248844</a> has gotten the traction at 295 points
Can someone point me to a ELI5 sequence of steps that shows how someone can install and use LLMs locally and in some way, functionally?<p>Asking for purposes of educating non-technologists.
34B is grouped-query attention, right? Does that make it the smallest model with grouped-query attention?<p>I can see some people fine-tuning it again for general-purpose instruct.
Llama is a very cool language model, it being used for coding was all but inevitable. I especially love it being released open for everyone.<p>I do wonder about how much use it'll get, seeing as running a heavy language model on local hardware is kinda unlikely for most developers. Not everyone is running a system powerful enough to equip big AIs like this. I also doubt that companies are going to set up large AIs for their devs. It's just a weird positioning.
Business opportunity: I'd pay money for NICE desktop software that can run all these different models (non-subscription, "2-year updates included, then discount pricing" model perhaps). My wishlist:<p>- Easy plug & play model installation, and trivial to change which model once installed.<p>- Runs a local web server, so I can interact with it via any browser<p>- Ability to feed a model a document or multiple documents and be able to ask questions about them (or build a database of some kind?).<p>- Absolute privacy guarantees. Nothing goes off-machine from my prompt/responses (USP over existing cloud/online ones). Routine license/update checks are fine though.<p>I'm not trying to throw shade at the existing ways to running LLMs locally, just saying there may be room for an OPTIONAL commercial piece of software in this space. Most of them are designed for academics to do academic things. I am talking about a turn-key piece of software for everyone else that can give you an "almost" ChatGPT or "almost" CoPilot-like experience for a one time fee that you can feed sensitive private information to.
Does anyone have a good explanation for Meta's strategy with AI?<p>The only thing I've been able to think is they're trying to commoditize this new category before Microsoft and Google can lock it in, but where to from there? Is it just to block the others from a new revenue source, or do they have a longer game they're playing?
Amazing! It's great that Meta is making AI progress.<p>In the meantime, we are still waiting for Google to show what they have (according to their research papers, they are beating others).<p>> User: Write a loop in Python that displays the top 10 prime numbers.<p>> Bard: Sorry I am just an AI, I can't help you with coding.<p>> User: How to ask confirmation before deleting a file ?<p>> Bard: To ask confirmation before deleting a file, just add -f to the rm command.<p>(real cases)