I was messing around with LLMs all day, so I had a few test cases open. I asked each model to change a few things in a ~6KB C# snippet, phrased in a somewhat ambiguous but reasonable way.

GPT-4 did the job perfectly. Qwen:72b did half of the job, completely missed the other half, and renamed one variable that had nothing to do with the request. Llama3.1:70b behaved very similarly to Qwen, which is interesting.

OpenCoder:8b started reasonably well, then randomly replaced "Split('\n')" with "Split(n)" in unrelated code (see the sketch at the end of this comment), and then went completely berserk, hallucinating non-existent Stack Overflow pages and answers.

For posterity, I saved the transcript here: https://pastebin.com/VRXYFpzr

My best guess is that you shouldn't train a model on mostly code. The natural-language conversations used to train other models let them "figure out" human-like reasoning; if the training set is mostly code, the model can produce output that looks like code but has little value to humans.

Edit: to be fair, llama3.2:3b also botched the code, but at least it did not hallucinate complete nonsense.
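
For anyone who doesn't write C#, here is a minimal stand-alone sketch of why that one-character change is so destructive. The variable names here are made up for illustration; the real context is in the pastebin:

    using System;

    class SplitDemo
    {
        static void Main()
        {
            string text = "a\nb\nc";

            // Original code: Split('\n') splits on the newline character literal.
            string[] lines = text.Split('\n');
            Console.WriteLine(lines.Length); // prints 3

            // OpenCoder's rewrite: Split(n) refers to a variable n that
            // doesn't exist, so the file no longer compiles:
            // string[] broken = text.Split(n); // error CS0103: the name 'n' does not exist
        }
    }

And even if some char variable named n happened to be in scope, the call would silently split on whatever character n held instead of newlines, which is arguably worse than the compile error.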