Yi-Coder: A Small but Mighty LLM for Code

285 points by crbelaus · 9 months ago

19 comments

mythz · 9 months ago
Claude 3.5 Sonnet still holds the LLM crown for code, and it's what I use when I want to check the output of the best LLM. However, my Continue Dev, Aider, and Claude Dev plugins are currently configured to use DeepSeek Coder V2 236B (with a local ollama DeepSeek Coder V2 for tab completions), as it offers the best value at $0.14/$0.28 per million input/output tokens: it sits just below Claude 3.5 Sonnet on Aider's leaderboard [1] whilst being 43x cheaper.

[1] https://aider.chat/docs/leaderboards/
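For context, DeepSeek's API is OpenAI-compatible, so most of these tools only need a different base URL and model name. A minimal sketch with the openai Python client, assuming the documented base URL and a DEEPSEEK_API_KEY environment variable (the model tag may have changed since; check DeepSeek's current docs):

    # Minimal sketch: call DeepSeek Coder through its OpenAI-compatible API.
    # Base URL, model tag, and env var are assumptions; verify against the docs.
    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.deepseek.com",
        api_key=os.environ["DEEPSEEK_API_KEY"],
    )

    resp = client.chat.completions.create(
        model="deepseek-coder",  # illustrative model tag
        messages=[{"role": "user",
                   "content": "Write a Python function that reverses a string."}],
    )
    print(resp.choices[0].message.content)

Aider and Continue accept the same kind of endpoint/model configuration, which is what makes swapping providers this cheap.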
anotherpaulg · 9 months ago
Yi-Coder scored below GPT-3.5 on aider's code editing benchmark. GitHub user cheahjs recently submitted the results for the 9b model and a q4_0 version.

Yi-Coder results, with Sonnet and GPT-3.5 for scale:

    77% Sonnet
    58% GPT-3.5
    54% Yi-Coder-9b-Chat
    45% Yi-Coder-9b-Chat-q4_0

Full leaderboard: https://aider.chat/docs/leaderboards/
Palmik · 9 months ago
The difference between (A) software engineers reacting to AI models and systems for programming and (B) artists (whether painters, musicians, or otherwise) reacting to AI models for generating images, music, etc. is very interesting.

I wonder what the reason is.
theshrike79 · 9 months ago
> Continue pretrained on 2.4 Trillion high-quality tokens over 52 major programming languages.

I'm still waiting for a model that's highly specialised for a single language only - and either a lot smaller than these jack-of-all-trades ones or VERY good at that specific language's nuances and libraries.
JediPig · 9 months ago
I tested this out on my workload (SRE/DevOps/C#/Golang/C++). It started responding with nonsense to a simple prompt asking it to write a boto Python script that changes x, y, z values.

Then I tried other questions from my past to compare... However, I believe the engineers who built the LLM just trained on the benchmark questions.

In one instance, after an hour of use (I stopped then), it answered a single question with four different programming languages, and with answers that were in no way related to the question.
mtrovo · 9 months ago
I'm new to this whole area and feeling a bit lost. How are people setting up these small LLMs like Yi-Coder locally for tab completion? Does it work natively in VSCode?

Also, for the cloud models apart from GitHub Copilot, what tools or steps are you all using to get them working on your projects? Any tips or resources would be super helpful!
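A common local setup is to serve the model with ollama and point an editor extension such as Continue at the local endpoint. As a minimal sketch of what those extensions do under the hood, assuming an ollama server on its default port and an illustrative yi-coder model tag already pulled:

    # Minimal sketch: ask a local ollama server for a completion.
    # Assumes `ollama serve` is running on the default port 11434 and a
    # Yi-Coder model has been pulled; the model tag below is illustrative.
    import json
    import urllib.request

    def complete(prompt: str, model: str = "yi-coder:9b") -> str:
        payload = json.dumps({
            "model": model,
            "prompt": prompt,
            "stream": False,  # one JSON reply instead of a token stream
        }).encode("utf-8")
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["response"]

    print(complete("def fibonacci(n):"))

Editor plugins handle the same request/response loop, plus prompt templating and inserting the completion at the cursor.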
smcleod · 9 months ago
Weird they're comparing it to really old DeepSeek v1 models; even v2 has been out a long time now.
kleiba · 9 months ago
What is the recommended hardware to run a model like that locally on a desktop PC?
NKosmatos · 9 months ago
It would be good if LLMs were somehow packaged in an easy way/format for us "novice" (OK, I mean lazy) users to try them out.

I'm not so interested in the response time (anyone have a couple of spare A100s?), but it would be good to be able to try out different LLMs locally.
gloosx · 9 months ago
Can someone explain these Aider benchmarks to me? They pass the same 113 tests through the LLM every time. Why do they then extrapolate from an LLM's ability to pass these 113 basic Python challenges to a general ability to produce and edit code? To me it sounds like this or that model is 70% accurate at solving the same hundred Python training tasks - why does that mean it's good at other languages and at arbitrary, private tasks as well? Has anyone ever tried changing the test cases or wiggling the conditions a bit to see if it still hits 70%?
smokel · 9 months ago
Does anyone know why the sizes of these models are typically expressed in *number* of weights (i.e. 1.5B and 9B in this case), without mentioning the weight size in bytes?

For practical reasons, I often like to know how much GPU RAM is required to run these models locally. The raw number of weights only expresses some kind of relative power, which I doubt is relevant to most users.

Edit: reformulated to sound like a genuine question instead of a complaint.
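The usual rule of thumb is weight memory = parameter count x bytes per weight, plus headroom for the KV cache and activations. A back-of-envelope sketch (the 20% overhead factor is an assumed illustrative figure, not a measurement):

    # Rough GPU memory estimate: parameters x bytes per weight, plus an
    # assumed ~20% overhead for KV cache and activations (illustrative).
    def vram_estimate_gib(params_billions: float, bits_per_weight: int,
                          overhead: float = 0.20) -> float:
        weight_bytes = params_billions * 1e9 * bits_per_weight / 8
        return weight_bytes * (1 + overhead) / 2**30

    for bits, label in [(16, "FP16"), (8, "Q8"), (4, "Q4")]:
        print(f"Yi-Coder 9B at {label}: ~{vram_estimate_gib(9, bits):.0f} GiB")

So "how much RAM?" depends on the quantization as much as on the parameter count, which is presumably why listings lead with the weight count alone.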
nathan_tarbert · 9 months ago
This sounds really cool! I found this Reddit discussion: https://www.reddit.com/r/ArtificialInteligence/comments/1f9mvyu/meet_yicoder_a_small_but_mighty_llm_for_code/
Tepix · 9 months ago
Sounds very promising!

I hope that Yi-Coder 9B FP16 and Q8 will be available soon for Ollama; right now I only see the 4-bit quantized 9B model.

I'm assuming that those models will be quite a bit better than the 4-bit one.
patrick-fitz · 9 months ago
I'd be interested to see how it performs on https://www.swebench.com/, using SWE-agent + Yi-Coder-9B-Chat.
cassianoleal · 9 months ago
Is there an LLM that's useful for Terraform? Something that understands HCL and has been trained on the providers, I imagine.
Havoc · 9 months ago
Beats DeepSeek 33B. That's impressive.
lasermike026 · 9 months ago
First look seems good. I'll keep hacking with it.
ziofill · 9 months ago
Are coding LLMs trained with the help of interpreters?
zeroq · 9 months ago
Every time someone tells me how AI 10x'd their programming capabilities, I'm like "tell me you're bad at coding without telling me".