Understanding Llama 2 and the New Code Llama LLMs

170 points by rasbt, over 1 year ago

5 comments

rwl4, over 1 year ago
The author of the article appears to have misunderstood one important detail about Code Llama.

They state:

> The Code Llama models were trained on 500B tokens, whereas Llama 2 models were trained on 2T tokens. Since the Code Llama model was trained on 4x fewer tokens, maybe a CodeLlama 70B version did not perform well enough due to LLM scaling laws—there was not enough training data.

But if you read the paper, on page 1, it says:

> Our approach is based on gradually specializing and increasing the capabilities of Llama 2 models by applying a cascade of training and fine-tuning steps [...]

In fact, they show a diagram at the top of page 3 that details the process, starting with the Llama 2 foundation models:

Llama 2 foundation models (7B, 13B, 34B) -> code training on 500B tokens -> Python / long-context fine-tuning.

See the paper here: https://arxiv.org/abs/2308.12950
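One quick way to see this "specialized from Llama 2" relationship for yourself is to compare the published model configs. This is a minimal sketch, not anything from the paper; the Hugging Face model ids are assumptions (the meta-llama repo is gated, so substitute a mirror you have access to). The configs show the same Llama architecture, with Code Llama mainly changing the RoPE base and context window, which reflects the long-context fine-tuning stage the commenter describes.

```python
# Minimal sketch: compare Llama 2 and Code Llama configs side by side.
# Model ids are assumptions; the meta-llama repo requires gated access.
from transformers import AutoConfig

llama2 = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
codellama = AutoConfig.from_pretrained("codellama/CodeLlama-7b-hf")

for name, cfg in [("Llama 2 7B", llama2), ("Code Llama 7B", codellama)]:
    # Same architecture family; Code Llama differs mainly in RoPE base / context length.
    print(
        f"{name}: layers={cfg.num_hidden_layers}, hidden={cfg.hidden_size}, "
        f"context={cfg.max_position_embeddings}, rope_theta={cfg.rope_theta}"
    )
```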
ImprobableTruth, over 1 year ago
> GPT-3.5 has 175B parameters versus 70B parameters in Llama 2

We know that for the original version of GPT-3.5, but my assumption was that Turbo was a distilled smaller model (which is why it uses OAI's new vocab & is so much faster).

If that's not the case, what could be the explanation for it being faster?
ranguna, over 1 year ago
I'm not sure if I'm the only one, but I find the StarCoder model to be muuuuch better than CodeLlama 34B quantized. I can't seem to find any good coding benchmarks online comparing the two.

Anyone else having a similar experience?
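Short of a proper benchmark like HumanEval, one informal way to compare the two is to feed both models the same prompt and eyeball the completions. This is only a rough sketch under assumptions: the model ids are the public Hugging Face checkpoints (both require accepting the model licenses), and running the 34B model this way needs substantial VRAM; a quantized GGUF build via llama.cpp would be the more practical route.

```python
# Rough, informal side-by-side comparison, not a real benchmark.
# Model ids are assumptions; the 34B model is very large in full precision.
from transformers import pipeline

prompt = "def is_prime(n: int) -> bool:\n    "

for model_id in ["bigcode/starcoder", "codellama/CodeLlama-34b-hf"]:
    generator = pipeline("text-generation", model=model_id, device_map="auto")
    result = generator(prompt, max_new_tokens=64, do_sample=False)
    print(f"--- {model_id} ---")
    print(result[0]["generated_text"])
```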
Havoc, over 1 year ago
Managed to get Code Llama 34B integrated into VS Code, and I must say it's surprisingly usable for scaffolding and also for explaining pieces of code.
syntaxing, over 1 year ago
Does this mean there's most likely a non-released version of Llama 2 34B at Meta, since they need one as a base for Code Llama?