Understanding Llama 2 and the New Code Llama LLMs

170 points by rasbt, over 1 year ago

5 comments

rwl4, over 1 year ago
The author of the article appears to have misunderstood one important detail about Code Llama.

They state:

> The Code Llama models were trained on 500B tokens, whereas Llama 2 models were trained on 2T tokens. Since the Code Llama model was trained on 4x fewer tokens, maybe a CodeLlama 70B version did not perform well enough due to LLM scaling laws—there was not enough training data.

But if you read the paper, on page 1, it says:

> Our approach is based on gradually specializing and increasing the capabilities of Llama 2 models by applying a cascade of training and fine-tuning steps [...]

In fact, they show a diagram at the top of page 3 that details the process, starting with Llama 2 foundation models.

Llama 2 Foundation models (7B, 13B, 34B) -> Code training 500B -> Python / Long Context.

See the paper here: https://arxiv.org/abs/2308.12950
ImprobableTruth, over 1 year ago
> GPT-3.5 has 175B parameters versus 70B parameters in Llama 2

We know that for the original version of GPT-3.5, but my assumption was that Turbo was a distilled smaller model (which is why it uses OAI's new vocab & is so much faster).

If that's not the case, what could be the explanation for it being faster?
ranguna, over 1 year ago
I'm not sure if I'm the only one, but I find the starcoder model to be muuuuch better than codellama 34B quantized. I can't seem to find any good coding benchmarks online comparing the two.

Anyone else having a similar experience?
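Absent a published head-to-head benchmark, one quick sanity check is to prompt both checkpoints on the same coding task and compare the completions by eye. The sketch below is a minimal, hedged example using the Hugging Face transformers API; the model IDs and prompt are illustrative assumptions, and in practice the 34B model needs a quantized build (e.g. via llama.cpp) or multiple GPUs rather than a plain full-precision load.

    # Hedged sketch: greedy completions from StarCoder and Code Llama 34B on one prompt.
    # Model IDs and the prompt are illustrative; the 34B checkpoint is far too large
    # for most single-GPU setups at full precision, so treat this as a template only.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    PROMPT = "def fizzbuzz(n: int) -> str:\n"

    for model_id in ("bigcode/starcoder", "codellama/CodeLlama-34b-hf"):
        tokenizer = AutoTokenizer.from_pretrained(model_id)
        model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
        inputs = tokenizer(PROMPT, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
        print(f"=== {model_id} ===")
        print(tokenizer.decode(output[0], skip_special_tokens=True))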
Havoc, over 1 year ago
Managed to get code llama 34 integrated into vscode and must say it’s surprisingly usable for scaffolding and also explaining pieces of code
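One common way to wire up this kind of setup (an assumption on my part; details vary by extension and backend) is to serve a quantized Code Llama checkpoint locally, for example with llama.cpp's bundled example server, and point the editor plugin at that endpoint. The snippet below sketches the kind of request such a plugin might send; the port, endpoint path, and field names assume llama.cpp's server, and the model file name is purely illustrative.

    # Hedged sketch: query a locally served Code Llama the way an editor plugin might.
    # Assumes llama.cpp's example server is already running, e.g.
    #   ./server -m codellama-34b.Q4_K_M.gguf --port 8080
    # (the GGUF file name above is illustrative, not a real artifact reference).
    import json
    import urllib.request

    payload = {
        "prompt": ("// Explain what this function does:\n"
                   "int fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }\n"),
        "n_predict": 128,
    }
    req = urllib.request.Request(
        "http://127.0.0.1:8080/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["content"])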
syntaxing, over 1 year ago
Does this mean there's most likely a non-released version of llama 2 34B at Meta, since they need one as a base for code llama?