
OLMo: Accelerating the Science of Language Models [pdf]

141 points by chuckhend over 1 year ago

8 comments

bravura over 1 year ago
"We intend to follow up on this release with another one soon that includes the following:

...

Weights & Biases logs for our training runs."

That's amazing. I've never seen that before in a paper of this quality. Or, any paper at all.
nl over 1 year ago
It's very interesting that they went to the effort of doing complete end-to-end runs on both NVidia and AMD hardware.

A pity they didn't release the speed of training, but the software is now there for someone else (not under benchmark embargo) to do that.
alchemist1e9 over 1 year ago
They detail the energy used and therefore the estimated carbon emissions, which is interesting. When I estimate the raw electricity cost using 7-20 cents per kWh for US commercial rates, we are only talking about $16-50k for electricity, which seems pretty small! Is my math wrong?

Is there any information on how much the computing costs were for renting the clusters?

Is the barrier to entry for a 7B model only a couple $100K?

EDIT: https://news.ycombinator.com/item?id=39223467#39224534

Perhaps only $85K total
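For reference, a minimal sketch of the back-of-the-envelope arithmetic above, assuming roughly 240 MWh of total training energy (the figure implied by the quoted dollar range, not a number confirmed here):

    # Back-of-the-envelope electricity cost for the training run.
    # ASSUMPTION: ~240 MWh total energy, the figure implied by the
    # $16-50k range at $0.07-$0.20/kWh; the paper's exact number may differ.
    energy_kwh = 240_000                # assumed total training energy, kWh
    low_rate, high_rate = 0.07, 0.20    # US commercial electricity, $/kWh

    print(f"low estimate:  ${energy_kwh * low_rate:,.0f}")    # ~$16,800
    print(f"high estimate: ${energy_kwh * high_rate:,.0f}")   # ~$48,000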
gardenfelder over 1 year ago
<a href="https:&#x2F;&#x2F;huggingface.co&#x2F;allenai&#x2F;OLMo-7B" rel="nofollow">https:&#x2F;&#x2F;huggingface.co&#x2F;allenai&#x2F;OLMo-7B</a><p>Edit: add <a href="https:&#x2F;&#x2F;github.com&#x2F;allenai&#x2F;OLMo">https:&#x2F;&#x2F;github.com&#x2F;allenai&#x2F;OLMo</a>
nl over 1 year ago
Who will be the first to do a useful Instruct-trained variant?

It's a pity the Mistral 7B Instruct 0.2 dataset isn't available, because I've found it to be much higher quality than any of the finetunes around, and I suspect we'll have to rely on the same groups doing finetunes for this.
casercaramel144 over 1 year ago
I'm sorry, I don't understand the exact contribution here? There are many tutorials on how to train a language model. If it's a repository of SOTA techniques for training, this will be outdated in at most 3 months, and anyway the ground shifts under you in this field, so you might as well read Arxiv all day if your intention is to keep up with SOTA.
jerrygenser over 1 year ago
Pretty cool that it runs on AMD and Nvidia
artninja1988 over 1 year ago
Feels like there must be 40 or so distinct open-source LLMs now. What gives? We need some more new text-to-image models too... :(