
XGen-7B, a new 7B foundational model trained on up to 8K length for 1.5T tokens

269 points | by bratao | almost 2 years ago

8 comments

brucethemoose2, almost 2 years ago
> The training recipe and model architecture follow LLaMA

This is huge.

MPT and Falcon are cool, but the inference runtimes and various tooling are mostly optimized for LLaMA. If this is a drop-in replacement for 7B, it's going to catch on much faster than any other small model.
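The comment above hinges on XGen being architecturally interchangeable with LLaMA. As a minimal sketch (not from the thread) of what "drop-in" looks like in practice, here is the checkpoint loaded through the generic Hugging Face transformers causal-LM interface that most LLaMA tooling builds on; the model ID `Salesforce/xgen-7b-8k-base` and the `trust_remote_code` flag for its tokenizer are assumptions, not details confirmed in the discussion:

```python
# Sketch only: loading a LLaMA-architecture checkpoint such as XGen-7B with
# Hugging Face transformers. The Hub ID below is an assumption about where the
# weights are published.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/xgen-7b-8k-base"  # assumed ID for the 8K-context base model

# XGen is assumed to ship a custom (tiktoken-based) tokenizer, hence trust_remote_code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "The main advantage of a small 7B parameter model is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If this works unchanged against tooling written for LLaMA-7B, that is the "drop-in" property the comment is pointing at.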
TOMDM, almost 2 years ago
From all the experimentation I've done, 7B parameter models just don't seem to be able to produce useful output reliably enough for my use cases.

What use cases do people have for these smaller LLMs?
brucethemoose2, almost 2 years ago
Also, their metric table is very interesting. It rates Falcon 7B and OpenLlama 7B much less favorably than other evaluations do (including the HuggingFace leaderboard, which I am kinda suspicious of), and instruct benchmarks like that aren't seen as often.
profsummergig, almost 2 years ago
If someone could elucidate what these phrases signify, I'd be very grateful:

1) 7B foundational model
2) 8K length
3) 1.5T tokens
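The question goes unanswered in the loaded comments, so as a rough, non-authoritative aid, the three phrases from the title can be put in scale with some back-of-the-envelope arithmetic (all figures approximate):

```python
# Rough scale of the three numbers in the title; all values are approximations.
params = 7e9           # "7B foundational model": ~7 billion trainable weights in a
                       # base model pretrained on raw text, not fine-tuned for a task
context = 8 * 1024     # "8K length": the model attends over up to ~8,192 tokens at once
train_tokens = 1.5e12  # "1.5T tokens": ~1.5 trillion tokens seen during pretraining

bytes_per_param_fp16 = 2
print(f"~{params * bytes_per_param_fp16 / 1e9:.0f} GB of weights at 16-bit precision")
print(f"~{train_tokens / params:.0f} training tokens per parameter")
```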
DanAtC, almost 2 years ago
I have no idea what any of these words mean, but I'd like to. Can someone point me in the direction of an "AI for Dipshits"?
minimaxir, almost 2 years ago
Per the validation perplexity chart shown, the 8K length model performs better than the 4K length model even at <4K length, so why are they even offering the 4K model if the 8K is strictly better?
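For context on the metric this comment refers to: validation perplexity is the exponential of the average per-token cross-entropy loss on held-out text, so "the 8K model performs better" means it assigns higher probability to the validation tokens. A minimal sketch with hypothetical numbers:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp of the mean negative log-likelihood (in nats) per token."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token losses on a validation set; lower loss -> lower perplexity.
print(perplexity([2.1, 1.8, 2.4, 2.0]))  # ~7.96
```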
artemonster, almost 2 years ago
Please recommend a good tutorial/book/video on modern LLMs and NNs in general, for programmers and technical people, where you get the idea of how it works. Tried googling with dozens of queries and the results just suck: a lot of hand-wavy articles for lay people or some paid courses.
foolfoolz, almost 2 years ago
When will the LLM race peak? Have we peaked already?