TinyLlama: An Open-Source Small Language Model

143 points, by matt1, over 1 year ago

10 comments

minimaxir, over 1 year ago
It was fun to follow the public TinyLlama loss curves in near real-time, although it also showed how frustrating it can be, since the loss curves barely moved down even after an extra trillion tokens: https://wandb.ai/lance777/lightning_logs/reports/metric-train_loss-23-09-04-23-38-15---Vmlldzo1MzA4MzIw?accessToken=5eu2sndit2mo6eqls8h38sklcgfwt660ek1f2czlgtqjv2c6tida47qm1oty8ik9 (note the log-scaled X-axis)

But they *did* move down, and that's what's important.

There should probably be more aggressive learning-rate annealing for models trying to be Chinchilla-optimal, instead of just cosine-with-warmup like every other model nowadays.
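For reference, a minimal sketch of the cosine-with-warmup schedule the comment refers to; the function shape is standard, but the hyperparameter values below are purely illustrative, not TinyLlama's actual training config:

```python
import math

def cosine_with_warmup_lr(step, warmup_steps, max_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Illustrative values only -- not TinyLlama's real hyperparameters.
for step in (0, 1_000, 50_000, 100_000):
    print(step, cosine_with_warmup_lr(step, warmup_steps=2_000,
                                      max_steps=100_000, peak_lr=4e-4))
```

A "more aggressive" anneal in the sense of the comment would push the tail of this curve toward its minimum learning rate earlier in training, rather than holding a nearly flat LR through the middle of the run.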
TheCoreh, over 1 year ago
From the GitHub repo README:

> we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs

I knew the computational power required to train LLMs was absurd, but seeing the figures for larger networks (which are just too large to grasp intuitively) it didn't really register. With this one I could actually imagine the 16 machines with A100 GPUs sitting in a server room running at full blast for 90 days, so it was more tangible... And now thinking about the larger ones is kind of scary.

Edit: Did the math, and just the GPUs (at 250 W each) consumed around 8.64 MWh, which is in the same ballpark as the power consumption of the average US home in one year (10.5 MWh).
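The arithmetic behind that edit, spelled out (assuming a flat 250 W draw per GPU as the comment does; real A100-40G boards can draw more under sustained load):

```python
gpus = 16
power_per_gpu_kw = 0.250      # 250 W per A100, the comment's assumption
hours = 90 * 24               # 90 days of continuous training = 2,160 hours

energy_kwh = gpus * power_per_gpu_kw * hours
print(energy_kwh / 1000)      # 8.64 MWh, vs. ~10.5 MWh/year for an average US home
```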
andy99, over 1 year ago
I've been using one of the earlier checkpoints for benchmarking a Llama implementation. Completely anecdotally, I feel at least as good or better about this one than the earlier OpenLLaMA 3B. I wouldn't use either of them for RAG or anything requiring more power; just to say that it's competitive as a smaller model, whatever you use those for, and easy to run on CPU at FP16 (meaning without serious quantization).
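A minimal sketch of the "run it on CPU at FP16" setup using Hugging Face transformers; the dtype choice here is an assumption, and depending on your PyTorch build, float16 inference on CPU may be slow or unsupported for some ops, in which case bfloat16 or float32 is the safer fallback:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the half-precision weights and keep everything on CPU.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.to("cpu")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```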
dmezzetti, over 1 year ago
Link to the model on the HF Hub: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
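A minimal sketch of prompting that chat variant through the tokenizer's chat template with the transformers pipeline; the messages and generation settings are illustrative, so check the model card on the Hub for the recommended ones:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "system", "content": "You are a concise assistant."},  # illustrative
    {"role": "user", "content": "What is a 1.1B-parameter model good for?"},
]

# Render the messages with the model's chat template, then generate.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False,
                                            add_generation_prompt=True)
print(pipe(prompt, max_new_tokens=128, do_sample=True,
           temperature=0.7)[0]["generated_text"])
```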
ronsor, over 1 year ago
GitHub repo with links to the checkpoints: https://github.com/jzhang38/TinyLlama
sroussey, over 1 year ago
Needs an onnx folder to use it with transformers.js out of the box.

Hopefully @xenova will make a copy with it soon.
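One possible route to an ONNX copy is Hugging Face Optimum; this is a sketch assuming `optimum[onnxruntime]` is installed, and the exact folder layout and quantized variants that transformers.js expects are described in its own docs rather than here:

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# export=True converts the PyTorch checkpoint to ONNX on the fly.
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the ONNX graph and tokenizer files; transformers.js conventionally
# looks for the ONNX weights under an `onnx/` subfolder of the model repo.
model.save_pretrained("tinyllama-1.1b-chat-onnx")
tokenizer.save_pretrained("tinyllama-1.1b-chat-onnx")
```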
theaniketmaurya, over 1 year ago
Proud to see this work, built with Lit-GPT, coming through.
joelthelion, over 1 year ago
What would you use this for?
ofou, over 1 year ago
How does it compare to phi-1?
matt1, over 1 year ago
OP here with a shameless plug: for anyone interested, I'm working on a site called Emergent Mind that surfaces trending AI/ML papers. This TinyLlama paper/repo is trending #1 right now and likely will be for a while, given how much attention it's getting across social media: https://www.emergentmind.com/papers/2401.02385. Emergent Mind also looks for and links to relevant discussions/resources on Reddit, X, HackerNews, GitHub, and YouTube for every new arXiv AI/ML paper. Feedback welcome!