TinyLlama: An Open-Source Small Language Model

143 points by matt1 over 1 year ago

10 comments

minimaxir over 1 year ago
It was fun to follow the public TinyLlama loss curves in near real-time, although it could be frustrating, since the loss curves barely moved down even after an extra trillion tokens: https://wandb.ai/lance777/lightning_logs/reports/metric-train_loss-23-09-04-23-38-15---Vmlldzo1MzA4MzIw?accessToken=5eu2sndit2mo6eqls8h38sklcgfwt660ek1f2czlgtqjv2c6tida47qm1oty8ik9 (note the log-scaled X-axis)

But they *did* move down, and that's what's important.

There should probably be more aggressive learning rate annealing for models trying to be Chinchilla-optimal, instead of just cosine-with-warmup like every other model nowadays.
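For context, a minimal sketch of the cosine-with-warmup shape being referred to (the step counts and learning rates are illustrative, not TinyLlama's actual hyperparameters); a more aggressive anneal would decay further or faster late in training:

    import math

    def lr_at_step(step, max_steps, warmup_steps=2000, peak_lr=4e-4, min_lr=4e-5):
        # Linear warmup from 0 to peak_lr, then cosine decay from peak_lr to min_lr.
        if step < warmup_steps:
            return peak_lr * step / warmup_steps
        progress = (step - warmup_steps) / (max_steps - warmup_steps)
        return min_lr + (peak_lr - min_lr) * 0.5 * (1 + math.cos(math.pi * progress))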
TheCoreh over 1 year ago
From the GitHub repo README:

> we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs

I knew the computational power required to train LLMs was absurd, but with the figures for larger networks (which are just too large to grasp intuitively) it never really registered. With this one I could actually picture the 16 machines with A100 GPUs sitting in a server room running at full blast for 90 days, so it was more tangible... And now thinking about the larger ones is kinda scary.

Edit: Did the math, and just the GPUs (at 250 W each) consumed around 8.64 MWh, which is in the same ballpark as the annual power consumption of the average US home (10.5 MWh).
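For reference, the back-of-the-envelope arithmetic behind that figure (the flat 250 W per GPU is the commenter's assumption and excludes CPUs, cooling, and other overhead):

    gpus, watts_per_gpu, days = 16, 250, 90
    kwh = gpus * watts_per_gpu * days * 24 / 1000  # watt-hours -> kWh
    print(kwh / 1000, "MWh")                       # ~8.64 MWh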
andy99 over 1 year ago
I've been using one of the earlier checkpoints for benchmarking a Llama implementation. Completely anecdotally, I feel at least as good or better about this one than the earlier OpenLLaMA 3B. I wouldn't use either of them for RAG or anything requiring more power; just to say that it's competitive as a smaller model, whatever you use those for, and easy to run on CPU at FP16 (meaning without serious quantization).
dmezzetti over 1 year ago
Link to model on HF Hub: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
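A minimal sketch of loading the linked chat model with Hugging Face transformers, following the CPU/FP16 setup mentioned in an earlier comment (the prompt is just an example; some older PyTorch CPU builds may require float32 instead of float16):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

    messages = [{"role": "user", "content": "What is a small language model?"}]
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
    output = model.generate(input_ids, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))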
ronsor over 1 year ago
GitHub repo with links to the checkpoints: https://github.com/jzhang38/TinyLlama
sroussey over 1 year ago
Needs an onnx folder to use it with transformers.js out of the box.

Hopefully @xenova will make a copy with it soon.
theaniketmaurya over 1 year ago
Proud to see this work built using Lit-GPT coming through.
joelthelion over 1 year ago
What would you use this for?
ofou over 1 year ago
How does it compare to phi-1?
matt1 over 1 year ago
OP here with a shameless plug: for anyone interested, I'm working on a site called Emergent Mind that surfaces trending AI/ML papers. This TinyLlama paper/repo is trending #1 right now and likely will be for a while due to how much attention it's getting across social media: https://www.emergentmind.com/papers/2401.02385. Emergent Mind also looks for and links to relevant discussions/resources on Reddit, X, HackerNews, GitHub, and YouTube for every new arXiv AI/ML paper. Feedback welcome!