TinyLlama: An Open-Source Small Language Model

143 points, by matt1, over 1 year ago

10 comments

minimaxir, over 1 year ago
It was fun to follow the public TinyLlama loss curves in near real-time, although it also showed how frustrating it can be, since the loss curves barely moved down even after an extra trillion tokens: https://wandb.ai/lance777/lightning_logs/reports/metric-train_loss-23-09-04-23-38-15---Vmlldzo1MzA4MzIw?accessToken=5eu2sndit2mo6eqls8h38sklcgfwt660ek1f2czlgtqjv2c6tida47qm1oty8ik9 (note the log-scaled X-axis)

But they *did* move down, and that's what's important.

There should probably be more aggressive learning-rate annealing for models trying to be Chinchilla-optimal, instead of just cosine-with-warmup like every other model nowadays.
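For reference, a minimal sketch of the cosine-with-warmup schedule the comment refers to; the function shape is standard, but the hyperparameter values below are purely illustrative, not TinyLlama's actual training config:

```python
import math

def cosine_with_warmup_lr(step, warmup_steps, max_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Illustrative values only -- not TinyLlama's real hyperparameters.
for step in (0, 1_000, 50_000, 100_000):
    print(step, cosine_with_warmup_lr(step, warmup_steps=2_000,
                                      max_steps=100_000, peak_lr=4e-4))
```

A "more aggressive" anneal in the sense of the comment would push the tail of this curve toward its minimum learning rate earlier in training, rather than holding a nearly flat LR through the middle of the run.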
TheCoreh, over 1 year ago
From the GitHub repo README:

> we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs

I knew the computational power required to train LLMs was absurd, but seeing the figures for larger networks (which are just too large to grasp intuitively) it didn't really register. With this one I could actually imagine the 16 machines with A100 GPUs sitting in a server room running at full blast for 90 days, so it was more tangible... And now thinking about the larger ones is kind of scary.

Edit: Did the math, and just the GPUs (at 250 W each) consumed around 8.64 MWh, which is in the same ballpark as the power consumption of the average US home in one year (10.5 MWh).
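The arithmetic behind that edit, spelled out (assuming a flat 250 W draw per GPU as the comment does; real A100-40G boards can draw more under sustained load):

```python
gpus = 16
power_per_gpu_kw = 0.250      # 250 W per A100, the comment's assumption
hours = 90 * 24               # 90 days of continuous training = 2,160 hours

energy_kwh = gpus * power_per_gpu_kw * hours
print(energy_kwh / 1000)      # 8.64 MWh, vs. ~10.5 MWh/year for an average US home
```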
andy99, over 1 year ago
I've been using one of the earlier checkpoints for benchmarking a Llama implementation. Completely anecdotally, I feel at least as good or better about this one than the earlier OpenLLaMA 3B. I wouldn't use either of them for RAG or anything requiring more power; just to say that it's competitive as a smaller model, whatever you use those for, and easy to run on CPU at FP16 (meaning without serious quantization).
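A minimal sketch of the "run it on CPU at FP16" setup using Hugging Face transformers; the dtype choice here is an assumption, and depending on your PyTorch build, float16 inference on CPU may be slow or unsupported for some ops, in which case bfloat16 or float32 is the safer fallback:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the half-precision weights and keep everything on CPU.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.to("cpu")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```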
dmezzetti, over 1 year ago
Link to the model on the HF Hub: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0
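A minimal sketch of prompting that chat variant through the tokenizer's chat template with the transformers pipeline; the messages and generation settings are illustrative, so check the model card on the Hub for the recommended ones:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "system", "content": "You are a concise assistant."},  # illustrative
    {"role": "user", "content": "What is a 1.1B-parameter model good for?"},
]

# Render the messages with the model's chat template, then generate.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False,
                                            add_generation_prompt=True)
print(pipe(prompt, max_new_tokens=128, do_sample=True,
           temperature=0.7)[0]["generated_text"])
```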
ronsor, over 1 year ago
GitHub repo with links to the checkpoints: https://github.com/jzhang38/TinyLlama
sroussey, over 1 year ago
Needs an onnx folder to use it with transformers.js out of the box.

Hopefully @xenova will make a copy with it soon.
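One possible route to an ONNX copy is Hugging Face Optimum; this is a sketch assuming `optimum[onnxruntime]` is installed, and the exact folder layout and quantized variants that transformers.js expects are described in its own docs rather than here:

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# export=True converts the PyTorch checkpoint to ONNX on the fly.
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the ONNX graph and tokenizer files; transformers.js conventionally
# looks for the ONNX weights under an `onnx/` subfolder of the model repo.
model.save_pretrained("tinyllama-1.1b-chat-onnx")
tokenizer.save_pretrained("tinyllama-1.1b-chat-onnx")
```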
theaniketmaurya, over 1 year ago
Proud to see this work, built with Lit-GPT, coming through.
joelthelion, over 1 year ago
What would you use this for?
ofou, over 1 year ago
How does it compare to phi-1?
matt1, over 1 year ago
OP here with a shameless plug: for anyone interested, I'm working on a site called Emergent Mind that surfaces trending AI/ML papers. This TinyLlama paper/repo is trending #1 right now and likely will be for a while, given how much attention it's getting across social media: https://www.emergentmind.com/papers/2401.02385. Emergent Mind also looks for and links to relevant discussions/resources on Reddit, X, HackerNews, GitHub, and YouTube for every new arXiv AI/ML paper. Feedback welcome!