TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.


BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

3 points by jwan584, over 1 year ago

2 comments

version_five, over 1 year ago
When models are released like this, it would be great to do it with a PR to ggml/llama.cpp adding support, or to use a format that's already supported. IMO, if I'm choosing between a 3B and a 7B, I'm using it as an edge or local model and I don't want HF/PyTorch. It would be easier to evaluate, and would rank higher among things to consider, if I could easily get it into llama.cpp.
jwan584, over 1 year ago
A helpful paper with the full recipe Cerebras uses to train LLMs, including their process:
- Extensively deduplicated dataset (SlimPajama)
- Hyperparameter search using muP
- Variable sequence length training + ALiBi
- Aggressive LR decay
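The ALiBi item above (Attention with Linear Biases) replaces positional embeddings with a per-head penalty that grows linearly with key-query distance, which is what enables the variable-sequence-length training. A minimal sketch of the bias matrix, assuming NumPy and the standard geometric head slopes (the function name and shapes here are illustrative, not from the paper):

```python
import numpy as np

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    """Return the (n_heads, seq_len, seq_len) additive attention bias.

    Head h gets slope 2^(-8*(h+1)/n_heads); the bias added to the
    attention score for query i attending to key j is slope * (j - i),
    so tokens farther in the past receive a more negative bias.
    """
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]          # dist[i, j] = j - i  (<= 0 for past keys)
    return slopes[:, None, None] * dist[None, :, :]
```

In a causal model this bias is simply added to the query-key scores before the softmax (with future positions masked out as usual); because it depends only on relative distance, the same function can be evaluated at a longer `seq_len` at inference time than was seen in training.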