TechEcho

How to Train a Million Context LLM

15 points by 7d7n | 12 months ago

1 comment

swyx | 12 months ago
oh hey we're on HN! author/host here, we think the story of long context over the past year is worth reviewing so we invited Mark on to talk about extending Llama 3 to >1m tokens.

a year ago we were talking to MosaicML (https://x.com/swyx/status/1660033177178734592) about their 65k+ model. now people yawn when we have yet another 1m token model. wild.

the TLDR in the pod seems to be Meta choosing to train Llama with a RoPE scaling theta factor that can be tweaked for finetuning. Once Gradient noticed that, it was off to the races.
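
For context on the RoPE theta point, here is a minimal sketch (not Gradient's actual recipe) of how the base theta enters the position encoding: raising the base lowers every per-dimension rotation frequency, so the same range of rotation angles is stretched over many more positions. The head dimension, Llama 3's default base of 500,000, and the larger finetuning base used below are illustrative assumptions.

```python
import numpy as np

def rope_angles(positions, head_dim=128, base=500_000.0):
    """Rotation angles for RoPE: angle[m, i] = m * base**(-2i / head_dim).

    Raising `base` shrinks every frequency, so positions rotate more slowly
    and the encoding stretches over a longer context.
    """
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)  # shape (head_dim/2,)
    return np.outer(positions, inv_freq)                        # shape (len(positions), head_dim/2)

# Illustrative comparison: a default-style base vs. a much larger base for
# long-context finetuning (the exact larger value here is an assumption).
pos = np.array([0, 8_192, 1_000_000])
default_angles = rope_angles(pos, base=500_000.0)
scaled_angles  = rope_angles(pos, base=50_000_000.0)

# With the larger base, the lowest frequency rotates roughly 100x more slowly,
# so the angle at 1M tokens is comparable to what the default base produces
# at around 10k tokens.
print(default_angles[:, -1])  # lowest-frequency angles, default base
print(scaled_angles[:, -1])   # lowest-frequency angles, larger base
```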