Physics of Language Models: Architecture Design and the Magic of Canon Layers
19 points by nkko 11 days ago | 1 comment
darknoon, about 12 hours ago
anyone know why they mix in the 3 previous tokens? could have just as easily done 5 or 2, right?