Understanding Transformers via N-gram Statistics

132 points by pona-a 2 days ago

6 comments

cschmidt 1 day ago
This paper was accepted as a poster at NeurIPS 2024, so it isn't just a pre-print. There is a presentation video and slides here: https://neurips.cc/virtual/2024/poster/94849

The underlying data has been open sourced, as discussed on his blog here: https://timothynguyen.org/2024/11/07/open-sourced-my-work-on-llms-and-n-gram-statistics/
pona-a 1 day ago
I wonder if these N-gram reduced models, augmented with confidence measures, can act as a very fast speculative decoder. Or maybe the sheer number of explicit rules unfolded from the compressed latent representation will make it impractical.
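For concreteness, a minimal sketch of that speculative-decoding idea, assuming a hypothetical cheap n-gram predictor and an expensive target model exposed as simple callables (ngram_draft and target_argmax are invented names, not anything from the paper):

    # Sketch of n-gram-draft speculative decoding (hypothetical interfaces).
    from typing import Callable, List

    def speculative_step(
        context: List[int],
        ngram_draft: Callable[[List[int]], int],    # cheap next-token guess
        target_argmax: Callable[[List[int]], int],  # expensive "true" model
        k: int = 4,
    ) -> List[int]:
        """Propose k tokens with the n-gram model, keep the prefix the target agrees with."""
        # 1. Draft k tokens greedily with the cheap model.
        draft = []
        ctx = list(context)
        for _ in range(k):
            t = ngram_draft(ctx)
            draft.append(t)
            ctx.append(t)

        # 2. Verify: accept draft tokens while the target model's greedy choice matches.
        accepted = []
        ctx = list(context)
        for t in draft:
            true_t = target_argmax(ctx)
            if true_t == t:
                accepted.append(t)
                ctx.append(t)
            else:
                # 3. First disagreement: take the target's token and stop.
                accepted.append(true_t)
                break
        return accepted

Real speculative decoding verifies all k draft positions in one batched forward pass and uses an acceptance test over the full distributions; greedy agreement here is only the simplest variant of the idea.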
montebicyclelo 1 day ago
> The results we obtained in Section 7 imply that, at least on simple datasets like TinyStories and Wikipedia, LLM predictions contain much quantifiable structure insofar that they often can be described in terms of our simple statistical rules

> we find that for 79% and 68% of LLM next-token distributions on TinyStories and Wikipedia, respectively, their top-1 predictions agree with those provided by our N-gram rulesets

Two prediction methods may have completely different mechanisms but still agree some of the time, because they are both predicting the same thing.

It seems a fairly large proportion of language can be predicted by a simpler model. But it's the remaining percentage that's the difficult part, which simple n-gram models are bad at and transformers are really good at.
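The quoted 79% / 68% figures are top-1 agreement rates. A toy illustration of how such a rate is computed, using made-up predictions rather than the paper's data:

    # Toy top-1 agreement computation (illustrative data, not the paper's).
    llm_top1   = ["the", "cat", "sat", "on", "the", "mat"]
    ngram_top1 = ["the", "dog", "sat", "on", "the", "mat"]

    agree = sum(a == b for a, b in zip(llm_top1, ngram_top1))
    rate = agree / len(llm_top1)
    print(f"top-1 agreement: {rate:.0%}")  # -> 83%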
maz1b 2 days ago
How does this have 74 points and only one comment?

On topic: couldn't one, in theory, re-publish this kind of paper for different kinds of LLMs, since the textual corpus on which LLMs are built is ultimately, at some level, the product of human effort and human input, whether that's writing or typing?
bilsbie 1 day ago
Interesting! Makes me wonder if you could replace transformers with some sort of fancy Markov chain. Maybe with a meta chain that acts as attention.
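As a reference point for what such a Markov chain would start from, here is a minimal bigram (order-1 Markov) next-token predictor with a unigram backoff; anything attention-like would have to replace the fixed-length context, and this sketch makes no attempt at that:

    from collections import Counter, defaultdict

    class BigramModel:
        """Order-1 Markov chain over tokens with a unigram fallback."""
        def __init__(self):
            self.bigrams = defaultdict(Counter)
            self.unigrams = Counter()

        def fit(self, tokens):
            for prev, nxt in zip(tokens, tokens[1:]):
                self.bigrams[prev][nxt] += 1
                self.unigrams[nxt] += 1

        def predict(self, prev):
            # Back off to the global unigram distribution for unseen contexts.
            counts = self.bigrams.get(prev) or self.unigrams
            return counts.most_common(1)[0][0]

    model = BigramModel()
    model.fit("the cat sat on the mat".split())
    print(model.predict("the"))  # -> "cat" (ties broken by insertion order)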
justanotherjoe 2 days ago
Sounds regressive, and it feeds into the weird, unintellectual narrative that LLMs are just n-gram models (lol, lmao even).

The author submitted something like 10 papers this May alone. Is that weird?