TechEcho
© 2025 TechEcho. All rights reserved.

SepLLM: Accelerate LLMs by Compressing One Segment into One Separator

39 points by limoce, 3 months ago

2 comments

kevmo314, 3 months ago
This paper seems like it misses the forest for the trees. The analysis is certainly interesting and the proposal sounds viable, sort of like a sliding window attention with a little more history.

But if it is true that the separators contribute the most towards the attention scores, wouldn't that imply that the tokenization scheme can be improved? Introducing a compression scheme seems like patching around that, compared to a model that naturally generated a more random attention distribution.
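To make the comparison concrete, here is a minimal sketch (not the paper's actual code) of the kind of sparse attention pattern being discussed: each query attends to a few initial "sink" tokens, to separator tokens, and to a local sliding window, with everything else masked out. The function name, the toy token ids, and the choice of `99` as a separator id are all invented for illustration.

```python
def sepllm_mask(tokens, sep_ids, n_init=2, window=3):
    """Build a causal boolean attention mask that keeps only:
    (1) the first n_init tokens, (2) separator tokens,
    and (3) the last `window` positions before each query."""
    n = len(tokens)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(q + 1):  # causal: keys at or before the query
            mask[q][k] = (
                k < n_init                # initial "sink" tokens
                or tokens[k] in sep_ids   # separator tokens keep segment summaries
                or q - k < window         # local sliding window
            )
    return mask

# Toy example: 99 stands in for a separator token (e.g. "." or ",").
toks = [5, 7, 99, 3, 4, 99, 8]
m = sepllm_mask(toks, sep_ids={99})
# Positions the final query can attend to: initial tokens,
# both separators, and the trailing window.
print([k for k in range(len(toks)) if m[-1][k]])
```

The commenter's point, in these terms: if attention mass really concentrates on the separator keys, a better tokenization might make that concentration unnecessary, rather than a mask that hard-codes it.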
xp84, 3 months ago
Or, put another way:

"Why waste time say lot token when few token do trick?"

-Kevin Malone