TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.
Running LLMs with 3.3M Context Tokens on a Single GPU

14 points by Van_Chopiszt 7 months ago

1 comment

charlie_xxx 7 months ago
Their demo looks really cool: https://github.com/mit-han-lab/duo-attention