TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.
Running LLMs with 3.3M Context Tokens on a Single GPU

14 points by Van_Chopiszt 7 months ago

1 comment

charlie_xxx 7 months ago
Their demo looks really cool: https://github.com/mit-han-lab/duo-attention