Learning to Compress Prompts with Gist Tokens

2 points by jasondavies about 2 years ago

1 comment

abrichr about 2 years ago
> Finetuning and distillation methods allow for specialization of LMs without prompting, but require retraining the model for each task. To avoid this trade-off entirely, we present gisting, which trains an LM to compress prompts into smaller sets of "gist" tokens which can be reused for compute efficiency. Gist models can be easily trained as part of instruction finetuning via a restricted attention mask that encourages prompt compression. On decoder (LLaMA-7B) and encoder-decoder (FLAN-T5-XXL) LMs, gisting enables up to 26x compression of prompts, resulting in up to 40% FLOPs reductions, 4.2% wall time speedups, storage savings, and minimal loss in output quality.

Sounds promising! No code?

Edit: here it is! https://github.com/jayelm/gisting

From this Twitter thread: https://twitter.com/jayelmnop/status/1648365743195684873
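
For readers curious about the "restricted attention mask" mentioned in the abstract, here is a minimal sketch (not the authors' code from the linked repo) of the core idea: gist tokens are placed between the prompt and the rest of the sequence, and tokens after the gist span are blocked from attending directly to the prompt, so all prompt information must flow through the gist tokens. The function name and layout below are assumptions for illustration.

import torch

def make_gist_mask(seq_len: int, prompt_end: int, num_gist: int) -> torch.Tensor:
    """Build a causal attention mask where positions are laid out as
    [prompt | gist tokens | input/output]. True = position may attend."""
    gist_end = prompt_end + num_gist
    # Standard causal (lower-triangular) mask.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Tokens after the gist span cannot attend to the original prompt,
    # forcing the gist tokens to carry (compress) the prompt's content.
    mask[gist_end:, :prompt_end] = False
    return mask

# Example: 4 prompt tokens, 2 gist tokens, 3 post-gist tokens.
print(make_gist_mask(9, prompt_end=4, num_gist=2).int())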