Learning to Compress Prompts with Gist Tokens

2 points by jasondavies about 2 years ago

1 comment

abrichr about 2 years ago
> Finetuning and distillation methods allow for specialization of LMs without prompting, but require retraining the model for each task. To avoid this trade-off entirely, we present gisting, which trains an LM to compress prompts into smaller sets of "gist" tokens which can be reused for compute efficiency. Gist models can be easily trained as part of instruction finetuning via a restricted attention mask that encourages prompt compression. On decoder (LLaMA-7B) and encoder-decoder (FLAN-T5-XXL) LMs, gisting enables up to 26x compression of prompts, resulting in up to 40% FLOPs reductions, 4.2% wall time speedups, storage savings, and minimal loss in output quality.

Sounds promising! No code?

Edit: here it is! https://github.com/jayelm/gisting

From this Twitter thread: https://twitter.com/jayelmnop/status/1648365743195684873
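
For readers curious about the "restricted attention mask" mentioned in the abstract, here is a minimal sketch (not the authors' code from the linked repo) of the core idea: gist tokens are placed between the prompt and the rest of the sequence, and tokens after the gist span are blocked from attending directly to the prompt, so all prompt information must flow through the gist tokens. The function name and layout below are assumptions for illustration.

import torch

def make_gist_mask(seq_len: int, prompt_end: int, num_gist: int) -> torch.Tensor:
    """Build a causal attention mask where positions are laid out as
    [prompt | gist tokens | input/output]. True = position may attend."""
    gist_end = prompt_end + num_gist
    # Standard causal (lower-triangular) mask.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Tokens after the gist span cannot attend to the original prompt,
    # forcing the gist tokens to carry (compress) the prompt's content.
    mask[gist_end:, :prompt_end] = False
    return mask

# Example: 4 prompt tokens, 2 gist tokens, 3 post-gist tokens.
print(make_gist_mask(9, prompt_end=4, num_gist=2).int())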