If you use the Cursor IDE: the folks who wrote it talked about using speculative decoding to make "Apply" faster on the Lex Fridman podcast last month.<p>Here it is on YouTube, although you can also find it on Spotify and other podcast platforms:<p><a href="https://youtu.be/oFfVt3S51T4?t=1206" rel="nofollow">https://youtu.be/oFfVt3S51T4?t=1206</a>
I found the OpenAI page on predicted outputs more interesting: <a href="https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs" rel="nofollow">https://platform.openai.com/docs/guides/latency-optimization...</a>