58 points by limoce 7 months ago

3 comments

cbhl 7 months ago

If you use the Cursor IDE: the folks that wrote it talked about their use of speculative decoding to make "Apply" faster on the Lex Fridman podcast last month.

Here it is on YouTube, although you can also find it on Spotify and other podcast platforms:

https://youtu.be/oFfVt3S51T4?t=1206

creativenolo 7 months ago

I found the OpenAI page to be more interesting: https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs

nunez 7 months ago

This is like the likely() and unlikely() macros in the Linux kernel! Huge speedup if you're right; small penalty if you're not.
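
To make the "speedup if you're right" trade-off concrete, here is a toy sketch of prediction-assisted decoding. It is an illustration only, not how OpenAI or Cursor implement it: `toy_speculative_decode`, the `window` parameter, and the position-aligned matching are all assumptions made for the example, and `target` stands in for whatever the model would have generated on its own. The toy counts sequential passes only, so the extra verification work a wrong guess costs is not modeled.

```python
def toy_speculative_decode(target, draft, window=4):
    """Toy model of decoding with a predicted output (hypothetical helper).

    target: tokens the model would emit on its own (stand-in for the real model).
    draft:  the caller's predicted tokens, compared position by position.
    Returns (output_tokens, sequential_passes).
    """
    out = []
    passes = 0
    while len(out) < len(target):
        passes += 1
        pos = len(out)
        # Accept the longest run of draft tokens (up to `window`) that agree
        # with what the model would have produced at the same positions.
        accepted = 0
        while (
            accepted < window
            and pos + accepted < len(target)
            and pos + accepted < len(draft)
            and draft[pos + accepted] == target[pos + accepted]
        ):
            accepted += 1
        out.extend(target[pos:pos + accepted])
        # The same pass also yields one token from the model itself, so a
        # wrong guess still makes progress, one token at a time.
        if len(out) < len(target):
            out.append(target[len(out)])
    return out, passes


if __name__ == "__main__":
    target = list("abcdefghij")
    for label, draft in [
        ("perfect draft", list("abcdefghij")),
        ("one wrong token", list("abXdefghij")),
        ("no draft", []),
    ]:
        output, passes = toy_speculative_decode(target, draft)
        print(label, "->", "".join(output), "in", passes, "passes")
```

The output is always identical to `target`; only the number of sequential passes changes. In this toy, with a 10-token target and a window of 4, a perfect draft finishes in 2 passes, while an empty or entirely wrong draft takes 10.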

New OpenAI Feature: Predicted Outputs
