We’ve talked to many developers who try semantic caching with a simple cosine similarity search, see the unsurprisingly poor accuracy of this context-agnostic approach, and give up on semantic caching’s cost and latency improvements.

An accurate and effective LLM cache needs to understand the context of the conversation with the user. A context-aware semantic cache requires multi-turn cache keys, named entity recognition, query elaboration, metatags, templatization, function call caching, custom cache scoping, dynamic cache invalidation, and so on – all at lightning fast speeds. A rough sketch of the contrast follows below.
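
To make the contrast concrete, here is a minimal Python sketch (illustrative only, not our implementation) of a context-agnostic lookup versus a context-aware cache key. The `templatize` helper, the `context_aware_key` function, the 0.9 similarity threshold, and the `scope` string format are all assumptions made for this example; a real system would use proper NER, query elaboration, and scoped invalidation rather than a regex and a hash.

```python
import hashlib
import re

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Plain cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def naive_lookup(query_vec: np.ndarray, cache: list, threshold: float = 0.9):
    """Context-agnostic lookup: match on the latest query embedding alone.

    `cache` is a list of (embedding, cached_response) pairs. Because nothing
    about the user, session, or prior turns is in the key, similar-sounding
    queries from different contexts can collide and return stale answers.
    """
    for vec, response in cache:
        if cosine(query_vec, vec) >= threshold:
            return response
    return None


def templatize(text: str) -> str:
    """Crude stand-in for entity normalization: map 'order #1234' and
    'order #5678' to the same template so they share a cache slot."""
    return re.sub(r"\d+", "<NUM>", text.lower().strip())


def context_aware_key(turns: list[str], scope: str, window: int = 3) -> str:
    """Build a cache key from the last few normalized turns plus an explicit
    scope (e.g. a tenant or user id), so a hit is only possible when both the
    conversational context and the scope actually match."""
    recent = [templatize(t) for t in turns[-window:]]
    raw = scope + "||" + "||".join(recent)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Example: two users asking about different orders get different keys,
# while the same user repeating the same templated question gets a hit.
turns = ["hi", "what's the status of order #1234?"]
key_a = context_aware_key(turns, scope="tenant:acme|user:42")
key_b = context_aware_key(turns, scope="tenant:acme|user:99")
assert key_a != key_b
```

The point of folding scope and recent turns into the key is twofold: it prevents cross-user and cross-session false hits that a lone embedding match allows, and it gives invalidation a handle, since dropping everything under one scope is just a prefix delete rather than a scan of the whole cache.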