One of the most fun parts of working with LLMs is finding solutions to technical problems in the ways people have solved the same problems in interpersonal communication.

Semantic caching reduces LLM costs and latency, but simple vector similarity search doesn't work well for conversational AI: a follow-up like "what about that one?" means something different in every conversation, yet embeds to nearly the same vector each time. To make semantic caching effective for these context-dependent queries, we've modeled features of human communication into our cache (a toy sketch of the idea is below).

Have human communication features helped you solve any LLM application problems? Share them in the comments!
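For readers curious what "modeling context into the cache" can look like, here is a minimal sketch of the general idea, not our actual implementation: the cache key embeds the query together with the last few conversation turns, so the same follow-up question in two different conversations yields different keys. The `toy_embed` function, the `ContextualSemanticCache` class, and the similarity threshold are all illustrative placeholders.

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in embedding: hashes tokens into a normalized vector.
    A real cache would call an embedding model here instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

class ContextualSemanticCache:
    """Cache keyed on (recent turns + query), so "what about Paris?"
    after "weather in London" and after "flights to London" resolve
    to different cache entries."""

    def __init__(self, threshold: float = 0.9, context_turns: int = 2):
        self.threshold = threshold
        self.context_turns = context_turns
        self.entries: list[tuple[list[float], str]] = []

    def _key_text(self, query: str, history: list[str]) -> str:
        # Prepend the last N turns to the query before embedding.
        return " ".join(history[-self.context_turns:] + [query])

    def get(self, query: str, history: list[str]) -> str | None:
        key = toy_embed(self._key_text(query, history))
        best = max(self.entries, key=lambda e: cosine(key, e[0]), default=None)
        if best and cosine(key, best[0]) >= self.threshold:
            return best[1]  # cache hit: skip the LLM call
        return None  # cache miss: caller queries the LLM, then calls put()

    def put(self, query: str, history: list[str], response: str) -> None:
        self.entries.append((toy_embed(self._key_text(query, history)), response))
```

The design choice being illustrated: a plain semantic cache embeds only the query, so context-free lookups collide across conversations; folding recent turns into the key trades some hit rate for correctness, much as a human listener disambiguates a vague question by what was just said.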