
Show HN: A context-aware semantic cache for reducing LLM app latency and cost

2 points | by tmshapland | 12 months ago
We're Tom and Adrian, the cofounders of Canonical AI. We were building a conversational AI product and wanted to use semantic caching. We tried out a few different projects, but none of them were accurate enough. The problem with the semantic caches we tried was that they didn't have a sense of the context of the user query. That is, the same user query could mean two different things, depending on what the query is referencing.

So we changed course and started working on a semantic cache that understands the context of the user query. We've developed a number of different methods to make the cache more aware of the context. These methods include multi-tenancy (i.e., user-defined cache scopes), multi-turn cache keys, metadata tagging, etc.

We'd love to hear your thoughts on it!
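For readers unfamiliar with the idea, here is a minimal sketch of what a context-aware semantic cache lookup could look like. This is not Canonical AI's implementation; the embed() stub, the scope and recent_turns parameters, and the similarity threshold are all assumptions made purely for illustration:

    import math
    from dataclasses import dataclass, field

    def embed(text: str) -> list[float]:
        # Placeholder embedding: hash characters into a small fixed-size vector.
        # A real cache would call an actual embedding model here.
        vec = [0.0] * 16
        for i, ch in enumerate(text):
            vec[i % 16] += ord(ch)
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]

    def cosine(a: list[float], b: list[float]) -> float:
        # Vectors are already normalized, so the dot product is the cosine similarity.
        return sum(x * y for x, y in zip(a, b))

    @dataclass
    class CacheEntry:
        embedding: list[float]
        response: str
        metadata: dict = field(default_factory=dict)  # metadata tagging

    @dataclass
    class ContextAwareCache:
        threshold: float = 0.92                       # assumed similarity cutoff
        entries: dict = field(default_factory=dict)   # scope -> list[CacheEntry]

        def _key_text(self, query: str, recent_turns: list[str]) -> str:
            # Multi-turn cache key: fold prior turns into the text that gets embedded,
            # so the same query can hit different entries in different conversations.
            return " | ".join(recent_turns[-3:] + [query])

        def get(self, scope: str, query: str, recent_turns: list[str]) -> str | None:
            # Multi-tenancy: only search entries within the caller's cache scope.
            q = embed(self._key_text(query, recent_turns))
            best, best_sim = None, 0.0
            for entry in self.entries.get(scope, []):
                sim = cosine(q, entry.embedding)
                if sim > best_sim:
                    best, best_sim = entry, sim
            return best.response if best and best_sim >= self.threshold else None

        def put(self, scope: str, query: str, recent_turns: list[str],
                response: str, metadata: dict | None = None) -> None:
            key = embed(self._key_text(query, recent_turns))
            self.entries.setdefault(scope, []).append(
                CacheEntry(key, response, metadata or {}))

The point of the sketch is the cache key: instead of embedding the raw query alone, it embeds the query together with its tenant scope and recent conversation turns, which is one way to make "the same query" resolve differently depending on what it references.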

no comments