Semantic Cache Router
by Community
Distributed semantic cache and stateful routing system that cuts LLM API costs by returning cached responses for semantically similar queries. Uses ANN vector search (cosine ≥ 0.8)
OSS
Added 1 June 2026
Overview
Semantic Cache Router is a distributed semantic cache and stateful routing system that reduces LLM API costs by returning cached responses for semantically similar queries. It uses ANN vector search with a cosine similarity threshold of 0.8 or higher to match incoming queries against stored embeddings.
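The matching step described above can be illustrated with a minimal sketch. It is not the project's implementation: the `SemanticCache` class, its `put`/`get` methods, and the toy three-dimensional embeddings are hypothetical, and a brute-force scan stands in for the ANN index that a real deployment would use. Only the 0.8 cosine threshold comes from this entry.

```python
import math

SIMILARITY_THRESHOLD = 0.8  # cosine threshold stated in this entry


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache; a linear scan replaces ANN search."""

    def __init__(self, threshold=SIMILARITY_THRESHOLD):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the best-matching cached response, or None on a miss."""
        best_score, best_response = -1.0, None
        for stored, response in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        if best_score >= self.threshold:
            return best_response
        return None


cache = SemanticCache()
cache.put([1.0, 0.0, 0.0], "cached answer A")

hit = cache.get([0.9, 0.1, 0.0])   # close to the stored vector: served from cache
miss = cache.get([0.0, 1.0, 0.0])  # orthogonal vector: caller would fall back to the LLM
```

On a miss, the caller would normally send the query to the LLM and store the new embedding and response with `put` so that later similar queries can be served from the cache.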
Best for
Developers looking to lower LLM API costs for applications with repetitive or semantically similar query patterns
Use cases
- Reduce API spend by caching frequent or near-duplicate user prompts
- Serve semantically similar queries from cache instead of calling an LLM
- Route user requests to previously computed responses based on semantic match
Notes
1 star on GitHub. Last updated 2026-04-05.
Pros
- Directly cuts LLM API costs by avoiding redundant model calls
- Reduces response latency for cached queries via vector search
- Distributed architecture supports horizontal scaling
Cons
- Very low community adoption (1 star on GitHub) suggests an early-stage project
- Semantic matching accuracy depends on embedding quality and threshold tuning
- Cache misses or incorrect matches may degrade user experience
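The threshold trade-off behind the last two points can be shown with a small sketch. The vector pairs and the candidate thresholds 0.5 and 0.99 are made up for illustration; only 0.8 is the value given for this project. A looser threshold lets a loosely related query match (a likely incorrect hit), while a stricter one turns even a near-duplicate into a miss.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical (query, stored) embedding pairs chosen for easy arithmetic.
pairs = {
    "near-duplicate": ([3.0, 4.0], [4.0, 3.0]),   # similarity 24/25 = 0.96
    "loosely related": ([1.0, 0.0], [3.0, 4.0]),  # similarity 3/5 = 0.6
}

# For each candidate threshold, record which pairs would count as cache hits.
results = {
    threshold: {
        name: cosine_similarity(query, stored) >= threshold
        for name, (query, stored) in pairs.items()
    }
    for threshold in (0.5, 0.8, 0.99)
}

for threshold, outcome in results.items():
    print(threshold, outcome)
```

With these toy vectors, 0.5 accepts both pairs, 0.8 accepts only the near-duplicate, and 0.99 rejects both. Real embeddings behave less cleanly, so the threshold usually needs tuning against sample traffic.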
Indexed from awesome-llmops and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
Qdrant
Community
Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
Milvus
Community
Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
Chroma
Community
Search infrastructure for AI
Weaviate
Community
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance an
LangChain
Community
The agent engineering platform.
LiteLLM 🚅
Community
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, Vertex
Embedchain
Community
Universal memory layer for AI Agents