O Open Source Observability medium

Semantic Cache Router

by Community

OSS

Added 1 June 2026

Overview

Semantic Cache Router is a distributed semantic cache and stateful routing system that reduces LLM API costs by returning cached responses for semantically similar queries. It uses ANN vector search with a cosine similarity threshold of 0.8 or higher to match incoming queries against stored embeddings.
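The lookup described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the `SemanticCache` class, its method names, and the toy embedding vectors are all hypothetical, and a brute-force scan stands in for the ANN index. Only the 0.8 cosine threshold comes from the description.

```python
import math

SIMILARITY_THRESHOLD = 0.8  # the one value taken from the entry's description


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory cache; a linear scan stands in for ANN search."""

    def __init__(self, threshold=SIMILARITY_THRESHOLD):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_response)

    def put(self, embedding, response):
        self.entries.append((list(embedding), response))

    def get(self, embedding):
        """Return the best-matching response, or None if below the threshold."""
        best_score, best_response = -1.0, None
        for stored, response in self.entries:
            score = cosine_similarity(embedding, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = SemanticCache()
cache.put([1.0, 0.0, 0.0], "Paris is the capital of France.")  # toy embedding
hit = cache.get([0.9, 0.1, 0.0])   # nearly parallel vector, served from cache
miss = cache.get([0.0, 1.0, 0.0])  # orthogonal vector, falls through
```

In a real deployment the linear scan would presumably be replaced by a query against a vector store, which is what makes the lookup cheap at scale.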

Best for

Developers looking to lower LLM API costs for applications with repetitive or semantically similar query patterns

Use cases

Reduce API spend by caching frequent or near-duplicate user prompts
Serve semantically similar queries from cache instead of calling an LLM
Route user requests to previously computed responses based on semantic match
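One way to picture these use cases is a cache-aside wrapper around the model call: check for a semantic match first, and only call the LLM on a miss. The sketch below assumes a hypothetical `answer` helper, a toy bag-of-words `toy_embed` function, and a stubbed `fake_llm`; none of these names come from the project, and a production setup would use a real embedding model and vector store.

```python
import math

VOCAB = ["reset", "password", "refund", "order"]  # toy vocabulary (assumption)


def toy_embed(text):
    """Hypothetical embedding: word counts over a tiny fixed vocabulary."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def answer(query, embed, call_llm, store, threshold=0.8):
    """Cache-aside lookup: return (response, served_from_cache)."""
    vec = embed(query)
    for stored_vec, response in store:
        # First entry above the threshold wins; a real index would rank by score.
        if cosine(vec, stored_vec) >= threshold:
            return response, True
    response = call_llm(query)     # miss: pay for one model call
    store.append((vec, response))  # remember it for similar future queries
    return response, False


llm_calls = []


def fake_llm(query):
    """Stub standing in for a paid LLM API call."""
    llm_calls.append(query)
    return f"answer to: {query}"


store = []
first = answer("How do I reset my password", toy_embed, fake_llm, store)
second = answer("reset password please", toy_embed, fake_llm, store)
third = answer("refund my order", toy_embed, fake_llm, store)
```

Here the second query is served from the cache populated by the first, so only two of the three requests reach the stubbed model.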

Notes

1 star on GitHub. Last updated 2026-04-05.

Pros

Directly cuts LLM API costs by avoiding redundant model calls
Reduces response latency for cached queries via vector search
Distributed architecture supports horizontal scaling

Cons

Very low community adoption (1 star on GitHub) suggests an early-stage project
Semantic matching accuracy depends on embedding quality and threshold tuning
Incorrect matches can return the wrong cached response, and cache misses still require a full LLM call
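The threshold-tuning trade-off can be made concrete with a small evaluation sketch. The `evaluate_threshold` helper and the `TOY_PAIRS` scores below are hypothetical illustration data, not measurements from the project. The idea is to label query pairs as same-intent or not, then see how a given cosine threshold trades hit rate against wrong answers.

```python
def evaluate_threshold(scored_pairs, threshold):
    """Return (hit_rate, false_hit_rate) for a cosine threshold.

    scored_pairs: list of (similarity, same_intent) tuples, where
    same_intent says whether the cached answer would actually be correct.
    """
    if not scored_pairs:
        return 0.0, 0.0
    # Keep the correctness label of every pair the cache would serve.
    hits = [same for sim, same in scored_pairs if sim >= threshold]
    hit_rate = len(hits) / len(scored_pairs)
    false_hits = sum(1 for same in hits if not same)
    false_hit_rate = false_hits / len(hits) if hits else 0.0
    return hit_rate, false_hit_rate


# Made-up scores for illustration only (assumption, not project data).
TOY_PAIRS = [
    (0.95, True),
    (0.88, True),
    (0.83, False),  # similar wording, different intent
    (0.79, True),   # same intent, just under 0.8
    (0.60, False),
]

for threshold in (0.8, 0.9):
    hit_rate, false_hit_rate = evaluate_threshold(TOY_PAIRS, threshold)
    print(f"threshold={threshold}: hit_rate={hit_rate:.2f}, "
          f"false_hit_rate={false_hit_rate:.2f}")
```

On this toy data, raising the threshold removes the one wrong match but also serves fewer queries from cache, which is the tension the point about threshold tuning refers to.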

Indexed from awesome-llmops and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Uses (4 entries)

Milvus

Community

Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search

★ 44,579 updated 1mo ago

Qdrant

Community

Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

★ 31,735 updated 1mo ago

Chroma

Community

Search infrastructure for AI

★ 28,173 updated 1mo ago

pgvector

Community

Open-source vector similarity search for Postgres

★ 21,551 updated 1mo ago

Pairs with (1 entry)

LiteLLM 🚅

Community

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, Vertex

★ 48,950 updated 1mo ago
