Text-Embeddings-Inference
by Community
A blazing fast inference solution for text embeddings models
OSS
Added 1 June 2026
Overview
Text-Embeddings-Inference is a framework for serving text embedding models at high throughput. Written in Rust, it exposes a REST API that generates embeddings from a range of transformer models. It is designed for low-latency inference, which makes it suitable for production embedding pipelines.
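As a rough illustration of what calling the REST API could look like, here is a minimal Python sketch using only the standard library. It assumes a server is already running and that it accepts a JSON body with an `inputs` field on an `/embed` route; the base URL, port, and the helper names `build_embed_request` and `embed` are hypothetical and not taken from this listing, so check them against the project's own documentation.

```python
import json
import urllib.request

# Assumption: a Text-Embeddings-Inference server is reachable here.
# The host and port are hypothetical; adjust them to your deployment.
TEI_BASE_URL = "http://localhost:8080"


def build_embed_request(texts, base_url=TEI_BASE_URL):
    """Build (but do not send) a POST request for the assumed /embed route."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, base_url=TEI_BASE_URL, timeout=30):
    """Send texts to the server and return the decoded JSON response.

    Assumption: the response is a JSON array with one embedding
    (a list of floats) per input text.
    """
    request = build_embed_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `embed(["What is deep learning?"])` would be expected to return a list containing one vector. Keeping request construction separate from sending makes the payload easy to inspect without a live server.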
Best for
Developers who need fast, scalable embedding serving for search or NLP pipelines
Use cases
- Generate embeddings for semantic search
- Compute embeddings for text classification
- Serve embeddings for clustering workflows
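For the semantic search use case, the embeddings returned by the server are typically compared with a similarity measure. The sketch below ranks documents against a query by cosine similarity in plain Python. It is an illustration of the general technique, not something this listing or the project prescribes; the function names are hypothetical and the vectors are assumed to have been produced beforehand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (document index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In practice `query_vec` and `doc_vecs` would come from the embedding server, and a vector index would usually replace the linear scan once the collection grows.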
Notes
4,829 stars on GitHub. Last updated 2026-05-26. Licensed Apache-2.0.
Pros
- High throughput due to Rust implementation
- Supports a wide range of embedding models from Hugging Face
- Low latency inference
Cons
- Supports only text embedding models, not generative or other tasks
- Needs suitable hardware (a GPU) for optimal performance
- Listed as a community project, so support may be limited
Indexed from awesome-llm and enriched against its public facts.