Llama2 Embedding Server

by Community

A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

Added 1 June 2026

#embedding-similarity #embedding-vectors #embeddings #llama2 #llamacpp #semantic-search

Overview

Llama2 Embedding Server is a FastAPI service for semantic text search. It compares a query against precomputed embeddings, using similarity measures that go beyond cosine similarity, and returns the closest texts. Text is extracted from multiple file types with textract.
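
To make the idea of searching precomputed embeddings concrete, here is a minimal sketch in plain Python. It is not the project's code: the function names (`cosine_similarity`, `top_k`), the `INDEX` dictionary, and the toy 3-dimensional vectors are all assumptions made for illustration, and cosine similarity is used only as the simplest baseline measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=3):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical precomputed index: toy vectors standing in for real embeddings.
INDEX = {
    "cats purr": [0.9, 0.1, 0.0],
    "dogs bark": [0.1, 0.9, 0.0],
    "stocks fell": [0.0, 0.1, 0.9],
}

# The query vector would normally come from the same embedding model.
results = top_k([0.8, 0.2, 0.0], INDEX, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

Because the index is built ahead of time, a search only has to score the query against stored vectors, which is what keeps a service like this lightweight for static collections.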

Best for

Developers needing a lightweight semantic search server for static text collections.

Use cases

  • Build a semantic search API over a document corpus
  • Perform similarity searches on precomputed text embeddings
  • Integrate file extraction and embedding into a single service

Notes

1,053 stars on GitHub. Last updated 2025-02-27.

Pros

  • Provides a ready-to-deploy FastAPI server for embeddings
  • Supports multiple file formats via textract
  • Uses advanced similarity measures beyond cosine
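
The listing does not name the similarity measures involved. As one illustration of what "beyond cosine" can mean, the sketch below computes Spearman rank correlation, which compares the ordering of vector components rather than their magnitudes. The choice of Spearman and the helper names (`ranks`, `pearson`, `spearman`) are assumptions for this example, not a description of the project's implementation.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant sequence has no defined correlation
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Two vectors with the same ordering but very different magnitudes.
a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 4.0, 9.0, 100.0]
print(spearman(a, b))
```

A rank-based measure treats `a` and `b` as perfectly associated because their components rise in the same order, whereas a magnitude-based measure such as cosine similarity is pulled down by the outlier in `b`.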

Cons

  • Only supports precomputed embeddings, not real-time generation
  • Community project may have limited support or updates
  • Requires manual embedding computation upfront
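
The upfront step described in the cons can be sketched as a small batch job that embeds every text once and saves the result for later searches. Everything here is hypothetical: `toy_embed` is a hash-based placeholder with no semantic meaning that stands in for a real embedding model, and the JSON file layout is an assumption, not the project's storage format.

```python
import hashlib
import json
import os
import tempfile

def toy_embed(text, dim=8):
    """Placeholder embedding: NOT semantic, only deterministic and fixed-length."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]

def build_index(texts, path, embed=toy_embed):
    """Embed each text once and persist the text -> vector mapping as JSON."""
    index = {text: embed(text) for text in texts}
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(index, fh)
    return index

def load_index(path):
    """Load a previously built index so no embedding work happens at query time."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

# Build the index once, then reload it as a search service would at startup.
path = os.path.join(tempfile.mkdtemp(), "index.json")
built = build_index(["first document", "second document"], path)
loaded = load_index(path)
print(len(loaded), "texts indexed")
```

In practice `embed` would be swapped for a call to a real model, and the index would need rebuilding whenever the collection changes, which is why this approach suits static text collections.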

Indexed from awesome-langchain and enriched against its public facts.
