ai-evaluation

by Community

Evaluation Framework for all your AI-related Workflows

Added 1 June 2026

#agentic-ai #ai #ai-agents #cicd #evaluation #ml

Overview

A community-built evaluation framework for AI workflows, written in Python. It provides tools to assess and validate outputs from AI models and pipelines.

Best for
Developers seeking a simple, open-source evaluation framework for AI workflows

Use cases

  • Testing and scoring LLM responses against expected criteria
  • Monitoring performance of AI systems in production
  • Validating outputs from custom AI workflows
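To make the first use case concrete, here is a minimal sketch of scoring a response against expected criteria. It is written in plain Python and does not use or mirror the ai-evaluation API, which this listing does not describe; `Criterion`, `score_response`, and the three example checks are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical helper types: not part of ai-evaluation.
@dataclass
class Criterion:
    name: str
    check: Callable[[str], bool]  # True when the response satisfies the criterion


def score_response(response: str, criteria: List[Criterion]) -> Dict[str, object]:
    """Run every criterion against the response and report the pass rate."""
    results = {c.name: c.check(response) for c in criteria}
    passed = sum(results.values())
    # An empty criteria list scores 0.0 rather than dividing by zero.
    score = passed / len(criteria) if criteria else 0.0
    return {"results": results, "score": score}


# Illustrative criteria (assumptions, chosen only for this example).
criteria = [
    Criterion("mentions_paris", lambda r: "paris" in r.lower()),
    Criterion("under_200_chars", lambda r: len(r) < 200),
    Criterion("no_apology", lambda r: "sorry" not in r.lower()),
]

report = score_response("The capital of France is Paris.", criteria)
print(report["score"])  # 1.0: all three criteria pass
```

Keeping each criterion as a named predicate means a failing run reports which expectation was missed, not just an aggregate score. A real evaluation framework would typically add model-graded checks and dataset-level aggregation on top of this idea.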

Notes

105 stars on GitHub. Last updated 2026-05-29. Licensed Apache-2.0.

Pros

  • Open source and free to use
  • Lightweight Python implementation
  • Focused specifically on AI evaluation

Cons

  • Small community with only 105 stars
  • Limited documentation and examples
  • May lack advanced features found in larger frameworks

Indexed from awesome-llmops and enriched with the project's public facts.
