Enterprise DNA
Open Source Frameworks · medium

Ragas

by Community

Supercharge Your LLM Application Evaluations 🚀

Added 1 June 2026

#evaluation #llm #llmops

Overview

Ragas is a Python framework for evaluating LLM applications through automated metrics and test generation. It measures retrieval quality, generation accuracy, and end-to-end performance without requiring manual ground truth labels. Designed for RAG systems and LLM pipelines, it provides quantitative feedback on application behavior.

Best for

Teams building RAG systems who need continuous evaluation without manual labeling

Use cases

  • Measuring retrieval quality in RAG systems
  • Benchmarking LLM output accuracy and relevance
  • Automated test generation for prompt chains
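To make the first two use cases concrete, here is a minimal sketch of a judge-based faithfulness score: the fraction of claims in an answer that are supported by the retrieved contexts. This is not Ragas's API or its implementation. The names `overlap_judge` and `faithfulness_score` are hypothetical, and the token-overlap judge is a toy stand-in for the LLM judge a real evaluation framework would call.

```python
from typing import Callable, List

# Hypothetical judge signature: True when `claim` is supported by `context`.
Judge = Callable[[str, str], bool]


def overlap_judge(claim: str, context: str, threshold: float = 0.6) -> bool:
    """Toy stand-in for an LLM judge, based on lowercase token overlap."""
    claim_tokens = set(claim.lower().split())
    context_tokens = set(context.lower().split())
    if not claim_tokens:
        return False
    overlap = len(claim_tokens & context_tokens) / len(claim_tokens)
    return overlap >= threshold


def faithfulness_score(
    claims: List[str],
    contexts: List[str],
    judge: Judge = overlap_judge,
) -> float:
    """Fraction of claims supported by at least one retrieved context."""
    if not claims:
        return 0.0
    supported = sum(
        1 for claim in claims if any(judge(claim, ctx) for ctx in contexts)
    )
    return supported / len(claims)


# Illustrative data only; not taken from any real evaluation run.
contexts = [
    "Paris is the capital of France",
    "The Eiffel Tower is in Paris",
]
claims = [
    "Paris is the capital of France",   # supported by the first context
    "Paris has ten million residents",  # not supported by either context
]

score = faithfulness_score(claims, contexts)  # 1 of 2 claims supported: 0.5
```

Because the judge is passed in as a callable, the score is only as reliable as the judge. Swapping in a stronger or weaker judge changes the result without any change to the application being evaluated.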

Notes

14,186 stars on GitHub. Last updated 2026-02-24. Licensed Apache-2.0.

Pros

  • Reduces evaluation overhead by automating metric computation
  • Works without pre-built ground truth datasets
  • Active open-source community with 14k+ GitHub stars

Cons

  • Metrics rely on an LLM as judge, so scores depend on that model's quality and evaluating an LLM with an LLM is circular
  • Python-only; must be integrated into existing workflows
  • Automated metrics may miss domain-specific correctness

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

12 entries

AutoRAG

Community

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

★ 4,802 · updated 15d ago

awesome-hallucination-detection

Community

List of papers on hallucination detection in LLMs.

★ 1,096 · updated 9d ago

Awesome-LLM-hallucination

Community

LLM hallucination paper list

★ 335 · updated 2y ago

CompMix

Community

CompMix: A Benchmark for Heterogeneous Question Answering.


Evidently

Community

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.

★ 7,561 · updated 1mo ago

Giskard

Community

🐒 Open-Source Evaluation & Testing library for LLM Agents

★ 5,414 · updated 5d ago

InfiBench

Community

InfiBench: Evaluating the Question-Answering Capabilities of Code LLMs


LawBench

Community

LawBench


LLMEval

Community

LLMEval is a research series dedicated to building comprehensive, fair, and robust evaluation frameworks for large language models.


MMToM-QA

Community

Leaderboard for the MMToM-QA benchmark (Jin et al., ACL 2024).


PubMedQA

Community

PubMedQA Homepage


TAT-DQA

Community

TAT-DQA: A Document Visual Question Answering (VQA) Dataset, aiming to answer questions over visually-rich documents with a hybrid of Tabular and Textual Content in Finance