Enterprise DNA
Open Source Frameworks medium

FELM

by Community

FELM: Benchmarking Factuality Evaluation of Large Language Models

OSS

Added 1 June 2026

Overview

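FELM is a benchmark for factuality evaluation of large language models: it measures how well a factuality evaluator detects factual errors in LLM-generated text. It provides a standardized dataset of LLM responses annotated for factual errors at the segment level, along with a methodology for scoring evaluators across domains such as world knowledge, science and technology, math, writing and recommendation, and reasoning.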

Best for

Researchers and developers needing a standardized way to measure LLM factuality

Use cases

  • Assessing factual accuracy of LLM outputs in research
  • Comparing factuality performance across multiple models
  • Validating model improvements in truthfulness
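
To make the comparison use case concrete, here is a minimal sketch of how a factuality evaluator's predictions could be scored against human labels. It is not FELM's official scoring script. The function name `score_evaluator`, the `Scores` container, and the input layout (one boolean per text segment, True meaning the segment contains a factual error) are assumptions made for illustration. The metrics are standard binary-classification measures.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float
    balanced_accuracy: float


def score_evaluator(gold, predicted):
    """Score error detection against gold labels.

    Hypothetical layout: each list holds one bool per segment,
    where True means "this segment contains a factual error".
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # errors correctly flagged
    fp = sum(1 for g, p in pairs if not g and p)      # correct segments wrongly flagged
    fn = sum(1 for g, p in pairs if g and not p)      # errors missed
    tn = sum(1 for g, p in pairs if not g and not p)  # correct segments left alone

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Balanced accuracy averages recall on the error class and the non-error class.
    balanced_accuracy = (recall + specificity) / 2

    return Scores(precision, recall, f1, balanced_accuracy)


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the FELM dataset.
    gold = [True, True, False, False, False]
    predicted = [True, False, True, False, False]
    print(score_evaluator(gold, predicted))
```

Running the same function over the outputs of several evaluators on the same labeled segments gives directly comparable numbers. Balanced accuracy is included because segments with errors are typically much rarer than correct ones, so plain accuracy would reward an evaluator that never flags anything.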

Pros

  • Provides a structured, reproducible evaluation framework
  • Focuses specifically on factuality, a critical quality metric
  • Community-driven benchmark with transparent methodology

Cons

  • Limited to the specific tasks and datasets in the benchmark
  • May not cover all real-world factuality challenges
  • Requires familiarity with benchmarking tools and setup

Indexed from awesome-llm and enriched against its public facts.
