BeHonest

by Community

BeHonest: Benchmarking Honesty in Large Language Models

OSS

Added 2 June 2026

Overview

BeHonest is a benchmarking framework that evaluates how honestly large language models behave, in particular whether they express uncertainty or admit ignorance when they do not know an answer. It provides a standardized leaderboard on which models are compared by their tendency to give correct answers rather than fabricate information.
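
To make the idea concrete, the following is a minimal sketch of how answering correctly versus making things up could be tallied. It is not BeHonest's scoring code: the `Record` format, the `honesty_summary` function, and the three metric names are all hypothetical, chosen only for illustration, and the benchmark's real scenarios and metrics may differ.

```python
from dataclasses import dataclass


# Hypothetical record format for illustration only;
# BeHonest's actual data schema and metrics may differ.
@dataclass
class Record:
    answerable: bool  # does the question have a known answer?
    abstained: bool   # did the model admit it does not know?
    correct: bool     # was the given answer correct? (ignored if abstained)


def _rate(hits: int, total: int) -> float:
    """Return hits / total, or 0.0 when there are no items."""
    return hits / total if total else 0.0


def honesty_summary(records: list[Record]) -> dict[str, float]:
    """Summarize answering vs. abstaining behavior (assumed metrics)."""
    answerable = [r for r in records if r.answerable]
    unanswerable = [r for r in records if not r.answerable]
    return {
        # Correct, non-abstaining answers on questions that have an answer.
        "accuracy_on_answerable": _rate(
            sum(r.correct and not r.abstained for r in answerable),
            len(answerable),
        ),
        # Admitting ignorance on questions that have no known answer.
        "abstention_on_unanswerable": _rate(
            sum(r.abstained for r in unanswerable), len(unanswerable)
        ),
        # Answering anyway on questions that have no known answer.
        "fabrication_rate": _rate(
            sum(not r.abstained for r in unanswerable), len(unanswerable)
        ),
    }
```

In this sketch a model that abstains on everything scores well on abstention but poorly on accuracy, which is why the two numbers are reported side by side rather than merged into one.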

Best for
Researchers and developers who need to evaluate or improve the truthfulness of LLMs.

Use cases

Assessing a model's calibration and truthfulness before deployment
Comparing different LLMs on honesty metrics for research or selection
Identifying specific failure modes where models fabricate answers

Notes

Pros

Offers a clear, reproducible benchmark for a critical safety dimension
Public leaderboard enables direct model comparison
Focuses on an under-tested aspect of LLM behavior

Cons

Limited to the specific honesty scenarios defined by the benchmark
Does not measure other important qualities such as helpfulness or other aspects of safety
Leaderboard results may not generalize to all real-world use cases

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Alternative to (2 entries)

OSS Framework, medium

OpenAI Evals

Community

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

★ 18,584, updated 2mo ago

OSS Framework, medium

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772, updated 1mo ago
