BeHonest
by Community
BeHonest: Benchmarking Honesty in Large Language Models
OSS
BeHonest
Added 2 June 2026
Overview
BeHonest is a benchmarking framework that evaluates how honestly large language models express uncertainty or admit ignorance. It provides a standardized leaderboard where models are tested on their tendency to give correct answers versus making up information.
Best for
Best for
Researchers and developers who need to evaluate or improve the truthfulness of LLMs.
Use cases
- Assessing a model's calibration and truthfulness before deployment
- Comparing different LLMs on honesty metrics for research or selection
- Identifying specific failure modes where models fabricate answers
Notes
BeHonest is a benchmarking framework that evaluates how honestly large language models express uncertainty or admit ignorance. It provides a standardized leaderboard where models are tested on their tendency to give correct answers versus making up information.
Use cases
- Assessing a model’s calibration and truthfulness before deployment
- Comparing different LLMs on honesty metrics for research or selection
- Identifying specific failure modes where models fabricate answers
Pros
- Offers a clear, reproducible benchmark for a critical safety dimension
- Public leaderboard enables direct model comparison
- Focuses on an under-tested aspect of LLM behavior
Cons
- Limited to the specific honesty scenarios defined by the benchmark
- Does not measure other important qualities like helpfulness or safety
- Leaderboard results may not generalize to all real-world use cases
Indexed from awesome-llm and enriched against its public facts.
Pros
- Offers a clear, reproducible benchmark for a critical safety dimension
- Public leaderboard enables direct model comparison
- Focuses on an under-tested aspect of LLM behavior
Cons
- Limited to the specific honesty scenarios defined by the benchmark
- Does not measure other important qualities like helpfulness or safety
- Leaderboard results may not generalize to all real-world use cases
Pairs with
Other entries in the index that connect to this one. Click through to see the chain.
OpenAI Evals
Community
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
lm-evaluation-harness
Community
A framework for few-shot evaluation of language models.