MMToM-QA
by Community
Leaderboard for the MMToM-QA benchmark (Jin et al., ACL 2024).
OSS
MMToM-QA
Added 1 June 2026
Overview
A community-maintained leaderboard for the MMToM-QA benchmark introduced by Jin et al. at ACL 2024. It tracks and compares model performance on a multimodal question answering task.
Best for
Best for
Researchers and practitioners evaluating multimodal QA models on theory-of-mind reasoning tasks
Use cases
- Benchmarking multimodal QA models against a standardized evaluation
- Comparing model performance on theory-of-mind related questions
- Tracking progress and reproducibility across research efforts
Notes
A community-maintained leaderboard for the MMToM-QA benchmark introduced by Jin et al. at ACL 2024. It tracks and compares model performance on a multimodal question answering task.
Use cases
- Benchmarking multimodal QA models against a standardized evaluation
- Comparing model performance on theory-of-mind related questions
- Tracking progress and reproducibility across research efforts
Pros
- Provides a centralized and transparent evaluation for the benchmark
- Open and community-driven, allowing easy contributions
- Directly tied to an ACL 2024 publication for referenced methodology
Cons
- Limited to the specific MMToM-QA benchmark and its task definition
- May lack active maintenance or frequent updates from the community
- Not a general-purpose framework beyond the leaderboard function
Indexed from awesome-llm and enriched against its public facts.
Pros
- Provides a centralized and transparent evaluation for the benchmark
- Open and community-driven, allowing easy contributions
- Directly tied to an ACL 2024 publication for referenced methodology
Cons
- Limited to the specific MMToM-QA benchmark and its task definition
- May lack active maintenance or frequent updates from the community
- Not a general-purpose framework beyond the leaderboard function
Pairs with
Other entries in the index that connect to this one. Click through to see the chain.
lm-evaluation-harness
Community
A framework for few-shot evaluation of language models.
OpenAI Evals
Community
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Ragas
Community
Supercharge Your LLM Application Evaluations 🚀