We-Math
by Community
Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
OSS
Added 2 June 2026
Overview
We-Math is a community benchmark framework for evaluating large multimodal models on mathematical reasoning tasks. It provides a leaderboard that compares model performance against human-like reasoning standards.
Best for
Researchers and developers benchmarking multimodal models on mathematical reasoning tasks
Use cases
- Evaluating multimodal models on mathematical reasoning tasks
- Benchmarking model performance against human-level reasoning
- Identifying reasoning gaps in current multimodal systems
Pros
- Open benchmark for comparing multimodal math reasoning
- Direct comparison to human performance via leaderboard
- Focused benchmark for a specific capability gap
Cons
- Limited to mathematical reasoning evaluation only
- Does not assess other multimodal capabilities
- Community-driven with potentially irregular updates
Indexed from awesome-llm and enriched against its public facts.