We-Math

by Community

Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Added 2 June 2026

Overview

We-Math is a community benchmark framework for evaluating large multimodal models on mathematical reasoning tasks. It provides a leaderboard that compares model performance against human-like reasoning standards.

Best for

Researchers and developers benchmarking multimodal models on mathematical reasoning tasks

Use cases

Evaluating multimodal models on mathematical reasoning tasks
Benchmarking model performance against human-level reasoning
Identifying reasoning gaps in current multimodal systems

Pros

Open standard for comparing multimodal math reasoning
Direct comparison to human performance via leaderboard
Focused benchmark for a specific capability gap

Cons

Limited to mathematical reasoning evaluation only
Does not assess other multimodal capabilities
Community-driven with potentially irregular updates

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one (1 entry).

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 stars, updated 1mo ago