
MathEval

by Community

A comprehensive benchmarking platform designed to evaluate the mathematical abilities of large models across 20 fields and nearly 30,000 math problems.

Added 1 June 2026

Overview

MathEval is a benchmarking platform for evaluating large models on mathematical problems. It covers 20 fields and nearly 30,000 problems, providing a standardized test suite for assessing mathematical reasoning.

Best for
Researchers and developers benchmarking mathematical reasoning in large models.

Use cases

Benchmarking LLMs on mathematical reasoning across diverse fields
Comparing model performance on a standardized set of nearly 30,000 problems
Evaluating fine-tuned models for math-specific capabilities
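
Comparing models across fields usually comes down to a per-field accuracy table. The following is a minimal sketch of that aggregation in plain Python. It is not MathEval's own API: the function name `accuracy_by_field`, the `(field, is_correct)` record shape, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_field(results):
    """Return {field: accuracy} for an iterable of (field, is_correct) pairs.

    Hypothetical helper for illustration; not part of MathEval.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for field, is_correct in results:
        totals[field] += 1
        correct[field] += int(is_correct)
    return {field: correct[field] / totals[field] for field in totals}


# Made-up graded outputs for a single model (illustrative data only).
results = [
    ("algebra", True),
    ("algebra", False),
    ("geometry", True),
    ("geometry", True),
]

per_field = accuracy_by_field(results)

# Overall accuracy weights every problem equally, so large fields dominate it.
overall = sum(ok for _, ok in results) / len(results)

# Averaging the per-field scores instead weights every field equally.
macro = sum(per_field.values()) / len(per_field)
```

Reporting the per-field scores next to the overall figure can help when fields differ in size, since a single aggregate may hide weak spots in smaller fields.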

Notes

Pros

Large benchmark with nearly 30,000 problems for robust evaluation
Covers 20 distinct mathematical fields for broad assessment
Community-driven platform encouraging transparency and collaboration

Cons

Limited to mathematical abilities, not a general benchmark
Problem selection may not represent all math subfields equally
No built-in model or training component, evaluation only

Indexed from awesome-llm and enriched with the project's publicly available facts.

Pairs with

Other entries in the index that connect to this one.

Built with (1 entry)

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 1mo ago

Pairs with (1 entry)

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 updated 2mo ago

Alternative to (1 entry)

OpenAI Evals

Community

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

★ 18,584 updated 3mo ago
