SciBench

by Community

Evaluating AI systems on scientific problem solving

Added 2 June 2026

Overview

SciBench is a community-maintained benchmark for evaluating AI systems on scientific problem solving. It provides a standardized set of tasks across scientific domains and maintains a public leaderboard for comparing model performance.

Best for
Researchers and developers evaluating AI systems on scientific reasoning tasks

Use cases

Benchmark scientific reasoning capabilities of language models
Compare model performance on standardized scientific tasks
Track progress in scientific problem solving across AI systems
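
As a rough illustration of what comparing models on standardized tasks can involve, the sketch below computes per-domain accuracy from numeric answers. It does not use SciBench's actual data format or scoring rules: the record layout, the `accuracy_by_domain` helper, the 5% relative tolerance, and the sample values are all assumptions made up for this example.

```python
from collections import defaultdict
from math import isclose


def accuracy_by_domain(records, rel_tol=0.05):
    """Return {domain: fraction of answers within rel_tol of the reference}.

    `records` is an iterable of (domain, predicted, reference) tuples.
    This layout is hypothetical, not a SciBench format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for domain, predicted, reference in records:
        total[domain] += 1
        # Count an answer as correct when it is numerically close
        # to the reference value (assumed scoring rule).
        if isclose(predicted, reference, rel_tol=rel_tol):
            correct[domain] += 1
    return {domain: correct[domain] / total[domain] for domain in total}


# Made-up records: (domain, model answer, reference answer).
records = [
    ("physics", 9.79, 9.81),
    ("physics", 3.2, 4.0),
    ("chemistry", 18.0, 18.02),
]
scores = accuracy_by_domain(records)
print(scores)
```

Reporting accuracy per domain rather than as a single number keeps a strong result in one field from hiding a weak one in another.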

Notes

Pros

Open-source and community driven, encouraging broad participation
Focuses on rigorous scientific reasoning rather than general language tasks
Public leaderboard enables transparent comparison

Cons

Limited to the scientific domains covered by the benchmark tasks
May not reflect real-world scientific problem complexity
Leaderboard updates depend on community contributions

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Alternative to (2 entries)

O OSS Framework medium

OpenAI Evals

Community

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

★ 18,584 updated 2mo ago

O OSS Framework medium

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 updated 1mo ago
