OLMO-eval

by Community

Evaluation suite for LLMs

Added 1 June 2026

Overview

OLMO-eval is a Python-based evaluation suite for large language models (LLMs). It provides standardized benchmarks and metrics to assess model performance across multiple tasks.

Best for
Researchers and developers evaluating OLMo or compatible LLMs with reproducible benchmarks

Use cases

Running reproducible evaluations on LLMs using established benchmarks
Comparing performance of different model versions or configurations
Integrating evaluation pipelines into model training workflows
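As a rough illustration of the second use case, comparing two model versions comes down to lining up per-task scores and looking at the differences. The sketch below is plain Python and does not use OLMO-eval's own API; the function name `compare_scores`, the task names, and the numbers are all hypothetical placeholders, and it assumes each run has already been reduced to a mapping from task name to score.

```python
from typing import Dict


def compare_scores(baseline: Dict[str, float], candidate: Dict[str, float]) -> Dict[str, float]:
    """Return candidate minus baseline for every task that both runs report."""
    # Only tasks present in both runs are comparable.
    shared = sorted(set(baseline) & set(candidate))
    return {task: round(candidate[task] - baseline[task], 4) for task in shared}


# Placeholder numbers for illustration only; these are not real benchmark results.
run_a = {"task_x": 0.50, "task_y": 0.75}
run_b = {"task_x": 0.625, "task_y": 0.75, "task_z": 0.25}

deltas = compare_scores(run_a, run_b)
for task, delta in deltas.items():
    print(f"{task}: {delta:+.3f}")
```

Tasks that appear in only one run (here `task_z`) are skipped rather than treated as zero, so a missing benchmark does not show up as a regression.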

Notes

379 stars on GitHub. Last updated 2025-07-11. Licensed Apache-2.0.

Pros

Open-source and community-maintained under the Allen AI umbrella
Simplifies running standard LLM evaluations with a single Python framework

Cons

Modest star count (379) suggests a smaller user community, so third-party support may be limited
Primarily designed for OLMo models; other architectures may require adaptation

Indexed from awesome-llm and checked against the project's public facts.

Pairs with

Other entries in the index that connect to this one.

Built with (1 entry)

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 1mo ago

Pairs with (2 entries)

OpenAI Evals

Community

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

★ 18,584 updated 3mo ago

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 updated 2mo ago

Alternative to (2 entries)

OpenAI Evals and lm-evaluation-harness, the same two entries listed under Pairs with above.
