O Open Source Frameworks medium

Open LLM Leaderboard

by Community

Track, rank and evaluate open LLMs and chatbots

Added 1 June 2026

Overview

The Open LLM Leaderboard is a community-maintained leaderboard that tracks, ranks, and evaluates open large language models and chatbots. It runs every model through the same standardized benchmarks, such as ARC, HellaSwag, MMLU, and TruthfulQA, so that scores are directly comparable across models.
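To illustrate how per-benchmark scores can be turned into a ranking, here is a minimal Python sketch. The model names and numbers are made up for illustration, and the unweighted mean is an assumption; this entry does not state how the leaderboard itself aggregates scores.

```python
from statistics import mean

# Hypothetical scores (percent accuracy). These are NOT real leaderboard data.
SCORES = {
    "model-a": {"ARC": 60.0, "HellaSwag": 80.0, "MMLU": 55.0, "TruthfulQA": 45.0},
    "model-b": {"ARC": 65.0, "HellaSwag": 78.0, "MMLU": 60.0, "TruthfulQA": 41.0},
    "model-c": {"ARC": 50.0, "HellaSwag": 70.0, "MMLU": 45.0, "TruthfulQA": 55.0},
}


def rank_by_average(scores):
    """Return (model, average) pairs sorted from best to worst.

    Assumption: every benchmark is weighted equally.
    """
    averaged = {name: mean(per_task.values()) for name, per_task in scores.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


def best_for_task(scores, task):
    """Return the model with the highest score on a single benchmark."""
    return max(scores, key=lambda name: scores[name][task])


if __name__ == "__main__":
    for name, avg in rank_by_average(SCORES):
        print(f"{name}: {avg:.1f}")
    print("Best on TruthfulQA:", best_for_task(SCORES, "TruthfulQA"))
```

In this toy data the model with the best average is not the best on every benchmark, which is why it can be worth checking per-task scores as well as the overall rank.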

Best for

Developers and researchers evaluating open LLMs for general-purpose language tasks

Use cases

Compare open LLMs before selecting one for a project
Track model improvements across new releases and fine-tunes
Identify top-performing models for specific evaluation tasks

Notes

Pros

Provides a single, standardized comparison point for many open models
Community-driven and transparent, with a reproducible evaluation methodology
Regularly updated with new models and benchmarks

Cons

Limited to a fixed set of benchmarks that may not reflect real-world use
Does not measure inference speed, cost, or deployment practicality
Rankings can be gamed by models optimized specifically for these tests

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Uses (1 entry)

O OSS Framework medium

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 updated 2mo ago

Pairs with (1 entry)

P Apps Productivity low

Open LLMs

Various

📋 A list of open LLMs available for commercial use.

★ 12,783 updated 1y ago

Alternative to (1 entry)

O OSS Framework medium

OpenAI Evals

Community

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

★ 18,584 updated 3mo ago
