Chain-of-Thought Hub

by Community

Benchmarking large language models' complex reasoning ability with chain-of-thought prompting

Added 1 June 2026

Overview

Chain-of-Thought Hub is a community-maintained benchmarking framework for evaluating large language models on complex reasoning tasks using chain-of-thought prompting. It provides datasets, prompts, and evaluation scripts in Jupyter Notebook format to measure and compare model performance.

Best for

Researchers and developers evaluating LLM reasoning capabilities with chain-of-thought prompting

Use cases

Benchmark LLM reasoning abilities with chain-of-thought prompts
Compare multiple models on standardized reasoning tasks
Reproduce and extend research on chain-of-thought prompting
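The use cases above can be illustrated with a minimal sketch of a chain-of-thought evaluation loop. This is not code from the Chain-of-Thought Hub repository: the exemplar, the two-question dataset, the "The answer is N" convention, and the stub models are all assumptions made for illustration. A real run would replace the stubs with calls to a model API or a local model.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical few-shot exemplar: a question, worked reasoning, and a final answer.
EXEMPLAR = (
    "Q: Tom has 3 apples and buys 2 more. How many apples does he have?\n"
    "A: Tom starts with 3 apples. He buys 2 more, so 3 + 2 = 5. The answer is 5.\n"
)

def build_prompt(question: str) -> str:
    """Prepend the worked exemplar so the model is nudged to reason step by step."""
    return f"{EXEMPLAR}\nQ: {question}\nA:"

def extract_answer(completion: str) -> Optional[str]:
    """Return the last number following 'answer is', with thousands separators removed."""
    matches = re.findall(r"answer is\s*(-?\d[\d,]*(?:\.\d+)?)", completion)
    return matches[-1].replace(",", "") if matches else None

def evaluate(model: Callable[[str], str], dataset: List[Tuple[str, str]]) -> float:
    """Exact-match accuracy of the extracted final answers against the gold answers."""
    correct = sum(extract_answer(model(build_prompt(q))) == gold for q, gold in dataset)
    return correct / len(dataset)

def make_stub(canned: Dict[str, str]) -> Callable[[str], str]:
    """Build a stand-in 'model' that returns canned completions (assumption: no real LLM)."""
    def model(prompt: str) -> str:
        question = prompt.rsplit("Q: ", 1)[1].split("\nA:")[0]
        return canned[question]
    return model

# Hypothetical two-item dataset of (question, gold answer) pairs.
DATASET = [
    ("A box holds 4 rows of 6 eggs. How many eggs are in the box?", "24"),
    ("Sara had 10 pens and gave away 3. How many pens are left?", "7"),
]

MODELS = {
    "stub_a": make_stub({
        DATASET[0][0]: "4 rows of 6 eggs is 4 * 6 = 24. The answer is 24.",
        DATASET[1][0]: "10 pens minus 3 is 10 - 3 = 7. The answer is 7.",
    }),
    "stub_b": make_stub({
        DATASET[0][0]: "4 rows of 6 eggs is 4 * 6 = 24. The answer is 24.",
        DATASET[1][0]: "10 pens plus 3 is 10 + 3 = 13. The answer is 13.",
    }),
}

# Compare every model on the same prompts and the same scoring rule.
scores = {name: evaluate(model, DATASET) for name, model in MODELS.items()}
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.2f}")
```

Keeping the prompt template and the answer-extraction rule fixed across models is what makes the resulting accuracies comparable; changing either one between runs changes what is being measured.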

Notes

2,773 stars on GitHub. Last updated 2024-08-04. Licensed MIT.

Pros

Open source with a focused, well-defined scope
Community-driven, with 2,773 stars on GitHub
Provides ready-to-use datasets and evaluation code

Cons

Jupyter Notebook format limits production deployment
Primarily a benchmarking tool, not a runtime or inference framework
Requires manual setup and model API keys or local models

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Alternative to (2 entries)

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 updated 2mo ago

OpenAI Evals

Community

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

★ 18,584 updated 3mo ago
