instruct-eval

by Community

This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.

Added 1 June 2026

#instruct-tuning #llm

Overview

Community framework for quantitative evaluation of instruction-tuned models (e.g., Alpaca, Flan-T5) on held-out tasks. It provides a standardized benchmarking setup to measure model performance on unseen instructions.

Best for

Researchers and developers who need a simple, standardized way to evaluate instruction-tuned language models

Use cases

Evaluate instruction-tuned models on a held-out task set
Benchmark custom instruction-tuned models against baselines
Compare output quality across different instruction-tuned architectures
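
As a rough illustration of the first two use cases, the sketch below scores a stand-in model and a baseline on a tiny held-out set using exact-match accuracy. It does not use instruct-eval's own API: the example data, the `accuracy` helper, and both placeholder "models" are hypothetical and exist only to show the shape of such a comparison.

```python
from typing import Callable, List, NamedTuple


class Example(NamedTuple):
    """One held-out item: an instruction, candidate answers, and the gold answer."""
    instruction: str
    choices: List[str]
    answer: str


# Hypothetical held-out examples, invented for this sketch.
HELD_OUT = [
    Example("Which number is largest?", ["3", "11", "7"], "11"),
    Example("Which word is a color?", ["table", "green", "run"], "green"),
    Example("Which animal can fly?", ["sparrow", "horse", "salmon"], "sparrow"),
]

# A "model" here is any callable that picks one of the choices.
Model = Callable[[str, List[str]], str]


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples where the model's pick equals the gold answer."""
    if not examples:
        return 0.0
    correct = sum(model(ex.instruction, ex.choices) == ex.answer for ex in examples)
    return correct / len(examples)


def first_choice_baseline(instruction: str, choices: List[str]) -> str:
    """Trivial baseline: always pick the first choice."""
    return choices[0]


def longest_choice_model(instruction: str, choices: List[str]) -> str:
    """Placeholder for a real model: pick the longest choice (first one on ties)."""
    return max(choices, key=len)


# Score every system on the same held-out set so the numbers are comparable.
scores = {
    "first_choice": accuracy(first_choice_baseline, HELD_OUT),
    "longest_choice": accuracy(longest_choice_model, HELD_OUT),
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

In practice the placeholder callables would be replaced by wrappers around real instruction-tuned models, and the held-out set by an established benchmark; keeping the scoring function fixed across systems is what makes the comparison reproducible.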

Notes

553 stars on GitHub. Last updated 2024-03-10. Licensed under Apache-2.0.

Pros

Lightweight and focused solely on evaluation
Open source with community support
Provides a consistent, reproducible evaluation pipeline

Cons

Limited to instruction-tuned models only
May not cover all evaluation metrics needed for production
Requires manual integration with specific model formats

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Alternative to (2 entries)

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 updated 2mo ago

OpenAI Evals

Community

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

★ 18,584 updated 3mo ago