InfiBench

by Community

InfiBench: Evaluating the Question-Answering Capabilities of Code LLMs

Added 1 June 2026

Overview

InfiBench is a community-driven benchmark for evaluating the question-answering capabilities of code-focused large language models. It provides a standardized set of tasks and metrics to measure how well these models understand and respond to code-related queries.

Best for

Researchers and developers evaluating or comparing code LLMs on question-answering tasks

Use cases

Comparing the QA performance of different code LLMs on a common benchmark
Identifying strengths and weaknesses of a code LLM in answering programming questions
Validating improvements in a code LLM's question-answering abilities during development
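
As a rough illustration of the first two use cases, the sketch below averages per-question scores by category and reports the gap between two models. It does not use InfiBench's own tooling or result format: the model names, categories, scores, and the helper functions `mean_score_by_category` and `compare` are all hypothetical placeholders chosen for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-question results: (model, category, score).
# These rows are made-up placeholders, not InfiBench data.
results = [
    ("model_a", "python", 1.0),
    ("model_a", "python", 0.5),
    ("model_a", "sql", 0.0),
    ("model_b", "python", 0.5),
    ("model_b", "python", 0.5),
    ("model_b", "sql", 1.0),
]


def mean_score_by_category(rows):
    """Return {model: {category: mean score}} for the given rows."""
    buckets = defaultdict(lambda: defaultdict(list))
    for model, category, score in rows:
        buckets[model][category].append(score)
    return {
        model: {cat: mean(scores) for cat, scores in cats.items()}
        for model, cats in buckets.items()
    }


def compare(summary, a, b):
    """Return {category: score(a) - score(b)} for categories both models share."""
    shared = summary[a].keys() & summary[b].keys()
    return {cat: summary[a][cat] - summary[b][cat] for cat in sorted(shared)}


summary = mean_score_by_category(results)
gaps = compare(summary, "model_a", "model_b")
for category, gap in gaps.items():
    # A positive gap means model_a scored higher in that category.
    print(f"{category}: {gap:+.2f}")
```

Breaking scores down by category rather than reporting a single average is what makes per-model strengths and weaknesses visible; the same summary can be recomputed after each training change to check for regressions.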

Notes

Pros

Provides a focused, standardized evaluation for code LLM QA tasks
Community-driven, allowing for broad input and relevance
Helps developers and researchers make informed model comparisons

Cons

Limited to question-answering, not covering other code generation or understanding tasks
As a community project, may have less frequent updates or support than commercial benchmarks
Requires familiarity with the benchmark setup to interpret results correctly

Indexed from the awesome-llm list and enriched with the project's public facts.

Pairs with

Other entries in the index that connect to this one (1 entry).

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 updated 2mo ago
