WHOOPS!

by Community

Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images

Added 1 June 2026

Overview

WHOOPS! is a community-listed benchmark for evaluating vision-and-language models on synthetic, compositional images. It tests commonsense reasoning by presenting model-generated scenes that deliberately violate everyday real-world expectations.

Best for

Researchers evaluating vision-language models on common sense and compositional reasoning

Use cases

Benchmarking vision-language models on common sense violations
Evaluating compositional understanding in synthetic scenes
Testing model robustness to atypical image compositions
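
As a rough illustration of the benchmarking use case, the sketch below scores a model's short answers against reference answers with normalized exact match. Everything in it is an assumption made for illustration: the `Example` record, its fields, and the `predict` callable are hypothetical and do not reflect the benchmark's actual data format or official metrics.

```python
import string
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Example:
    """Hypothetical record: one image, one question, acceptable answers."""
    image_path: str
    question: str
    references: list[str]


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def exact_match_accuracy(
    examples: Iterable[Example],
    predict: Callable[[str, str], str],
) -> float:
    """Fraction of examples whose normalized prediction equals a reference.

    `predict` stands in for any vision-language model wrapper that maps
    (image_path, question) to an answer string.
    """
    examples = list(examples)
    if not examples:
        return 0.0
    hits = 0
    for ex in examples:
        answer = normalize(predict(ex.image_path, ex.question))
        if answer in {normalize(ref) for ref in ex.references}:
            hits += 1
    return hits / len(examples)
```

Exact match is a deliberately strict choice here. Free-form explanations of why a scene is odd usually need a softer metric or human judgment, so treat this only as a starting point.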

Notes

Pros

Focuses on challenging common sense reasoning, a key weakness in many models
Synthetic images allow precise control over compositional elements
Community-driven benchmark fosters open research

Cons

Synthetic images may not transfer perfectly to real-world scenarios
Limited to vision-language tasks; it does not cover other modalities
Narrow focus on commonsense violations may not reflect broader model capabilities

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

OpenAI Evals

Community

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

★ 18,584 updated 3mo ago

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 updated 2mo ago
