promptfoo

by Community

Test your prompts, agents, and RAG pipelines. Red teaming, pentesting, and vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative config.

Added 1 June 2026

#ci #ci-cd #cicd #evaluation #evaluation-framework #llm #llm-eval #llm-evaluation

Overview

promptfoo is a testing framework for evaluating prompts, agents, and RAG systems across multiple LLM providers including GPT, Claude, Gemini, and DeepSeek. It runs comparative benchmarks, red team tests, and vulnerability scans using declarative YAML configs with CLI and CI/CD support.
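
As a rough illustration of the declarative config described above, the sketch below assembles a small config as a Python dict and prints it as JSON, which YAML parsers generally accept. The top-level keys (`prompts`, `providers`, `tests`) and the `icontains` assertion type follow promptfoo's config format as best understood here; the provider IDs, the prompt text, and the `build_config` helper are illustrative assumptions, not taken from this entry.

```python
import json


def build_config(prompts, providers, cases):
    """Assemble a promptfoo-style config as a plain dict.

    `cases` is a list of (vars, expected_substring) pairs; this helper
    is hypothetical and exists only for this sketch.
    """
    return {
        "prompts": list(prompts),
        "providers": list(providers),
        "tests": [
            {
                # Values substituted into {{placeholders}} in each prompt.
                "vars": dict(case_vars),
                # Case-insensitive substring check on the model output.
                "assert": [{"type": "icontains", "value": expected}],
            }
            for case_vars, expected in cases
        ],
    }


config = build_config(
    prompts=["Answer in one sentence: what is the capital of {{country}}?"],
    # Placeholder provider IDs; substitute the ones you actually use.
    providers=[
        "openai:gpt-4o-mini",
        "anthropic:messages:claude-3-5-sonnet-20241022",
    ],
    cases=[({"country": "France"}, "Paris"), ({"country": "Japan"}, "Tokyo")],
)

config_text = json.dumps(config, indent=2)
print(config_text)
```

Saved as `promptfooconfig.yaml`, output like this is intended to be run with `promptfoo eval`; check the project's documentation for current provider IDs and assertion types before relying on it.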

Best for

Teams building LLM applications who need systematic prompt validation and security testing before deployment

Use cases

Compare prompt performance across different LLM models before production
Automate security testing and adversarial input scanning for AI applications
Integrate prompt evaluation into CI/CD pipelines for continuous quality checks
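
The CI/CD use case usually comes down to turning evaluation results into a pass or fail signal for the pipeline. The sketch below shows one way to do that with a pass-rate threshold. The list of booleans is a hypothetical stand-in for per-test outcomes and is not promptfoo's output schema; `pass_rate`, `ci_exit_code`, and the 0.9 threshold are assumptions made for illustration.

```python
def pass_rate(outcomes):
    """Fraction of test outcomes that passed (0.0 for an empty run)."""
    outcomes = list(outcomes)
    if not outcomes:
        # An empty run is treated as a failure signal, not a success.
        return 0.0
    return sum(1 for ok in outcomes if ok) / len(outcomes)


def ci_exit_code(outcomes, min_pass_rate=0.9):
    """Return 0 when the pass rate meets the threshold, else 1."""
    return 0 if pass_rate(outcomes) >= min_pass_rate else 1


# Hypothetical outcomes from one evaluation run: three passes, one failure.
outcomes = [True, True, True, False]
print(pass_rate(outcomes), ci_exit_code(outcomes))
```

A pipeline step could pass the returned value to `sys.exit` so that a run below the threshold blocks the merge.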

Notes

21,784 stars on GitHub. Last updated 2026-06-01. Licensed MIT.

Pros

Multi-model comparison built in, reducing vendor lock-in risk
Red teaming and vulnerability scanning included, not bolted on
Declarative config approach makes tests reproducible and version-controllable

Cons

Requires familiarity with YAML config syntax and CLI tooling
Testing scope limited to prompt and agent behavior, not full application integration
Costs scale with API calls to external LLM providers during test runs
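
To make the cost point concrete: an evaluation typically runs every test case against every prompt and every provider, so the call count grows as the product of the three. The sketch below is a back-of-the-envelope estimate under that assumption. The token count and per-token price are made-up figures, a single flat price is used for all providers, and any extra calls from model-graded assertions are ignored.

```python
def estimate_eval_cost(n_prompts, n_providers, n_tests,
                       avg_tokens_per_call, usd_per_1k_tokens):
    """Estimate API calls and cost for one full evaluation matrix."""
    # One call per (prompt, provider, test case) combination.
    calls = n_prompts * n_providers * n_tests
    cost = calls * avg_tokens_per_call / 1000 * usd_per_1k_tokens
    return calls, cost


# Illustrative numbers only: 3 prompts, 4 providers, 50 test cases,
# about 800 tokens per call at a hypothetical $0.002 per 1k tokens.
calls, cost = estimate_eval_cost(3, 4, 50, 800, 0.002)
print(calls, round(cost, 2))
```

Adding one more prompt or provider multiplies the whole matrix, which is why trimming the test set or using cheaper models for routine CI runs can matter more than it first appears.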

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

LangChain

Ragas

OpenAI Evals

lm-evaluation-harness

Agentic Radar

AI Gateway

awesome-hallucination-detection

Awesome LLM Security

chatgpt-wrapper

Instructor

Language models are few-shot learners

MLflow

Agenta

Arthur Shield

Giskard

LangWatch

Opik

PromptPerfect

TensorZero
