O Open Source Frameworks medium

Giskard

by Community

🐢 Open-Source Evaluation & Testing library for LLM Agents

Visit Community View repo Submit your build →

OSS

Giskard

Added 1 June 2026

#agent-evaluation #ai-red-team #ai-security #ai-testing #fairness-ai #llm #llm-eval #llm-evaluation

Overview

Giskard is an open-source Python library for evaluating and testing LLM-based agents. It provides automated scanning for vulnerabilities like hallucinations, prompt injection, and bias, and integrates with existing CI/CD pipelines.

Best for

Best for
Python developers building LLM agents who need automated safety and quality testing.

Use cases

Automated red-teaming of LLM agents for security flaws
Regression testing LLM outputs across model versions
Validating agent behavior against custom test suites

Notes

5,414 stars on GitHub. Last updated 2026-05-29. Licensed Apache-2.0.

Use cases

Automated red-teaming of LLM agents for security flaws
Regression testing LLM outputs across model versions
Validating agent behavior against custom test suites

Pros

Comprehensive vulnerability scanning out of the box
Active community with 5.4k GitHub stars
Easy integration into Python testing workflows

Cons

Limited to Python ecosystem only
May require significant setup for complex agent architectures
Documentation can be sparse for advanced use cases

Indexed from awesome-llm and enriched against its public facts.

Pros

Comprehensive vulnerability scanning out of the box
Active community with 5.4k GitHub stars
Easy integration into Python testing workflows

Cons

Limited to Python ecosystem only
May require significant setup for complex agent architectures
Documentation can be sparse for advanced use cases

Pairs with

Other entries in the index that connect to this one. Click through to see the chain.

Alternative to3entries

O OSS Framework medium

Ragas

Community

Supercharge Your LLM Application Evaluations 🚀

★ 14,186 updated 4mo ago

O OSS Framework medium

promptfoo

Community

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative config

★ 21,784 updated 1mo ago

O OSS Framework medium

OpenAI Evals

Community

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

★ 18,584 updated 3mo ago

Free 27-page guide

Get the free Developer’s Field Guide

A 27-page field guide to the AI coding workflow with Claude. Claude Code, MCP servers, the prompt patterns that work, and what to delegate. Free.

Enter your work email. We send it straight over, plus a few short notes worth knowing. Unsubscribe any time.

← Back to Open Source Submit your own entry →