Giskard
by Community
๐ข Open-Source Evaluation & Testing library for LLM Agents
OSS
Giskard
Added 1 June 2026
Overview
Giskard is an open-source Python library for evaluating and testing LLM-based agents. It provides automated scanning for vulnerabilities like hallucinations, prompt injection, and bias, and integrates with existing CI/CD pipelines.
Best for
Best for
Python developers building LLM agents who need automated safety and quality testing.
Use cases
- Automated red-teaming of LLM agents for security flaws
- Regression testing LLM outputs across model versions
- Validating agent behavior against custom test suites
Notes
Giskard is an open-source Python library for evaluating and testing LLM-based agents. It provides automated scanning for vulnerabilities like hallucinations, prompt injection, and bias, and integrates with existing CI/CD pipelines.
5,414 stars on GitHub. Last updated 2026-05-29. Licensed Apache-2.0.
Use cases
- Automated red-teaming of LLM agents for security flaws
- Regression testing LLM outputs across model versions
- Validating agent behavior against custom test suites
Pros
- Comprehensive vulnerability scanning out of the box
- Active community with 5.4k GitHub stars
- Easy integration into Python testing workflows
Cons
- Limited to Python ecosystem only
- May require significant setup for complex agent architectures
- Documentation can be sparse for advanced use cases
Indexed from awesome-llm and enriched against its public facts.
Pros
- Comprehensive vulnerability scanning out of the box
- Active community with 5.4k GitHub stars
- Easy integration into Python testing workflows
Cons
- Limited to Python ecosystem only
- May require significant setup for complex agent architectures
- Documentation can be sparse for advanced use cases
Pairs with
Other entries in the index that connect to this one. Click through to see the chain.
LangChain
Community
The agent engineering platform.
Ragas
Community
Supercharge Your LLM Application Evaluations ๐
promptfoo
Community
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative config
Ragas
Community
Supercharge Your LLM Application Evaluations ๐
promptfoo
Community
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative config
lm-evaluation-harness
Community
A framework for few-shot evaluation of language models.
OpenAI Evals
Community
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.