Opik
by Community
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
OSS
Added 1 June 2026
Overview
Opik is a Python framework for tracing, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows. It captures detailed execution traces, runs automated evaluations against defined metrics, and provides dashboards for production visibility. It is an open-source project with 19k+ GitHub stars.
Best for
Python developers building production LLM systems who need observability and systematic evaluation.
Use cases
- Debug LLM application behavior by inspecting full execution traces
- Evaluate RAG retrieval and generation quality with automated test suites
- Monitor agentic workflows in production for performance and failure patterns
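To make the tracing use case concrete, here is a minimal sketch of decorator-based call tracing using only the Python standard library. It is not Opik's API: the names `traced` and `TRACE_LOG` are hypothetical, and the retrieval and answer functions are stand-ins for real retrieval and LLM calls.

```python
import functools
import time

# Hypothetical in-memory trace store; a real tool would ship spans to a backend.
TRACE_LOG = []


def traced(func):
    """Record the name, inputs, output or error, and duration of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = {"name": func.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            span["output"] = func(*args, **kwargs)
            return span["output"]
        except Exception as exc:
            span["error"] = repr(exc)
            raise
        finally:
            # Runs on success and failure, so every call leaves a span.
            span["duration_s"] = time.perf_counter() - start
            TRACE_LOG.append(span)
    return wrapper


@traced
def retrieve(query):
    # Stand-in for a retrieval step.
    return ["doc about " + query]


@traced
def answer(query):
    # Stand-in for an LLM call that uses the retrieved context.
    context = retrieve(query)
    return f"Answer to {query!r} using {len(context)} document(s)"


if __name__ == "__main__":
    answer("tracing")
    for span in TRACE_LOG:
        print(span["name"], span.get("output"), f"{span['duration_s']:.6f}s")
```

Inner calls finish first, so the log lists `retrieve` before `answer`; inspecting spans in that order is the kind of step-by-step view the debugging use case relies on.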
Notes
19,417 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.
Pros
- Comprehensive tracing captures full context across LLM calls and tool interactions
- Automated evaluation framework reduces manual testing overhead
- Open-source with active community support
Cons
- Python-only, not suitable for non-Python LLM stacks
- Requires integration work to instrument existing applications
- Dashboard and evaluation features depend on proper trace instrumentation
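The dependence of evaluation on instrumentation can be illustrated with a small sketch: a metric can only score what was recorded. The code below is a standard-library illustration, not Opik's evaluation API. The `exact_match` metric, the `evaluate` helper, and the record layout are all assumptions made for this example.

```python
from statistics import mean


def exact_match(expected, actual):
    """Score 1.0 when the normalized strings are equal, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0


def evaluate(records, metric):
    """Apply a metric to each recorded item; return per-item and mean scores."""
    scores = [metric(r["expected"], r["output"]) for r in records]
    return {"scores": scores, "mean": mean(scores) if scores else 0.0}


# Hypothetical recorded traces: each pairs a model output with a reference answer.
records = [
    {"input": "Capital of France?", "output": "Paris", "expected": "paris"},
    {"input": "2 + 2?", "output": "5", "expected": "4"},
]

report = evaluate(records, exact_match)
print(report)
```

If a call is never traced, it never appears in `records`, so it cannot be scored or shown on a dashboard.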
Indexed from awesome-llm and enriched with publicly available facts about the project.
Pairs with
Other entries in the index that connect to this one.
promptfoo
Community
Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative config.
Ragas
Community
Supercharge Your LLM Application Evaluations 🚀
Evidently
Community
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
LangWatch
Community
The platform for LLM evaluations and AI agent testing