
Opik

by Community

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.



Added 1 June 2026

#evaluation #hacktoberfest #hacktoberfest2025 #langchain #llama-index #llm #llm-evaluation #llm-observability

Overview

Opik is an open-source Python framework for tracing, evaluating, and monitoring LLM applications, RAG systems, and agentic workflows. It captures detailed execution traces, runs automated evaluations against defined metrics, and provides dashboards for production visibility. The project has more than 19,000 GitHub stars.

Best for

Python developers building production LLM systems who need observability and systematic evaluation.

Use cases

Debug LLM application behavior by inspecting full execution traces
Evaluate RAG retrieval and generation quality with automated test suites
Monitor agentic workflows in production for performance and failure patterns
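
To make the idea of an execution trace concrete, here is a minimal sketch of decorator-based tracing for a toy RAG pipeline. It uses only the Python standard library and is not Opik's API: the names traced, TRACE, retrieve, generate, and answer are all hypothetical, and the retrieval and generation steps are stand-ins for a real vector store and model call.

```python
import functools
import time
from contextvars import ContextVar

# Hypothetical in-memory trace store. A real tracing tool would send
# these records to a backend instead of keeping them in a list.
TRACE = []
_depth = ContextVar("depth", default=0)


def traced(fn):
    """Record name, inputs, output or error, nesting depth, and duration."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        depth = _depth.get()
        token = _depth.set(depth + 1)
        span = {
            "name": fn.__name__,
            "depth": depth,
            "input": {"args": args, "kwargs": kwargs},
        }
        TRACE.append(span)  # appended at call time, so parents come first
        start = time.perf_counter()
        try:
            span["output"] = fn(*args, **kwargs)
            return span["output"]
        except Exception as exc:
            span["error"] = repr(exc)
            raise
        finally:
            span["seconds"] = time.perf_counter() - start
            _depth.reset(token)
    return wrapper


@traced
def retrieve(question):
    # Stand-in for a vector store lookup.
    return ["Opik is licensed under Apache-2.0."]


@traced
def generate(question, context):
    # Stand-in for an LLM call.
    return f"Based on {len(context)} document(s): {context[0]}"


@traced
def answer(question):
    return generate(question, retrieve(question))


result = answer("What license does Opik use?")
for span in TRACE:
    print("  " * span["depth"] + span["name"], "->", span["output"])
```

Each recorded span keeps the inputs and output of one step, so a wrong final answer can be followed back to the retrieval or generation call that produced it.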

Notes

19,417 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.


Pros

Comprehensive tracing captures full context across LLM calls and tool interactions
Automated evaluation framework reduces manual testing overhead
Open-source with active community support
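
As an illustration of what automated evaluation against a defined metric involves, the following sketch scores a toy application over a small dataset. It is plain Python rather than Opik's evaluation API: evaluate, contains_expected, toy_app, and DATASET are hypothetical names, and the substring metric is only one simple choice among many possible metrics.

```python
from statistics import mean


def contains_expected(output, expected):
    """Toy metric: 1.0 if the expected string appears in the output, else 0.0."""
    return 1.0 if expected.lower() in output.lower() else 0.0


def evaluate(app, dataset, metric):
    """Run the app on every dataset item and score each output with the metric."""
    rows = []
    for item in dataset:
        output = app(item["input"])
        rows.append({
            "input": item["input"],
            "output": output,
            "score": metric(output, item["expected"]),
        })
    return {"mean_score": mean(row["score"] for row in rows), "rows": rows}


def toy_app(question):
    # Stand-in for an LLM application: answers from a tiny lookup table.
    answers = {
        "france": "The capital of France is Paris.",
        "japan": "The capital of Japan is Tokyo.",
    }
    for keyword, reply in answers.items():
        if keyword in question.lower():
            return reply
    return "I don't know."


# Hypothetical evaluation dataset with one question the app cannot answer.
DATASET = [
    {"input": "What is the capital of France?", "expected": "Paris"},
    {"input": "What is the capital of Japan?", "expected": "Tokyo"},
    {"input": "What is the capital of Peru?", "expected": "Lima"},
]

report = evaluate(toy_app, DATASET, contains_expected)
failures = [row["input"] for row in report["rows"] if row["score"] == 0.0]
print(f"mean score: {report['mean_score']:.2f}")
print("failed inputs:", failures)
```

Re-running the same dataset after every prompt or model change turns spot checks into a repeatable regression test.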

Cons

Python-only, not suitable for non-Python LLM stacks
Requires integration work to instrument existing applications
Dashboard and evaluation features depend on proper trace instrumentation

Indexed from awesome-llm and enriched against its public facts.


Pairs with

Other entries in the index that connect to this one.

Pairs with (1 entry)


LangChain

Community

The agent engineering platform.

★ 138,234 updated 1mo ago

Alternative to (1 entry)


promptfoo

Community

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative config

★ 21,784 updated 1mo ago

Alternatives (4 entries)


Agenta

Community

The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

★ 4,171 updated 1mo ago


Arize-Phoenix

Community

Arize Phoenix: Open Source AI Development Platform


Evidently

Community

Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.

★ 7,561 updated 2mo ago


LangSmith

Community

Complete AI agent and LLM observability platform with tracing and real-time monitoring. Debug agents, find failures fast, and track costs and latency.
