Best AI Observability Tools
You cannot debug what you cannot see. These tools log token usage, tool calls, latency, and errors. Langfuse is open-source; LangSmith is commercial but pairs with LangGraph; Arize is for production inference.
The picks
Ranked by fit, not by popularity. Each entry links to its full Directories page.
- 1 OSS
Langfuse
by Langfuse
Open-source LLM observability with tracing, evaluation, and cost tracking.
Langfuse logs every LLM call, tool invocation, and chain branch. Query traces, evaluate outputs against ground truth, track costs per model. Self-hosted or managed. No vendor lock-in.
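To make "logs every call" and "track costs per model" concrete, here is a minimal sketch of the kind of record a tracing tool keeps for each LLM call. It is not the Langfuse API. `CallRecord`, `traced_call`, `cost_per_model`, the model names, and the prices are all hypothetical, chosen only for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional

# Hypothetical prices in USD per 1,000 tokens; real prices vary by provider.
PRICE_PER_1K = {"model-a": 0.50, "model-b": 2.00}


@dataclass
class CallRecord:
    """One traced LLM call: which model, how many tokens, how long, any error."""
    model: str
    input_tokens: int = 0
    output_tokens: int = 0
    latency_s: float = 0.0
    error: Optional[str] = None


TRACE = []  # in-memory stand-in for a tracing backend


@contextmanager
def traced_call(model):
    """Time the wrapped block and record it, even if it raises."""
    record = CallRecord(model=model)
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record.error = type(exc).__name__
        raise
    finally:
        record.latency_s = time.perf_counter() - start
        TRACE.append(record)


def cost_per_model(records):
    """Sum the token cost of each model across the given records."""
    totals = defaultdict(float)
    for r in records:
        tokens = r.input_tokens + r.output_tokens
        totals[r.model] += tokens / 1000 * PRICE_PER_1K[r.model]
    return dict(totals)


# Usage: wrap the call and fill in the token counts the provider reports.
with traced_call("model-a") as call:
    call.input_tokens, call.output_tokens = 1200, 800

print(cost_per_model(TRACE))
```

A dedicated tool adds what this sketch leaves out: persistent storage, nested traces for tool calls and chain branches, and a query interface.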
- 2
LangSmith
LangChain observability platform with evaluation, debugging, and feedback loops.
LangSmith is the observability layer for LangChain and LangGraph. Built-in evaluation, trace replay, and feedback collection from users. Commercial, but built by the LangChain team, so integration is seamless.
- 3
OpenObserve
Log aggregation and analytics for structured logging from agents.
OpenObserve ingests logs from your agents, structured or unstructured. Parse, index, visualize. Lower-level than LLM-specific tools but useful for operational debugging across your whole stack.
- 4
Prometheus
Metrics collection and time-series database for agent performance.
Prometheus scrapes metrics from your agent code: latency, tokens, error rates. Plug into Grafana for dashboards. Standard infrastructure tool, not LLM-specific, but essential for production.
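Scraping means Prometheus periodically fetches a plain-text page of metrics from your process. The sketch below renders counters in that text exposition format using only the standard library. `AgentMetrics` and the metric names are hypothetical, and in practice an official client library would normally do this for you, including histograms for latency, which this sketch omits.

```python
from collections import Counter


class AgentMetrics:
    """Hypothetical in-process counters, rendered as Prometheus text."""

    def __init__(self):
        self.counters = Counter()

    def inc(self, name, amount=1, **labels):
        # Each distinct label set is its own time series.
        key = (name, tuple(sorted(labels.items())))
        self.counters[key] += amount

    def render(self):
        """Return the payload a /metrics endpoint would serve."""
        lines = []
        seen = set()
        for (name, labels), value in sorted(self.counters.items()):
            if name not in seen:
                lines.append(f"# TYPE {name} counter")
                seen.add(name)
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{name}{suffix} {value}")
        return "\n".join(lines) + "\n"


# Usage: count tokens and errors per model, then expose the text.
metrics = AgentMetrics()
metrics.inc("agent_tokens_total", 1500, model="model-a")
metrics.inc("agent_errors_total", model="model-a")
print(metrics.render())
```

Labels such as `model` let a Grafana dashboard split one metric by model or agent without defining a new metric for each.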
- 5
Jaeger
Distributed tracing for microservices, including LLM service calls.
Jaeger traces requests across services. Useful when your agent makes external API calls. See the critical path: is latency in the LLM or in your database?
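The "LLM or database?" question comes down to comparing span durations within one request. Here is a simplified sketch with hypothetical span names and timings. It assumes the child spans run one after another; real critical-path analysis also has to handle spans that overlap or run in parallel.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A timed unit of work inside one request."""
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def slowest_span(spans):
    """Return the span with the longest duration."""
    return max(spans, key=lambda s: s.duration_ms)


def share_of_request(span, root):
    """Fraction of the root span's duration spent in the given span."""
    return span.duration_ms / root.duration_ms


# Usage: one agent request with an LLM call followed by a database query.
root = Span("agent.run", 0, 1000)
children = [Span("llm.completion", 50, 850), Span("db.query", 860, 980)]

worst = slowest_span(children)
print(worst.name, share_of_request(worst, root))
```

A tracing backend draws the same comparison as a timeline, so the dominant span is visible at a glance.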
- 6
Datadog
Commercial APM with AI-specific insights and log correlation.
Datadog is the enterprise choice. Full observability: logs, metrics, traces, synthetic monitoring. LLM plugin for token tracking. Worth it if you already use Datadog.
Run every pick on one platform.
Enterprise DNA ships with Langfuse integration built-in. Every agent run logs to Langfuse. See project performance, evaluate quality, track cost per agent.