# Agent Observability and Tracing

> Why agents need observability beyond app logs, how OpenTelemetry GenAI semantic conventions model agent runs as traces, key signals to capture, and a verified tooling landscape.

Category: Guide · Updated: 2026-06-15 · Tags: observability, tracing, opentelemetry, agents, debugging, evals
Canonical: https://changegamer.ai/resources/agent-observability

Standard application logging, a flat stream of timestamped lines, cannot answer "why did the agent do that?" Agent runs are non-deterministic, multi-step, and branching: a single run may spawn dozens of LLM calls, tool calls, and sub-agent delegations. Without structured tracing, debugging is largely guesswork and per-run cost attribution is hard to do reliably.

## The span/trace model maps naturally to agent runs

OpenTelemetry's span/trace model fits agents well:

- **Trace** = one complete agent run, identified by a stable `trace_id` propagated across all child operations, including sub-agents. The same `trace_id` threading through every span is what enables cross-agent debugging.
- **Span** = one discrete operation: an LLM call, a tool call, a sub-agent invocation, a retrieval step. Spans are nested (parent/child) to form the full trace tree.

This maps directly to what agent builders need: a tree view of every decision, the inputs and outputs at each node, latency per step, and a single ID to correlate across services.
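
To make the trace/span relationship concrete, here is a minimal standard-library sketch. It is not the OpenTelemetry SDK: `Span`, `start_span`, `render`, and `collect_trace_ids` are hypothetical names used only for illustration. It shows a root span minting a `trace_id`, every nested operation inheriting it, and the parent/child links forming the tree.

```python
import secrets
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    children: List["Span"] = field(default_factory=list)


# The span currently open in this execution context, if any.
_current = ContextVar("current_span", default=None)


@contextmanager
def start_span(name: str):
    parent = _current.get()
    span = Span(
        name=name,
        # A root span mints a new trace_id; every child inherits its parent's.
        trace_id=parent.trace_id if parent else secrets.token_hex(16),
        span_id=secrets.token_hex(8),
        parent_id=parent.span_id if parent else None,
    )
    if parent:
        parent.children.append(span)
    token = _current.set(span)
    try:
        yield span
    finally:
        # Restore the previous span so siblings attach to the right parent.
        _current.reset(token)


def render(span: Span, depth: int = 0) -> List[str]:
    """Return the trace tree as indented lines, one per span."""
    lines = ["  " * depth + span.name]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


def collect_trace_ids(span: Span) -> set:
    """Gather every trace_id that appears in the tree."""
    ids = {span.trace_id}
    for child in span.children:
        ids |= collect_trace_ids(child)
    return ids


with start_span("agent run") as root:
    with start_span("llm call: plan"):
        pass
    with start_span("tool call: search"):
        pass
    with start_span("sub-agent: summarizer"):
        with start_span("llm call: summarize"):
            pass

tree = render(root)
print("\n".join(tree))
print("distinct trace ids:", len(collect_trace_ids(root)))
```

In a real deployment the OpenTelemetry SDK handles ID generation and context propagation, including across process boundaries; the sketch only mirrors the data model.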

## OpenTelemetry GenAI semantic conventions

The OpenTelemetry GenAI SIG (formed April 2024) defines vendor-neutral attribute names, span types, events, and metrics for LLM and agent workloads. As of June 2026 the conventions have **Development** status (formerly called experimental): attribute names may still change, and instrumentations opt in to the latest experimental version by setting `OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental`. Several observability vendors and open-source backends already accept them (see the tooling landscape below).

Coverage breaks into four areas:

- **Client spans** — LLM calls and retrieval steps (`gen_ai.*` attributes: model name, token usage, finish reason).
- **Agent spans** — agent invocations and workflows; each tool call, LLM step, and retrieval becomes a child span.
- **Events** — prompt and completion bodies captured as span events (off by default for PII safety).
- **Metrics** — token usage and latency histograms.

Key `gen_ai.*` attributes: `gen_ai.system` (deprecated in recent versions of the conventions in favor of `gen_ai.provider.name`), `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.tool.name`.
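
As a rough illustration of how these attributes end up on a span, the sketch below builds attribute dictionaries for an LLM call and a tool call. The attribute keys are the ones listed above; the helper names, the `response` dict shape, and the example provider and model values are assumptions made for this example, not part of any SDK.

```python
def llm_call_attributes(system: str, request_model: str, response: dict) -> dict:
    """Build gen_ai.* attributes for one LLM call span.

    `response` is a hypothetical, provider-agnostic dict, not a real SDK object.
    """
    usage = response.get("usage", {})
    attributes = {
        "gen_ai.system": system,
        "gen_ai.request.model": request_model,
        "gen_ai.usage.input_tokens": usage.get("input_tokens"),
        "gen_ai.usage.output_tokens": usage.get("output_tokens"),
    }
    # Drop attributes the provider did not report instead of recording None.
    return {key: value for key, value in attributes.items() if value is not None}


def tool_call_attributes(tool_name: str) -> dict:
    """Build the attribute set for one tool call span."""
    return {"gen_ai.tool.name": tool_name}


llm_attrs = llm_call_attributes(
    system="example-provider",          # placeholder provider name
    request_model="example-model",      # placeholder model name
    response={"usage": {"input_tokens": 1200, "output_tokens": 300}},
)
tool_attrs = tool_call_attributes("web_search")

print(llm_attrs)
print(tool_attrs)
```

With the OpenTelemetry Python API, a dictionary like this would typically be attached to the active span via `span.set_attributes(...)`; auto-instrumentation libraries normally do this for you.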

## Key signals to capture per agent run

- Full trace tree: every LLM call, tool call, and sub-agent invocation as a span.
- Per-span token usage and derived cost (input tokens × input price + output tokens × output price for the model).
- Latency per span and end-to-end trace duration.
- Tool-call inputs and outputs (redact PII before logging).
- Errors and retries with the original exception attached to the failing span.
- A stable `trace_id` propagated into all sub-agent calls (see /resources/agentic-security-checklist, section 11 — logging and auditability).
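
The cost and latency signals in the list above can be derived from recorded spans roughly as follows. This is a sketch under stated assumptions: `SpanRecord` is a hypothetical record type, and the prices in `PRICES_PER_MILLION` are made-up placeholders, not real provider rates.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical prices in USD per million tokens; substitute real provider rates.
PRICES_PER_MILLION = {"example-model": {"input": 1.00, "output": 4.00}}


@dataclass
class SpanRecord:
    name: str
    duration_ms: float
    model: Optional[str] = None  # None for spans that are not LLM calls
    input_tokens: int = 0
    output_tokens: int = 0


def span_cost_usd(span: SpanRecord) -> float:
    """Price input and output tokens separately, since the rates usually differ."""
    if span.model is None:
        return 0.0
    price = PRICES_PER_MILLION[span.model]
    return (
        span.input_tokens * price["input"]
        + span.output_tokens * price["output"]
    ) / 1_000_000


spans = [
    SpanRecord("llm call: plan", 850.0, "example-model", 1200, 300),
    SpanRecord("tool call: search", 420.0),
    SpanRecord("llm call: summarize", 1300.0, "example-model", 2000, 500),
]

total_cost = sum(span_cost_usd(span) for span in spans)
slowest = max(spans, key=lambda span: span.duration_ms)

print(f"total cost: ${total_cost:.4f}")
print(f"slowest span: {slowest.name} ({slowest.duration_ms:.0f} ms)")
```

Note that end-to-end trace duration is the duration of the root span, not the sum of child durations, because child spans may run in parallel.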

## Tooling landscape

### Open-source / vendor-neutral

- **Langfuse** — open-source LLM engineering platform with an OTLP ingestion endpoint (`/api/public/otel`); accepts OTel traces and aims to comply with GenAI semantic conventions. Self-host or cloud. Source: github.com/langfuse/langfuse.
- **Arize Phoenix** — open-source observability and evaluation platform; OTel-native, accepts traces over OTLP, auto-instruments LangChain, LlamaIndex, OpenAI, Anthropic, and others via OpenInference. Runs fully local (no API key required). Source: github.com/Arize-ai/phoenix.
- **OpenLLMetry (Traceloop)** — OTel instrumentations for LLM providers and vector DBs; a Traceloop SDK wrapper emits standard OTel data you route to any OTel-compatible backend (Langfuse, Datadog, Grafana Tempo, etc.). Apache 2.0. Source: github.com/traceloop/openllmetry.
- **Logfire (Pydantic)** — OTel-based observability platform with first-class Python and Pydantic AI integration; tracks token usage, cost, and tool calls; ships built-in inside Pydantic AI (see /resources/agent-frameworks-compared). Source: github.com/pydantic/logfire.
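
Because these backends accept OTLP, pointing an exporter at one of them is mostly configuration. The sketch below builds the standard `OTEL_EXPORTER_OTLP_*` environment variables for the Langfuse endpoint path mentioned above. The host and keys are placeholders, and the Basic-auth scheme (public key and secret key joined by a colon, then base64-encoded) is an assumption based on the Langfuse OTel integration docs; check those docs before relying on it.

```python
import base64
from typing import Dict


def langfuse_otlp_env(host: str, public_key: str, secret_key: str) -> Dict[str, str]:
    """Return OTLP exporter environment variables for a Langfuse instance.

    Assumes Basic auth with "public_key:secret_key"; verify against the
    Langfuse documentation for your deployment.
    """
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"{host.rstrip('/')}/api/public/otel",
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Basic {token}",
    }


# Placeholder host and keys for illustration only.
env = langfuse_otlp_env("https://langfuse.example.com/", "pk-example", "sk-example")
for name, value in env.items():
    print(f"{name}={value}")
```

Setting these variables before the process starts lets a standard OTLP exporter pick them up without code changes; some SDKs expect the space in the header value to be URL-encoded.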

### Framework-native

- **LangSmith (LangChain)** — paired tracing and evaluation SaaS for LangChain and LangGraph; supports full end-to-end OTel ingestion so you can route spans to LangSmith and other backends simultaneously. Paid product; free tier available.
- **OpenAI Agents SDK tracing** — built-in trace processor that captures agent runs, handoffs, and tool calls; exports to the OpenAI Traces dashboard by default. Custom `TracingProcessor` implementations let you redirect spans to any OTel-compatible backend.

## Evals + observability connection

Traces are the raw material for both offline evaluation and online monitoring. Stored traces feed evaluation datasets: sample a slice of production runs, then score each run with an LLM judge or a deterministic metric. Online monitoring alerts on anomalous patterns in the live trace stream, such as spikes in error rate, latency, or token cost. See /resources/evaluating-ai-agents for eval methodology and /resources/agent-frameworks-compared for framework-native tracing details.
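
A minimal sketch of both uses, assuming traces have already been flattened into dictionaries with hypothetical `trace_id`, `error`, and `duration_ms` fields. The deterministic metric, the latency budget, and the error-rate threshold are illustrative choices, not recommended values.

```python
import random
from typing import Dict, List


def sample_for_eval(traces: List[Dict], k: int, seed: int = 0) -> List[Dict]:
    """Draw a reproducible random slice of stored traces for offline scoring."""
    rng = random.Random(seed)
    return rng.sample(traces, min(k, len(traces)))


def score_trace(trace: Dict, latency_budget_ms: float = 5000.0) -> float:
    """Deterministic metric: 1.0 if the run succeeded within the latency budget."""
    ok = (not trace["error"]) and trace["duration_ms"] <= latency_budget_ms
    return 1.0 if ok else 0.0


def should_alert(recent: List[Dict], max_error_rate: float) -> bool:
    """Online check: fire when the recent error rate exceeds the threshold."""
    if not recent:
        return False
    error_rate = sum(1 for trace in recent if trace["error"]) / len(recent)
    return error_rate > max_error_rate


# Synthetic traces for illustration; real ones would come from the trace store.
traces = [
    {"trace_id": f"t{i}", "error": i % 5 == 0, "duration_ms": 1000 + 500 * i}
    for i in range(10)
]

eval_set = sample_for_eval(traces, k=3, seed=42)
pass_rate = sum(score_trace(trace) for trace in eval_set) / len(eval_set)

print("sampled:", [trace["trace_id"] for trace in eval_set])
print(f"pass rate on sample: {pass_rate:.2f}")
print("alert:", should_alert(traces, max_error_rate=0.1))
```

An LLM judge would slot in where `score_trace` is called; keeping the sampling seeded makes an eval run repeatable against the same stored traces.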

## Verified sources

- OTel GenAI semantic conventions — agent and framework spans (Development status): https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
- OTel GenAI semantic conventions — generative client AI spans: https://opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-spans/
- OTel blog — Inside the LLM Call: GenAI Observability (2026): https://opentelemetry.io/blog/2026/genai-observability/
- OTel blog — AI Agent Observability: Evolving Standards and Best Practices (2025): https://opentelemetry.io/blog/2025/ai-agent-observability/
- Langfuse OTel integration docs: https://langfuse.com/integrations/native/opentelemetry
- Langfuse GitHub: https://github.com/langfuse/langfuse
- Arize Phoenix docs: https://arize.com/docs/phoenix
- Arize Phoenix GitHub: https://github.com/Arize-ai/phoenix
- OpenLLMetry GitHub (Traceloop): https://github.com/traceloop/openllmetry
- Logfire (Pydantic) AI observability docs: https://logfire.pydantic.dev/docs/ai-observability/
- Logfire GitHub: https://github.com/pydantic/logfire
- LangSmith OTel support announcement: https://www.langchain.com/blog/end-to-end-opentelemetry-langsmith
- OpenAI Agents SDK — Tracing docs: https://openai.github.io/openai-agents-python/tracing/