# Observability

> Distributed tracing, metrics, and dashboards for your AI agents.

Control Plane's observability layer gives you full visibility into what your agents do, how long each step takes, and what each run costs, both in production and in CI.

## Trace collection

Traces are collected via the Control Plane SDK and routed through the Control Plane Collector (an OpenTelemetry Collector configured for agent workloads).

### Automatic instrumentation

When you call `langship.init()`, Control Plane patches the following libraries automatically:

| Library       | What's traced                                  |
| ------------- | ---------------------------------------------- |
| `openai`      | Every chat completion and embedding call       |
| `anthropic`   | Every message call                             |
| `langchain`   | Agent loops, chains, tools, retrievers, memory |
| `llama_index` | Query engines, retrievers, LLM calls           |
| `cohere`      | Chat and embed calls                           |

No code changes are needed beyond the `init()` call.
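
Automatic instrumentation of this kind is commonly built by wrapping library methods at initialization time. The sketch below illustrates that general technique only; `FakeClient`, `patch_method`, and `recorded_spans` are hypothetical names invented for the example, and this page does not describe how Control Plane actually patches each library.

```python
import functools
import time

# In-memory stand-in for a trace exporter (hypothetical).
recorded_spans = []


class FakeClient:
    """Stand-in for an LLM client; not a real library class."""

    def complete(self, prompt):
        return f"echo: {prompt}"


def patch_method(cls, method_name):
    """Replace cls.method_name with a wrapper that records one span per call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        status = "success"
        try:
            return original(self, *args, **kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            # Runs on both the success and the error path.
            recorded_spans.append({
                "name": f"{cls.__name__}.{method_name}",
                "status": status,
                "duration_ms": (time.perf_counter() - start) * 1000,
            })

    setattr(cls, method_name, wrapper)


patch_method(FakeClient, "complete")
reply = FakeClient().complete("hello")
print(reply, recorded_spans[0]["name"], recorded_spans[0]["status"])
```

Because the wrapper is installed on the class, existing call sites keep working unchanged, which is why a single `init()` call can be enough.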

### Manual spans

For code that Control Plane doesn't auto-instrument, add spans manually:

```python theme={null}
with langship.span("my-custom-step", input={"query": query}) as span:
    result = my_custom_function(query)
    span.set_output(result)
```

Spans support arbitrary attributes:

```python theme={null}
span.set_attribute("customer_id", customer_id)
span.set_attribute("model_version", "v2.1")
```

## Trace viewer

The trace viewer shows the full span tree for any run:

* **Timeline view**: spans laid out on a horizontal time axis; see overlapping calls
* **Tree view**: hierarchical view of parent/child spans
* **Detail panel**: click any span to see full input, output, model metadata, token counts, and cost
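
To make the tree view concrete, here is a minimal sketch of how a flat list of spans can be turned into a parent/child hierarchy. The field names (`id`, `parent_id`, `name`) are assumptions for illustration, not Control Plane's documented span schema.

```python
from collections import defaultdict

# Flat span records; field names are assumptions for this example.
spans = [
    {"id": "a", "parent_id": None, "name": "agent-run"},
    {"id": "b", "parent_id": "a", "name": "retrieve"},
    {"id": "c", "parent_id": "a", "name": "llm-call"},
    {"id": "d", "parent_id": "b", "name": "embed-query"},
]


def render_tree(spans):
    """Return indented span names, each parent listed before its children."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children[parent_id]:
            lines.append("  " * depth + span["name"])
            walk(span["id"], depth + 1)

    walk(None, 0)  # root spans have no parent
    return lines


tree_lines = render_tree(spans)
print("\n".join(tree_lines))
```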

### Filtering and search

Filter runs by:

* Project and environment
* Date range
* Status (success / failure / error)
* Tag or attribute value
* Latency (runs slower than N ms)
* Cost (runs above \$X)

Full-text search covers all span inputs and outputs.
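
If you export run data and want to apply the same kind of filtering offline, the logic can be sketched as follows. The `Run` record and `filter_runs` helper are hypothetical and only mirror the filters listed above; they are not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class Run:
    run_id: str
    status: str
    latency_ms: float
    cost_usd: float
    text: str  # concatenated span inputs and outputs


def filter_runs(runs, *, status=None, min_latency_ms=None, min_cost_usd=None, contains=None):
    """Keep runs that satisfy every filter that was supplied."""
    result = []
    for run in runs:
        if status is not None and run.status != status:
            continue
        if min_latency_ms is not None and run.latency_ms <= min_latency_ms:
            continue
        if min_cost_usd is not None and run.cost_usd <= min_cost_usd:
            continue
        if contains is not None and contains.lower() not in run.text.lower():
            continue
        result.append(run)
    return result


runs = [
    Run("r1", "success", 850.0, 0.004, "What is our refund policy?"),
    Run("r2", "error", 4200.0, 0.031, "Summarize the refund ticket backlog"),
    Run("r3", "success", 3900.0, 0.012, "Translate this paragraph"),
]

# Runs slower than 1000 ms that mention "refund" anywhere in their text.
slow_refund_runs = filter_runs(runs, min_latency_ms=1000, contains="refund")
print([run.run_id for run in slow_refund_runs])
```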

## Metrics

Control Plane aggregates per-project metrics over time:

| Metric                      | Description                                                 |
| --------------------------- | ----------------------------------------------------------- |
| **P50 / P95 / P99 latency** | Response time percentiles per run                           |
| **Token usage**             | Input and output tokens, broken down by model               |
| **Cost**                    | Estimated cost per run and per day (based on model pricing) |
| **Error rate**              | Percentage of runs that errored                             |
| **Tool call rate**          | Average number of tool calls per run                        |
| **Eval pass rate**          | Percentage of runs that passed each evaluator               |

Metrics are available in the dashboard and via the API, so you can export them to Grafana or other dashboarding tools.
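
As a worked example of what the latency percentiles and error rate mean, the sketch below computes them from a small sample of runs using the nearest-rank method. This page does not state which percentile method Control Plane uses, so treat the method and the sample data as assumptions.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up sample: ten runs, one of which errored.
latencies_ms = [120, 135, 150, 160, 180, 210, 250, 400, 900, 2500]
statuses = ["success"] * 9 + ["error"]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
error_rate = 100 * statuses.count("error") / len(statuses)

print(p50, p95, p99, error_rate)
```

With only ten samples, P95 and P99 both land on the slowest run, which is a reminder that tail percentiles are only meaningful once a project has enough traffic.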

## Alerting

Set up alerts on any metric:

1. Go to your project → **Alerts** → **New Alert**
2. Choose a metric (e.g., P95 latency, error rate, cost per day)
3. Set a threshold and a notification channel (email, Slack webhook, PagerDuty)

Alerts are evaluated every 5 minutes.
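
Conceptually, each evaluation compares the latest metric values against the configured thresholds. The sketch below shows that comparison with a hypothetical `AlertRule` type and made-up metric keys; it is not Control Plane's alerting engine, and notification delivery is left out.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    metric: str       # key into the metrics snapshot (hypothetical key names)
    threshold: float  # the rule fires when the metric is strictly above this


def breached(rules, metrics):
    """Return the names of rules whose metric exceeds its threshold."""
    return [
        rule.name
        for rule in rules
        if rule.metric in metrics and metrics[rule.metric] > rule.threshold
    ]


rules = [
    AlertRule("slow-p95", "p95_latency_ms", 3000),
    AlertRule("high-error-rate", "error_rate_pct", 5),
    AlertRule("cost-budget", "cost_per_day_usd", 50),
]
snapshot = {"p95_latency_ms": 3400.0, "error_rate_pct": 1.2, "cost_per_day_usd": 61.5}

firing = breached(rules, snapshot)
print(firing)
```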

## Exporting to external backends

Control Plane's collector can forward traces to any OTLP-compatible backend in parallel with Control Plane's own storage.

In `collector-config.yaml` (included in the Docker Compose setup):

```yaml theme={null}
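# Assumes the collector build includes the standard `otlp` and `otlphttp` exporters.
exporters:
  # Jaeger accepts OTLP over gRPC (default port 4317).
  otlp/jaeger:
    endpoint: "jaeger:4317"
    tls:
      insecure: true
  # Grafana Cloud's OTLP gateway speaks OTLP over HTTP.
  otlphttp/grafana:
    endpoint: "https://otlp-gateway-prod-us-central-0.grafana.net/otlp"
    headers:
      authorization: "Basic ${GRAFANA_TOKEN}"

service:
  pipelines:
    traces:
      # `langship` is the exporter for Control Plane's own storage.
      exporters: [langship, otlp/jaeger, otlphttp/grafana]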
```

## Session replay

Control Plane records the full conversation history for multi-turn agents. In the run detail view, click **Session Replay** to step through the conversation turn by turn, with the full trace visible for each turn.

## Cost tracking

Control Plane estimates cost for every LLM call using current model pricing. Costs are visible:

* Per span (each LLM call)
* Per run (total)
* Per project per day (aggregate)
* In eval results (how much did this eval suite cost to run?)

Set a **cost budget** alert to notify you when daily costs exceed a threshold.
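
The arithmetic behind a cost estimate is token counts multiplied by per-token prices, summed from spans up to the run. The sketch below uses a hypothetical price table and made-up model names; Control Plane's actual pricing data and rounding rules are not described on this page.

```python
# Hypothetical prices in USD per 1M tokens; real pricing varies by provider and over time.
PRICES_PER_MILLION = {
    "example-model-small": {"input": 0.50, "output": 1.50},
    "example-model-large": {"input": 5.00, "output": 15.00},
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one LLM call."""
    price = PRICES_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Two LLM calls (spans) belonging to the same run.
llm_calls = [
    {"model": "example-model-small", "input_tokens": 2_000, "output_tokens": 500},
    {"model": "example-model-large", "input_tokens": 1_000, "output_tokens": 200},
]

span_costs = [
    estimate_cost(call["model"], call["input_tokens"], call["output_tokens"])
    for call in llm_calls
]
run_cost = sum(span_costs)

print(span_costs, run_cost)
```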

## SDK API reference

```python theme={null}
import langship

# Initialize
langship.init(
    endpoint="https://langship.yourcompany.com",
    api_key="...",
    project="my-agent",
    environment="production",   # default: "default"
    sample_rate=1.0,            # trace 100% of runs; reduce for high-volume agents
)

# Manual span
with langship.span("step-name", input={...}) as span:
    output = do_work()
    span.set_output(output)
    span.set_attribute("key", "value")

# Log an event on the current span
langship.log("retrieved 3 documents", level="info")

# Tag the current run
langship.tag("customer_tier", "enterprise")

# Flush before exit (important in short-lived scripts)
langship.flush()
```
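
To give a feel for what lowering `sample_rate` does, the sketch below shows one common way to sample: hash a run identifier and keep the run if the hash falls below the rate. How the SDK actually decides which runs to keep is not specified here, so `should_sample` and the `run-N` identifiers are purely illustrative.

```python
import hashlib


def should_sample(run_id: str, sample_rate: float) -> bool:
    """Deterministically keep roughly `sample_rate` of runs, keyed on run_id."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and compare
    # it against the matching fraction of that range.
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < sample_rate * 2**64


run_ids = [f"run-{i}" for i in range(10_000)]
kept = sum(should_sample(run_id, 0.1) for run_id in run_ids)
print(f"kept {kept} of {len(run_ids)} runs")
```

Hashing the identifier rather than drawing a random number means the same run always gets the same decision, so every process that sees the run agrees on whether to trace it.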