
Observability & Tracing

Station includes built-in OpenTelemetry (OTEL) support for complete execution observability. Every agent execution, LLM call, and tool invocation is automatically traced.

What Gets Traced

Component            Details Captured
Agent Executions     Complete timeline from start to finish
LLM Calls            Every OpenAI/Anthropic/Gemini API call with latency
MCP Tool Usage       Individual tool calls to AWS, databases, etc.
Database Operations  Query performance and data access patterns
GenKit Spans         Dotprompt execution, generation flow, model interactions

Quick Start with Jaeger

The fastest way to get tracing running locally:
# Start Jaeger
stn jaeger up

# Jaeger UI available at http://localhost:16686
Station automatically detects Jaeger and sends traces to http://localhost:4318.

Example Trace

incident_coordinator (18.2s)
├─ assess_severity (0.5s)
├─ delegate_logs_investigator (4.1s)
│  └─ __get_logs (3.2s)
├─ delegate_metrics_investigator (3.8s)
│  └─ __query_time_series (2.9s)
├─ delegate_change_detective (2.4s)
│  └─ __get_recent_deployments (1.8s)
└─ synthesize_findings (1.2s)
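Note that the coordinator's 18.2s is more than the sum of its children: the remainder is time spent in the coordinator itself (LLM reasoning between delegations, orchestration overhead). A minimal sketch of that arithmetic, using simple (name, seconds, children) tuples rather than Station's actual export format:

```python
# Compute a span's "self time": duration not spent in direct child spans.
# The (name, seconds, children) tuple layout is illustrative only.

def self_time(span):
    name, duration, children = span
    return duration - sum(child[1] for child in children)

trace = ("incident_coordinator", 18.2, [
    ("assess_severity", 0.5, []),
    ("delegate_logs_investigator", 4.1, [("__get_logs", 3.2, [])]),
    ("delegate_metrics_investigator", 3.8, [("__query_time_series", 2.9, [])]),
    ("delegate_change_detective", 2.4, [("__get_recent_deployments", 1.8, [])]),
    ("synthesize_findings", 1.2, []),
])

# 18.2s total minus 12.0s of direct children: ~6.2s in the coordinator itself.
print(round(self_time(trace), 1))  # 6.2
```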

Configuration

Point Station at any OTLP-compatible collector with the standard environment variable:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
stn serve

Config File

# config.yaml
otel_endpoint: "http://localhost:4318"

MCP Client Configuration

When connecting MCP clients, include the OTEL endpoint:
{
  "mcpServers": {
    "station": {
      "command": "stn",
      "args": ["stdio"],
      "env": {
        "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318"
      }
    }
  }
}
Or with Claude Code CLI:
claude mcp add station -e OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 --scope user -- stn stdio

Tracing Backends

Station works with any OpenTelemetry-compatible backend.

Jaeger (Local Development)

# Built-in command
stn jaeger up

# Or manual Docker
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  -e COLLECTOR_OTLP_ENABLED=true \
  jaegertracing/all-in-one:latest
UI: http://localhost:16686

Grafana Tempo

# docker-compose.yml
services:
  tempo:
    image: grafana/tempo:latest
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
    ports:
      - "4318:4318"   # OTLP HTTP
      - "3200:3200"   # Tempo API

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
# Point Station at Tempo's OTLP receiver
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Datadog APM

# Install Datadog Agent with OTLP support
DD_API_KEY=<your-key> DD_SITE="datadoghq.com" \
  DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT="0.0.0.0:4318" \
  bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"

# Configure Station
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Honeycomb

export OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
export OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=your-api-key"

AWS X-Ray

# Run AWS OTEL Collector
docker run -d \
  -p 4318:4318 \
  -e AWS_REGION=us-east-1 \
  amazon/aws-otel-collector:latest

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

New Relic

export OTEL_EXPORTER_OTLP_ENDPOINT=https://otlp.nr-data.net:4318
export OTEL_EXPORTER_OTLP_HEADERS="api-key=your-license-key"

Azure Monitor

# Azure Monitor does not accept OTLP directly; run an OpenTelemetry
# Collector with the Azure Monitor exporter and point Station at it
export APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=..."
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Span Details

Station captures rich span information:

Agent Execution Span

{
  "name": "agent.execute",
  "attributes": {
    "agent.id": "21",
    "agent.name": "incident_coordinator",
    "agent.environment": "production",
    "task": "Investigate API timeout",
    "model": "gpt-4o-mini",
    "max_steps": 20
  }
}

LLM Call Span

{
  "name": "llm.generate",
  "attributes": {
    "model": "gpt-4o-mini",
    "provider": "openai",
    "input_tokens": 1250,
    "output_tokens": 380,
    "latency_ms": 1240
  }
}

Tool Call Span

{
  "name": "tool.call",
  "attributes": {
    "tool.name": "__get_logs",
    "tool.server": "datadog",
    "duration_ms": 320,
    "success": true
  }
}
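Spans with this attribute schema are straightforward to post-process once exported. A hypothetical sketch (the dicts mirror the examples above; the aggregation helper is ours, not part of Station) that totals token usage across the LLM spans of a trace:

```python
# Sum token usage across llm.generate spans. The span dicts mirror the
# attribute schema shown above; the helper itself is illustrative.

def token_totals(spans):
    totals = {"input_tokens": 0, "output_tokens": 0}
    for span in spans:
        if span["name"] == "llm.generate":
            attrs = span["attributes"]
            totals["input_tokens"] += attrs["input_tokens"]
            totals["output_tokens"] += attrs["output_tokens"]
    return totals

spans = [
    {"name": "llm.generate",
     "attributes": {"model": "gpt-4o-mini", "input_tokens": 1250, "output_tokens": 380}},
    {"name": "tool.call",
     "attributes": {"tool.name": "__get_logs", "duration_ms": 320}},
    {"name": "llm.generate",
     "attributes": {"model": "gpt-4o-mini", "input_tokens": 2100, "output_tokens": 510}},
]

print(token_totals(spans))  # {'input_tokens': 3350, 'output_tokens': 890}
```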

Viewing Traces

Jaeger UI

  1. Open http://localhost:16686
  2. Select “station” from the Service dropdown
  3. Click “Find Traces”
  4. Click on a trace to see the full execution timeline

Filtering Traces

In Jaeger, use tags to filter:
agent.name=incident_coordinator
model=gpt-4o-mini
error=true
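The same tag filters work against Jaeger's HTTP query API, which expects tags as a JSON object. A sketch of building such a query URL (note: /api/traces is the internal API the Jaeger UI itself uses and may change between versions; the "station" service name comes from the steps above):

```python
# Build a Jaeger /api/traces query URL for the tag filters above.
import json
from urllib.parse import urlencode

def jaeger_query_url(base, service, tags, limit=20):
    params = {
        "service": service,
        "tags": json.dumps(tags, separators=(",", ":")),  # Jaeger wants JSON
        "limit": limit,
    }
    return f"{base}/api/traces?{urlencode(params)}"

url = jaeger_query_url(
    "http://localhost:16686",
    "station",
    {"agent.name": "incident_coordinator", "error": "true"},
)
print(url)
```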

Production Setup

High-Volume Environments

For production, use sampling to reduce trace volume:
export OTEL_TRACES_SAMPLER=parentbased_traceidratio
export OTEL_TRACES_SAMPLER_ARG=0.1  # Sample 10% of traces
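parentbased_traceidratio decides deterministically from the trace ID, so every SDK that sees the same trace reaches the same verdict and traces are kept or dropped whole. A simplified sketch of the ratio part of that decision (it mirrors the approach of the OpenTelemetry SDK's TraceIdRatioBased sampler, not the exact spec algorithm):

```python
# Deterministic trace-ID-ratio sampling, as configured by
# OTEL_TRACES_SAMPLER_ARG=0.1: no randomness at decision time, so all
# participants agree. Simplified illustration of the idea.

def sampled(trace_id: int, ratio: float) -> bool:
    # Compare the low 64 bits of the 128-bit trace ID to a threshold.
    bound = round(ratio * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound

# With ratio 0.1, roughly 10% of random trace IDs pass.
import random
random.seed(0)
hits = sum(sampled(random.getrandbits(128), 0.1) for _ in range(10_000))
print(hits)  # close to 1,000
```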

Secure Endpoints

# TLS endpoint
export OTEL_EXPORTER_OTLP_ENDPOINT=https://collector.example.com:4318
export OTEL_EXPORTER_OTLP_CERTIFICATE=/path/to/ca.crt

# With authentication
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer your-token"

Docker Deployment

# docker-compose.yml
services:
  station:
    image: ghcr.io/cloudshipai/station:latest
    environment:
      - OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318
    depends_on:
      - jaeger

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"
    environment:
      - COLLECTOR_OTLP_ENABLED=true

Troubleshooting

No Traces Appearing

  1. Check endpoint connectivity:
    curl -v http://localhost:4318/v1/traces
    # Should return 405 Method Not Allowed (POST required)
    
  2. Verify environment variable:
    echo $OTEL_EXPORTER_OTLP_ENDPOINT
    
  3. Check Station logs:
    stn logs | grep -i otel
    

Traces Missing Tool Calls

Ensure MCP servers are configured with tracing:
{
  "mcpServers": {
    "my-server": {
      "command": "my-mcp-server",
      "env": {
        "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318"
      }
    }
  }
}

High Latency in Traces

If traces show high latency:
  1. Check network connectivity to the tracing backend
  2. Confirm exports are not blocking execution (traces are exported asynchronously by default)
  3. For high-volume deployments, enable sampling (see Production Setup)

Next Steps