The Agentic Harness is an alternative execution engine for Station agents that provides advanced capabilities beyond the standard Genkit-based execution.
Overview
Add harness: agentic to any agent’s dotprompt file to enable:
- Manual agentic loop - Step-by-step control over agent execution
- Doom loop detection - Prevents agents from getting stuck in repetitive patterns
- Context compaction - Automatically summarizes history when approaching context limits
- Git integration - Auto-branch creation and commit management
- Workspace isolation - Sandboxed file system access
- Built-in tools - File and bash tools that work independently of MCP
When to Use
| Use Case | Harness |
|---|---|
| Simple queries, quick responses | Default (Genkit) |
| Long-running coding tasks | agentic |
| Multi-step file operations | agentic |
| Tasks requiring git branches | agentic |
| Complex debugging/investigation | agentic |
Agent Configuration
Enable the harness in your agent’s dotprompt frontmatter:
---
model: anthropic/claude-sonnet-4-20250514
harness: agentic
harness_config:
  max_steps: 50
  doom_loop_threshold: 3
  timeout: 10m
  sandbox: # Optional: isolated execution environment
    mode: docker # host | docker | e2b (experimental)
    image: python:3.11-slim
tools:
  - read
  - write
  - edit
  - bash
  - glob
  - grep
---
You are a code analysis agent. Analyze the codebase and suggest improvements.
Agent-Level Options
| Option | Default | Description |
|---|---|---|
| max_steps | 50 | Maximum tool call iterations |
| doom_loop_threshold | 3 | Consecutive identical calls to trigger loop detection |
| timeout | 10m | Maximum execution time |
| sandbox | null | Isolated execution environment (see below) |
Sandbox Configuration
The sandbox option under harness_config controls where tools execute:
harness_config:
  sandbox:
    mode: docker # Execution mode
    image: python:3.11-slim # Docker image (for docker mode)
    network: false # Disable network access
    timeout: 5m # Per-command timeout
    memory: 4g # Memory limit
    cpu: 2 # CPU limit
    environment: # Environment variables
      PYTHONPATH: /workspace
Sandbox Modes:
| Mode | Description | File Persistence |
|---|---|---|
| host | Tools execute directly on host machine | Yes |
| docker | Tools execute in Docker containers | Yes (volume mounted) |
| e2b | Tools execute in cloud VMs | No (experimental) |
Docker mode is recommended for production. Files persist across container restarts via volume mounting.
E2B mode is experimental: data does not persist once a sandbox is destroyed.
Global Configuration
Configure harness defaults in config.yaml. Running stn init sets sensible defaults:
harness:
  workspace:
    path: ./workspace
    mode: host # "host" or "sandbox"
  compaction:
    enabled: true
    threshold: 0.85 # Compact at 85% of context window
    protect_tokens: 40000 # Keep last N tokens from compaction
  git:
    auto_branch: true
    branch_prefix: agent/
    auto_commit: false
    require_approval: true
    workflow_branch_strategy: shared
  nats:
    enabled: true
    kv_bucket: harness-state
    object_bucket: harness-files
    max_file_size: 100MB
    ttl: 24h
  permissions:
    external_directory: deny
Key Features
Doom Loop Detection
Detects when an agent is stuck repeating the same action:
# In dotprompt frontmatter
harness_config:
  doom_loop_threshold: 3 # Trigger after 3 consecutive identical tool calls
When detected, the harness interrupts the loop and prompts the agent to try a different approach.
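The detection rule described above (a run of consecutive identical tool calls reaching the threshold) can be sketched in a few lines of Go. This is an illustrative sketch, not the harness's code: `ToolCall` and `LoopDetector` are hypothetical names, and "identical" is assumed here to mean the same tool name with the same serialized arguments.

```go
package main

import "fmt"

// ToolCall is a hypothetical record of one tool invocation.
type ToolCall struct {
	Name string
	Args string // serialized arguments, e.g. JSON
}

// LoopDetector counts consecutive identical tool calls.
type LoopDetector struct {
	threshold int
	last      ToolCall
	count     int
}

// NewLoopDetector returns a detector that fires after threshold
// consecutive identical calls.
func NewLoopDetector(threshold int) *LoopDetector {
	return &LoopDetector{threshold: threshold}
}

// Observe records a call and reports whether the current run of identical
// calls has reached the threshold. A different call resets the run to 1.
func (d *LoopDetector) Observe(c ToolCall) bool {
	if d.count > 0 && c == d.last {
		d.count++
	} else {
		d.last, d.count = c, 1
	}
	return d.count >= d.threshold
}

func main() {
	d := NewLoopDetector(3)
	calls := []ToolCall{
		{Name: "read", Args: `{"path":"a.go"}`},
		{Name: "bash", Args: `{"command":"go test ./..."}`},
		{Name: "bash", Args: `{"command":"go test ./..."}`},
		{Name: "bash", Args: `{"command":"go test ./..."}`},
	}
	for i, c := range calls {
		fmt.Println(i, c.Name, d.Observe(c))
	}
}
```

With a threshold of 3, only the last call in this sequence (the third consecutive identical `bash` call) is reported as a loop. Comparing whole calls rather than just tool names avoids flagging an agent that legitimately reads several different files in a row.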
Context Compaction
Automatically summarizes conversation history when approaching context limits:
harness:
  compaction:
    enabled: true
    threshold: 0.85 # Start compacting at 85% of window
    protect_tokens: 40000 # Never compact the last 40k tokens
The compactor uses the same model to create a summary, preserving important context while freeing up space for new interactions.
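The interaction between threshold and protect_tokens can be illustrated with a small Go sketch. It assumes per-message token counts are already known and that protection works at message granularity; `Message`, `shouldCompact`, and `splitForCompaction` are hypothetical names, and the summarization call itself is omitted.

```go
package main

import "fmt"

// Message is a hypothetical history entry with a precomputed token count.
type Message struct {
	Role   string
	Tokens int
}

// shouldCompact reports whether usage has reached threshold * window.
func shouldCompact(used, window int, threshold float64) bool {
	return float64(used) >= threshold*float64(window)
}

// splitForCompaction walks backwards from the newest message and protects
// messages until at least protectTokens tokens are covered. Everything
// older is returned as the part to summarize.
func splitForCompaction(history []Message, protectTokens int) (toSummarize, protected []Message) {
	kept := 0
	cut := len(history)
	for i := len(history) - 1; i >= 0 && kept < protectTokens; i-- {
		kept += history[i].Tokens
		cut = i
	}
	return history[:cut], history[cut:]
}

func main() {
	history := []Message{
		{Role: "user", Tokens: 60000},
		{Role: "assistant", Tokens: 50000},
		{Role: "tool", Tokens: 30000},
		{Role: "assistant", Tokens: 25000},
		{Role: "user", Tokens: 15000},
	}
	used := 0
	for _, m := range history {
		used += m.Tokens
	}

	fmt.Println("compact:", shouldCompact(used, 200000, 0.85))
	old, recent := splitForCompaction(history, 40000)
	fmt.Println("summarize:", len(old), "protect:", len(recent))
}
```

In this example 180,000 of 200,000 tokens are in use, which is past the 85% mark, so the three oldest messages would be summarized while the two newest (40,000 tokens together) stay verbatim.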
Git Integration
Automatic branch creation for agent work:
harness:
  git:
    auto_branch: true # Create branches automatically
    branch_prefix: agent/ # Branch naming: agent/task-name-timestamp-id
    auto_commit: false # Require explicit commits
    require_approval: true # Human approval before push
When enabled, the harness:
- Creates a new branch when execution starts
- Tracks all file changes
- Can commit changes with generated messages
- Supports push with approval workflow
Workspace Isolation
Control where agents can read/write files:
harness:
  workspace:
    path: ./workspace # Root directory for agent operations
    mode: host # "host" for direct access, "sandbox" for isolation
  permissions:
    external_directory: deny # Block access outside workspace
Built-in Tools
The harness provides built-in tools that work independently of MCP servers:
| Tool | Description | Example |
|---|---|---|
| read | Read file contents | read(path: "src/main.go") |
| write | Write file contents | write(path: "out.txt", content: "...") |
| edit | String replacement editing | edit(path: "file.go", old: "foo", new: "bar") |
| bash | Execute shell commands | bash(command: "ls -la") |
| glob | Find files by pattern | glob(pattern: "**/*.go") |
| grep | Search file contents | grep(pattern: "TODO", path: "src/") |
| git_status | Get git status | git_status() |
| git_diff | Get git diff | git_diff() |
| git_log | Get recent commits | git_log(count: 10) |
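To illustrate what "string replacement editing" can look like, here is a minimal Go sketch of an edit operation on in-memory content. The rule that the old string must match exactly once is an assumption made for this example, not documented behavior of the edit tool, and `applyEdit` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// applyEdit replaces a single occurrence of old with replacement.
// It refuses to edit when old is empty, missing, or ambiguous, so the
// caller always knows exactly which text was changed.
func applyEdit(content, old, replacement string) (string, error) {
	if old == "" {
		return "", fmt.Errorf("old string must not be empty")
	}
	switch n := strings.Count(content, old); {
	case n == 0:
		return "", fmt.Errorf("old string not found")
	case n > 1:
		return "", fmt.Errorf("old string matches %d times, expected exactly 1", n)
	}
	return strings.Replace(content, old, replacement, 1), nil
}

func main() {
	src := "func main() {\n\tfoo()\n}\n"
	out, err := applyEdit(src, "foo()", "bar()")
	fmt.Printf("%q %v\n", out, err)

	_, err = applyEdit("foo foo", "foo", "bar")
	fmt.Println(err)
}
```

Rejecting ambiguous matches pushes the agent to include enough surrounding context in the old string, which tends to make edits safer than blind find-and-replace.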
Permissions
Fine-grained control over tool capabilities:
harness:
  permissions:
    bash:
      allow_write: false # Read-only commands
      allowed_commands: # Whitelist
        - ls
        - cat
        - grep
        - find
      blocked_commands: # Blacklist
        - rm
        - sudo
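A simple way to reason about the allow and block lists is to check the first word of each command line against both. The Go sketch below does exactly that and nothing more. The precedence (blocked wins over allowed, an empty allowlist permits anything not blocked) is an assumption, and `BashPermissions` is a hypothetical type. It does not parse pipelines, `;`, or subshells, so a real implementation would need proper shell parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// BashPermissions mirrors the allowed_commands and blocked_commands keys
// shown above. The type is hypothetical, not Station's actual struct.
type BashPermissions struct {
	AllowedCommands []string
	BlockedCommands []string
}

// Allowed checks the first word of the command line. Blocked commands are
// always rejected; if an allowlist is set, the command must appear in it.
func (p BashPermissions) Allowed(commandLine string) bool {
	fields := strings.Fields(commandLine)
	if len(fields) == 0 {
		return false
	}
	name := fields[0]
	for _, b := range p.BlockedCommands {
		if name == b {
			return false
		}
	}
	if len(p.AllowedCommands) == 0 {
		return true
	}
	for _, a := range p.AllowedCommands {
		if name == a {
			return true
		}
	}
	return false
}

func main() {
	perms := BashPermissions{
		AllowedCommands: []string{"ls", "cat", "grep", "find"},
		BlockedCommands: []string{"rm", "sudo"},
	}
	for _, cmd := range []string{"ls -la", "rm -rf build", "curl example.com"} {
		fmt.Printf("%-18s allowed=%v\n", cmd, perms.Allowed(cmd))
	}
}
```

Here `ls -la` passes, `rm -rf build` is rejected by the block list, and `curl example.com` is rejected because it is not on the allowlist.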
Workflow Integration
Harness agents can be used as steps in Station workflows:
# workflow.yaml
id: code-review-pipeline
name: Code Review Pipeline
states:
  - name: analyze
    type: agent
    agent: code-analyzer # Has harness: agentic
    transition: report
  - name: report
    type: agent
    agent: report-generator
    transition: end
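To show how the states above chain together, the following Go sketch walks the transitions and returns the agents in execution order. It assumes the first listed state is the entry point and that `end` is the terminal transition, neither of which is spelled out above. `State` and `agentOrder` are hypothetical names, not part of Station's workflow engine.

```go
package main

import "fmt"

// State mirrors the fields used in the workflow example above.
type State struct {
	Name       string
	Type       string
	Agent      string
	Transition string
}

// agentOrder follows transitions from the first state until "end" and
// returns the agents in the order they would run. It fails on a
// transition to an unknown state or on a cycle.
func agentOrder(states []State) ([]string, error) {
	if len(states) == 0 {
		return nil, nil
	}
	byName := make(map[string]State, len(states))
	for _, s := range states {
		byName[s.Name] = s
	}

	var agents []string
	seen := make(map[string]bool)
	cur := states[0].Name
	for cur != "end" {
		s, ok := byName[cur]
		if !ok {
			return nil, fmt.Errorf("unknown state %q", cur)
		}
		if seen[cur] {
			return nil, fmt.Errorf("cycle at state %q", cur)
		}
		seen[cur] = true
		agents = append(agents, s.Agent)
		cur = s.Transition
	}
	return agents, nil
}

func main() {
	states := []State{
		{Name: "analyze", Type: "agent", Agent: "code-analyzer", Transition: "report"},
		{Name: "report", Type: "agent", Agent: "report-generator", Transition: "end"},
	}
	fmt.Println(agentOrder(states))
}
```

For the pipeline above this yields `code-analyzer` followed by `report-generator`; only the first of the two needs `harness: agentic`, since each agent chooses its own execution engine.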
Shared Git Branches in Workflows
When multiple agents in a workflow collaborate on the same codebase, they can share a single branch:
harness:
  git:
    workflow_branch_strategy: shared # All workflow steps share one branch
Example: Code Review Agent
---
model: anthropic/claude-sonnet-4-20250514
harness: agentic
harness_config:
  max_steps: 100
  timeout: 15m
tools:
  - read
  - glob
  - grep
  - bash
---
You are a senior code reviewer. Analyze the codebase for:
- Code quality issues
- Security vulnerabilities
- Performance problems
- Missing tests
Use glob to find relevant files, read to examine them, and grep to search for patterns.
Provide a detailed report with specific line numbers and suggested fixes.
Example: Refactoring Agent
---
model: anthropic/claude-sonnet-4-20250514
harness: agentic
harness_config:
  max_steps: 200
  doom_loop_threshold: 5
tools:
  - read
  - write
  - edit
  - glob
  - grep
  - git_status
  - git_diff
  - bash
---
You are a refactoring specialist. Your task is to:
1. Understand the current code structure
2. Make targeted improvements
3. Ensure tests still pass
4. Commit changes with clear messages
Always run tests after changes: `bash(command: "go test ./...")`
Check your changes: `git_diff()`
Observability
Harness executions are fully traced with OpenTelemetry:
# View traces in Jaeger
stn jaeger up
# Open http://localhost:16686
Traces include:
- Each agentic loop iteration
- Tool calls with inputs/outputs
- Doom loop detection events
- Compaction events
- Git operations
Testing
Run harness tests:
# Unit tests
go test ./pkg/harness/... -v
# E2E tests with real LLM
HARNESS_E2E_TEST=1 go test ./pkg/harness/... -v -run "E2E" -timeout 5m