Lattice Architecture
This document provides a technical deep-dive into Station Lattice’s architecture, including message flows, NATS subjects, and internal components.
System Components
Core Components
Client (internal/lattice/client.go)
The NATS connection wrapper that handles:
- Connection lifecycle management
- Automatic reconnection with backoff
- TLS and NKey authentication
- Subscription management
Registry (internal/lattice/registry.go)
JetStream KV-backed registry for stations and agents:
| Bucket | Key Pattern | Value |
|---|---|---|
| stations | {station_id} | Station metadata JSON |
| agents | {station_id}.{agent_name} | Agent metadata JSON |
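The key patterns in the table can be illustrated with a pair of small helpers. This is a minimal sketch, not the code in registry.go: the names `AgentKey` and `ParseAgentKey` are hypothetical, and the parser assumes station IDs never contain a dot.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Bucket names taken from the table above.
const (
	StationsBucket = "stations"
	AgentsBucket   = "agents"
)

// AgentKey builds the "{station_id}.{agent_name}" key for the agents bucket.
func AgentKey(stationID, agentName string) string {
	return stationID + "." + agentName
}

// ParseAgentKey splits an agents-bucket key at its first dot.
// Assumption: station IDs never contain a dot, so the first dot is the separator.
func ParseAgentKey(key string) (string, string, error) {
	stationID, agentName, ok := strings.Cut(key, ".")
	if !ok || stationID == "" || agentName == "" {
		return "", "", errors.New("agent key must look like {station_id}.{agent_name}")
	}
	return stationID, agentName, nil
}

func main() {
	key := AgentKey("station-a", "summarizer")
	fmt.Println(AgentsBucket, key) // agents station-a.summarizer

	station, agent, err := ParseAgentKey(key)
	fmt.Println(station, agent, err) // station-a summarizer <nil>
}
```

Because the station ID is the key prefix, all agents of one station share a common prefix, which is convenient when filtering keys by station.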
Presence (internal/lattice/presence.go)
Heartbeat system for station health monitoring.
Router (internal/lattice/router.go)
Capability-based agent routing.
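Capability-based routing can be pictured as an index from capability name to the agents that advertise it. The following is a rough sketch under stated assumptions: `Router`, `AgentRef`, and the round-robin selection are hypothetical choices made for the example, and the actual selection strategy in router.go may differ.

```go
package main

import "fmt"

// AgentRef identifies an agent by the station that hosts it (hypothetical type).
type AgentRef struct {
	StationID string
	AgentName string
}

// Router indexes agents by capability and picks among them round-robin.
// It is not safe for concurrent use; a real router would need locking.
type Router struct {
	byCap map[string][]AgentRef
	next  map[string]int
}

// NewRouter returns an empty Router.
func NewRouter() *Router {
	return &Router{byCap: make(map[string][]AgentRef), next: make(map[string]int)}
}

// Register records that ref offers each of the listed capabilities.
func (r *Router) Register(ref AgentRef, capabilities ...string) {
	for _, c := range capabilities {
		r.byCap[c] = append(r.byCap[c], ref)
	}
}

// Route returns the next agent offering the capability, or false if none does.
func (r *Router) Route(capability string) (AgentRef, bool) {
	agents := r.byCap[capability]
	if len(agents) == 0 {
		return AgentRef{}, false
	}
	i := r.next[capability] % len(agents)
	r.next[capability] = i + 1
	return agents[i], true
}

func main() {
	r := NewRouter()
	r.Register(AgentRef{"station-a", "summarizer"}, "summarize")
	r.Register(AgentRef{"station-b", "digest"}, "summarize", "translate")

	for i := 0; i < 3; i++ {
		ref, _ := r.Route("summarize")
		fmt.Println(ref.StationID, ref.AgentName)
	}
	_, ok := r.Route("unknown")
	fmt.Println(ok) // false
}
```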
Invoker (internal/lattice/invoker.go)
Remote agent invocation via request-reply.
Work Store (internal/lattice/work/store.go)
JetStream-backed async work tracking.
NATS Subject Conventions
Core Subjects
| Subject | Type | Purpose |
|---|---|---|
| lattice.station.{id}.heartbeat | Pub | Station heartbeat |
| lattice.station.{id}.status | Pub | Station status changes |
| lattice.agent.register | Pub | Agent registration |
| lattice.agent.deregister | Pub | Agent deregistration |
Request-Reply Subjects
| Subject | Purpose |
|---|---|
| lattice.invoke.{station}.{agent} | Direct agent invocation |
| lattice.invoke.capability.{cap} | Capability-based invocation |
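The two invocation patterns can be built with small helpers that also validate each token. This is a sketch with hypothetical function names. It relies on two facts about NATS subjects: tokens are separated by dots, and `*` and `>` are wildcards. It also rejects a station ID of `capability`, because `lattice.invoke.capability.x` would otherwise match both patterns. That rejection is an assumption of this example, not documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

const invokePrefix = "lattice.invoke"

// checkToken rejects values that cannot be used as a single subject token:
// empty strings, dots, wildcards, and whitespace.
func checkToken(name, tok string) error {
	if tok == "" || strings.ContainsAny(tok, ".*> \t\r\n") {
		return fmt.Errorf("invalid %s token %q", name, tok)
	}
	return nil
}

// DirectInvokeSubject builds "lattice.invoke.{station}.{agent}".
func DirectInvokeSubject(station, agent string) (string, error) {
	if err := checkToken("station", station); err != nil {
		return "", err
	}
	if err := checkToken("agent", agent); err != nil {
		return "", err
	}
	// Assumption: reserve "capability" so direct and capability subjects never overlap.
	if station == "capability" {
		return "", fmt.Errorf("station ID %q collides with capability subjects", station)
	}
	return invokePrefix + "." + station + "." + agent, nil
}

// CapabilityInvokeSubject builds "lattice.invoke.capability.{cap}".
func CapabilityInvokeSubject(capability string) (string, error) {
	if err := checkToken("capability", capability); err != nil {
		return "", err
	}
	return invokePrefix + ".capability." + capability, nil
}

func main() {
	direct, _ := DirectInvokeSubject("station-a", "summarizer")
	fmt.Println(direct) // lattice.invoke.station-a.summarizer

	byCap, _ := CapabilityInvokeSubject("summarize")
	fmt.Println(byCap) // lattice.invoke.capability.summarize

	_, err := DirectInvokeSubject("bad.id", "summarizer")
	fmt.Println(err != nil) // true
}
```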
Work Queue Subjects
| Subject | Purpose |
|---|---|
| lattice.work.assign.{station} | Work assignment to station |
| lattice.work.{id}.status | Work status updates |
| lattice.work.{id}.result | Work completion result |
| lattice.work.{id}.cancel | Work cancellation request |
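A subscriber that receives messages on the per-item subjects usually needs to recover the work ID and the message kind from the subject string. The sketch below shows one way to do that. The function names are hypothetical, and the parser treats `assign` in the ID position as the assignment subject rather than as a work ID, which is an assumption of this example.

```go
package main

import (
	"fmt"
	"strings"
)

// WorkSubject builds "lattice.work.{id}.{kind}" where kind is
// "status", "result", or "cancel".
func WorkSubject(id, kind string) string {
	return "lattice.work." + id + "." + kind
}

// AssignSubject builds "lattice.work.assign.{station}".
func AssignSubject(station string) string {
	return "lattice.work.assign." + station
}

// ParseWorkSubject extracts the work ID and kind from a per-item subject.
func ParseWorkSubject(subject string) (id, kind string, err error) {
	parts := strings.Split(subject, ".")
	if len(parts) != 4 || parts[0] != "lattice" || parts[1] != "work" {
		return "", "", fmt.Errorf("not a work subject: %q", subject)
	}
	// Assumption: "assign" is reserved and never used as a work ID.
	if parts[2] == "assign" {
		return "", "", fmt.Errorf("assignment subject, not a work item: %q", subject)
	}
	switch parts[3] {
	case "status", "result", "cancel":
		return parts[2], parts[3], nil
	}
	return "", "", fmt.Errorf("unknown work message kind %q", parts[3])
}

func main() {
	fmt.Println(AssignSubject("station-a")) // lattice.work.assign.station-a

	subject := WorkSubject("w-123", "status")
	id, kind, err := ParseWorkSubject(subject)
	fmt.Println(id, kind, err) // w-123 status <nil>
}
```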
Message Flows
Station Registration
Synchronous Invocation
Async Work Assignment
State Machines
Station Lifecycle
Work Item Lifecycle
JetStream Configuration
Streams
KV Buckets
Error Handling
Retry Policies
| Operation | Max Retries | Backoff |
|---|---|---|
| NATS connect | Infinite | Exponential (1s-30s) |
| KV operations | 3 | Linear (100ms) |
| Invocations | 0 (caller decides) | N/A |
| Work assignment | 3 | Exponential (1s-10s) |
Error Types
Performance Considerations
Scaling Limits
| Component | Soft Limit | Hard Limit |
|---|---|---|
| Stations per mesh | 100 | 1000 |
| Agents per station | 50 | 500 |
| Concurrent invocations | 1000 | 10000 |
| Work items (pending) | 10000 | 100000 |
| Message size | 1MB | 8MB |
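The limits table does not say what happens when a limit is crossed. One plausible reading, used in the sketch below, is that exceeding a soft limit should produce a warning and exceeding a hard limit should be refused. The `Limit` type, the verdict names, and the reading of "MB" as 2^20 bytes are all assumptions made for illustration.

```go
package main

import "fmt"

// Verdict is the outcome of checking a value against a limit (hypothetical type).
type Verdict int

const (
	Accept Verdict = iota // at or below the soft limit
	Warn                  // above the soft limit, at or below the hard limit
	Reject                // above the hard limit
)

// Limit holds one row of the scaling table.
type Limit struct {
	Soft, Hard int
}

// Values copied from the table; "MB" is read as 1<<20 bytes (assumption).
var (
	StationsPerMesh  = Limit{Soft: 100, Hard: 1000}
	AgentsPerStation = Limit{Soft: 50, Hard: 500}
	MessageSizeBytes = Limit{Soft: 1 << 20, Hard: 8 << 20}
)

// Check classifies n against the limit.
func (l Limit) Check(n int) Verdict {
	switch {
	case n > l.Hard:
		return Reject
	case n > l.Soft:
		return Warn
	default:
		return Accept
	}
}

func main() {
	payload := make([]byte, 2<<20) // 2 MB payload
	fmt.Println(MessageSizeBytes.Check(len(payload)) == Warn) // true
	fmt.Println(StationsPerMesh.Check(1001) == Reject)        // true
}
```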
Optimization Tips
- Use capability routing instead of direct station targeting when possible
- Batch heartbeats if running many stations on the same host
- Set appropriate timeouts; don’t rely on the default 60s for fast operations
- Monitor JetStream storage - purge old work items regularly
- Use async invocation for long-running tasks (>5s)

