Agent activity as OTel spans. The persistence layer is the legible thing: every agent message, handoff, and task transition becomes a queryable trace any OTel-native observability tool can render. A2A is the current supported wire format. The session derivation generalizes to any transport-keyed channel - GitHub issue, Slack thread, Linear ticket - so the same trace shape holds when other wire formats land.
otel-a2a-relay is the canonical name (repo, package, protocol doc). o2r is the dictation-friendly shortname used in CLI entrypoints (o2r, o2r-harness), span identifiers (service.name=o2r, the relay's agent.name), and prose below.
A real session, animated. The relay is the magenta hub at the center; A and B are the leaves. Each particle is one A2A hop, drawn from a real Phoenix span; arcs above and below the chord let outbound and return hops cross visibly instead of overdrawing. Generate your own with make demo && make gif CTX=demo. Detailed below in Animated session topology.
Agent peers coordinate through this relay. Every message becomes one or more OTel spans, exported via OTLP/HTTP to whatever you've pointed OTEL_EXPORTER_OTLP_ENDPOINT at. The trace IS the operations view, no derived state needed.
The repo ships two complementary coordination shapes that share the same span schema:
- A2A wire format - the relay translates JSON-RPC 2.0 over HTTP into traces, including a deterministic
sha256(<repo>:<issue>)session ID for any GitHub-issue-rooted coordination. - Agent Channel - a Postgres-backed coordination channel with 4-character dictatable IDs, an append-only event log (
spec/state/status/comms/log), handoff and liveness rules, and a self-describing onboarding endpoint. Spec:docs/channels-protocol.md. Reusable implementation:channels/(theotel-a2a-relay-channelspackage). Every channel event also emits one OTel span, so the same trace view covers both shapes. Origin: coilysiren/coilyco-ai#24.
- Currently supported wire format: A2A (JSON-RPC 2.0 over HTTP, AgentCards,
message/send,tasks/get,tasks/cancel). - Persistence format: OTel spans, OpenInference attributes for Phoenix's Agent Graph and Sessions views.
- Trace propagation: W3C
traceparentend-to-end. Client β relay β peer is one trace. - Channel derivation: deterministic
session.idfrom any transport key (GitHub issue today, Slack thread / Linear ticket / file-on-disk by the same pattern). - Default visualizer: Phoenix. Anything OTLP-native works.
This repository is a uv workspace with a backend-agnostic core and per-backend extensions. Each member is its own publishable Python package; cross-package deps are wired through the workspace.
otel-a2a-relay-core- the relay HTTP server,tracing.bootstrap(), the echo A2A peer, the in-memory task store. No backend coupling. PointOTEL_EXPORTER_OTLP_ENDPOINTat any OTLP/HTTP collector.otel-a2a-relay-channels- the Agent Channel coordination layer. FastAPI router + Postgres schema + Pydantic models for the protocol indocs/channels-protocol.md. Pool and auth are caller-injected so any FastAPI app can mount it.otel-a2a-relay-arize-phoenix- Phoenix-side validation harness, REST/GraphQL query helpers, animated topology GIF renderer, annotation+dataset bootstrapper,make viewCLI.otel-a2a-relay-tempo-grafana- Tempo-side bootstrap helper, harness probe, dockerized Tempo+Grafana stack with provisioned datasource and a LUCA-flow Grafana dashboard.luca-flow- the AURORA microsite multi-agent demo, backend-agnostic.
Pick a backend (or run both side by side - they coexist on different ports). All paths work identically through core's tracing.bootstrap().
uv sync --all-packages
make phoenix-up # docker compose, always-on (or `make phoenix-fg` for foreground)
make phoenix-bootstrap # one-time annotation configs + datasets
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:6006 make luca-demo
python -m webbrowser http://localhost:6006 # Phoenix Sessions tabuv sync --all-packages
make tempo-up # docker compose Tempo + Grafana
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 make luca-demo
python -m webbrowser http://localhost:3000/d/luca-flow/luca-flowtracing.bootstrap() ships standard OTLP/HTTP - point it at Honeycomb, Datadog, or any OTel-native backend by setting OTEL_EXPORTER_OTLP_ENDPOINT. The protocol attributes (session.id, agent.role, o2r.*) work everywhere; backend-specific UX (annotation configs in Phoenix, dashboards in Grafana) is added by extension packages.
Agents route through coily; see .coily/coily.yaml. Humans: make help for the full target list.
This is the simplest shape the relay supports: one client, one relay, one peer, one trace. Real flows are more interesting. The LUCA-flow demo below runs eight workers, an orchestrator, a planner, a validator, and a deployer through this same relay, with star-topology enforcement, retries, a deliberate worker crash, and a rogue worker that gets gated by the relay.
The relay's peer registry comes from OTEL_A2A_RELAY_PEERS=A=http://...,B=http://.... The Makefile sets this for you. If a target in metadata.agent.target has no peer registered, the relay synthesizes a completed Task and skips the forward.
Diagram source: scripts/render_topology.py. Regenerate with uv run --with matplotlib python scripts/render_topology.py.
assets/topology.png (above) is the protocol-shape illustration, a fixed cartoon. assets/session-topology.gif (the hero at the top) is the temporal one: real OTel spans for one session, animated by start time, against the same star.
make phoenix-fg # operator-owned, in another terminal
make demo # produces a `demo` session
OUT=mine.gif make gif CTX=demo # writes mine.gif from real Phoenix spansThe renderer pulls every span tagged with session.id == $CTX from Phoenix's GraphQL endpoint, reduces them into hops (parent -> agent), auto-detects the relay as the hub, sorts the leaves alphabetically for a stable color palette, and animates each hop in start-time order. Two hops in the same tick render with their arcs bowed in opposite directions, so a forward-and-return pair reads as crossings rather than as a single overdrawn line.
Determinism is baked in: same session.id against the same Phoenix DB produces a byte-identical GIF. Tests assert this against a synthetic-span fixture in arize_phoenix/tests/fixtures/sessions.py, so a renderer regression fails CI before the README hero drifts. The renderer is Pillow-only (no matplotlib); freetype ships with Pillow, JetBrains Mono ships in arize_phoenix/src/otel_a2a_relay_arize_phoenix/viz/assets/, the GIF palette is built once and reused across frames. To intentionally regenerate the README hero after a renderer change, run python -m tests.fixtures.regen_session_gifs from arize_phoenix/ and commit the new bytes.
The viz extra is opt-in:
uv sync --extra vizmake gif does this automatically. The base relay install stays Pillow-free.
message/send- send a message, get a Task back. The originator setsmetadata.agent.id(sender) and optionallymetadata.agent.target(recipient).message/stream- same envelope asmessage/send, but the response istext/event-streamcarrying A2Astatus-updateandartifact-updateevents. The relay forwards the SSE through and emits onea2a.message.stream_chunkspan event per artifact.tasks/get- retrieve a Task by id from the relay's in-memory store. Each peer agent indexes its own tasks too.tasks/cancel- mark a Task as canceled and emit ana2a.task.cancelspan.
The peer agent serves an A2A AgentCard at /.well-known/agent.json (capabilities, skills, protocol version). The relay's GET /peers aggregates them for discovery.
Every a2a.task carries session.id, a2a.task.id, agent.id, graph.node.id, graph.node.parent_id, openinference.span.kind=AGENT, plus input.value / output.value (OpenInference) and a2a.message.text / a2a.message.reply_text shortcuts. State changes are span events (a2a.task.state_change with from / to). Stream chunks are span events (a2a.message.stream_chunk with seq / final).
The original v0.1 protocol document at docs/protocol.md is the precedent and explains why agent identity rides on attributes (Phoenix drops Resource attributes), why the Agent Graph uses graph.node.* (Phoenix doesn't expose span links), and why state changes are events not spans (tree noise vs queryable timeline).
examples/luca-flow/ is a real multi-agent choreography that dogfoods the relay end-to-end. Eight worker subprocesses + an orchestrator + a planner + a validator + a deployer build the AURORA microsite (a fictional consumer desk lamp marketed as if it physically channels solar-wind charged particles) from real public-domain NASA imagery committed to the repo. Star topology is enforced by the relay; one worker deliberately crashes, another deliberately tries to bypass the orchestrator and gets a -32010 from the relay's gate.
The demo only depends on otel-a2a-relay-core. Pick whichever backend you want to send the spans to:
# Phoenix
make phoenix-fg
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:6006 make luca-demo
# Tempo + Grafana
make tempo-up
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 make luca-demoThe same flow runs in CI on every push (.github/workflows/luca-demo.yml), with Phoenix in CI as a background process. The built dist/ is uploaded as a workflow artifact. See examples/luca-flow/README.md for the choreography and validation rules.
Operator CLI: coily channel once that side catches up. Origin discussion: coilysiren/coilyco-ai#24.
- AGENTS.md - agent-facing operating rules.
- docs/FEATURES.md - inventory of what ships today.
- .coily/coily.yaml - allowlisted commands. Agents route through coily, not bare
make/uv/python.
Cross-reference convention from coilysiren/coilyco-ai#313. This repo is the worked example.
MIT.

