Cross-Machine Broker

c2c is local-first: every client talks to a local MCP server, and that server stores broker state under $HOME/.c2c/repos/<fp>/broker/ (the per-repo broker root; see root CLAUDE.md “Key Architecture Notes” for the full resolution order). The cross-machine relay layer extends this without changing the agent tool surface.

Status: live-proven alpha. The relay was tested end-to-end on 2026-04-14 in two setups: a Docker cross-machine test (separate process and filesystem namespaces over TCP; the container mounts the OCaml c2c binary, and no Python is required) and a true two-machine Tailscale test (x-game ↔ xsm, ~6–21 ms RTT). DMs in both directions, room join, and room fan-out all passed. The relay is useful today, but operator docs should still treat cross-host delivery as alpha: transparent remote sends are only local-queue handoffs unless a connector forwards them, relay monitoring peeks provide non-draining visibility, and some subscribe paths have transport limitations. See Relay Quickstart for the full operator guide and current limitations.

Remote Relay v1 (2026-04-23): The relay can now poll a remote broker’s inbox directory over SSH. Start the relay with --remote-broker-ssh-target user@remote-host --remote-broker-root /path/to/broker. Messages are fetched every 5 seconds, cached locally, and served via GET /remote_inbox/<session_id>. See Remote Relay Transport for the design.

Agents keep using the same send, send_all, join_room, send_room, poll_inbox, peek_inbox, and CLI fallback commands. Only the broker backend changes: remote transport is an implementation detail, not a new workflow.

Goals

Keep the local filesystem broker as the default zero-config path.
Let trusted agents on different machines exchange 1:1, broadcast, and room messages with the same semantics they have locally.
Preserve broker-native delivery: PTY wake daemons may nudge a client to poll, but message bodies stay in broker inboxes until drained through the MCP/CLI receive path.
Avoid a design that depends on a particular host client. Claude Code, Codex, Pi Agent, OpenCode, Grok, agy (Antigravity), Kimi Code, and shell scripts should all keep the same API.
Make the first remote version easy to test on localhost before it becomes an operator-facing network service.

Non-Goals for v1

Public unauthenticated internet service.
Fully distributed peer-to-peer consensus.
Replacing the local broker. Local mode remains the fastest and most reliable default.

End-to-end encrypted DMs are now implemented when both peers have relay identity/encryption keys (X25519 NaCl box; plaintext fallback remains possible for unkeyed peers). Room visibility and invite-gated relay rooms are also implemented. Earlier drafts treated these as post-v1 hardening. This section lists only the remaining non-goals; it is not a complete security inventory.

Recommended Shape

Use a hub-and-spoke relay.

machine A                                      relay host                         machine B
---------                                      ----------                         ---------
agent -> local MCP server                      c2c relay serve                    local MCP server <- agent
          |                                    durable broker store                      |
          v                                           ^                                  v
   c2c relay connect  <---- authenticated ----> register / send / poll  <----> c2c relay connect

Each machine still runs the normal local MCP server. A companion connector process, c2c relay connect, bridges local broker operations to a remote relay:

It registers local aliases and room memberships with the relay.
It forwards outbound messages addressed to remote peers.
It pulls inbound remote messages into local inboxes or proxies poll_inbox through the relay.
It refreshes liveness using heartbeat leases instead of local PIDs.

The relay owns the durable remote store and serializes writes. That avoids cross-machine file-lock ambiguity while keeping the existing per-recipient inbox and room-history model.

Why Not Shared Filesystem First?

A shared broker root over NFS, SSHFS, Dropbox, Syncthing, or a git-synced directory is attractive because it appears to reuse the existing per-repo broker layout unchanged. It is also the path most likely to fail silently:

POSIX locking behavior varies across remote filesystems and mount options.
Filesystem watch events are often delayed, coalesced, or missing.
Split-brain writes can corrupt the registry or lose inbox appends.
Latency is poor for a chat-like UX.
Liveness based on /proc/<pid> does not mean anything across machines.

Shared filesystem mode can still be a documented trusted-LAN experiment, but it should not be the default remote architecture.

Contracts to Preserve

Remote transport must preserve these local invariants:

Contract	Remote version
Alias resolves to one current session	Alias resolves to `{host_id, session_id}` with a heartbeat lease
`send` appends to one recipient inbox	Relay appends one message under a transaction or equivalent lock
`send_all` skips sender and dead peers	Relay fans out to live leases and records skipped aliases
`poll_inbox` drains the caller’s inbox	Drain is atomic and returns each message at most once
`peek_inbox` does not consume	Read-only snapshot with the same shape as local peek
Room history is append-only	Relay assigns a monotonically increasing room sequence
Room members are explicit	Relay stores `{room_id, alias, host_id, session_id}` membership
Dead recipients are not silently lost	Messages go to dead-letter or retry queue with inspectable cause

The MCP and CLI return shapes should stay source-compatible. When remote metadata is useful, add fields rather than changing existing ones.
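
The drain and peek contracts in the table can be illustrated with a small in-memory sketch. This is not the relay's implementation: the Inbox class and its method names are hypothetical, and a single lock stands in for the "transaction or equivalent lock" the table mentions.

```python
import threading
from copy import deepcopy


class Inbox:
    """Hypothetical in-memory inbox; illustrates the contract only."""

    def __init__(self):
        self._lock = threading.Lock()
        self._messages = []

    def append(self, message):
        # `send` appends exactly one message while holding the lock.
        with self._lock:
            self._messages.append(message)

    def peek(self):
        # Read-only snapshot: callers get copies and the queue is untouched.
        with self._lock:
            return deepcopy(self._messages)

    def poll(self):
        # Atomic drain: swap the list out under the lock so each message
        # is returned at most once, even with concurrent pollers.
        with self._lock:
            drained, self._messages = self._messages, []
        return drained


inbox = Inbox()
inbox.append({"message_id": "m1", "body": "hello"})
peeked = inbox.peek()   # does not consume
first = inbox.poll()    # drains the inbox
second = inbox.poll()   # nothing left to drain
```

Peek and poll return the same list shape here, which mirrors the "same shape as local peek" row; only poll changes state.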

Identity and Addressing

Local aliases are human-friendly but not globally unique. Current relay-aware operator surfaces use a stable opaque host_id per machine or workspace. The user-facing address forms are:

<alias> for local same-broker sends.
<alias>@<host_id> for cross-host relay sends, where host_id is the opaque routing id printed by c2c host-id and shown by relay-aware whoami / status surfaces.

Generated host IDs are 12 lowercase hex characters. Earlier design notes used node_id for the same routing concept; keep that term only when discussing historical design or internal storage migrations, not as the primary operator address. The current implementation records the host identity in relay metadata, so two machines can both register a local alias such as codex without friendly machine names having to serve as the canonical route.
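
As an illustration of the two address forms, the sketch below splits a target into alias and optional host_id. The parse_target function and Target type are hypothetical names, not part of c2c. The sketch also assumes that only the generated 12-character lowercase hex format is valid, which the real CLI may not enforce.

```python
import re
from typing import NamedTuple, Optional

# Assumption: every valid host_id matches the generated format.
HOST_ID_RE = re.compile(r"[0-9a-f]{12}")


class Target(NamedTuple):
    alias: str
    host_id: Optional[str]  # None means a local same-broker send


def parse_target(target: str) -> Target:
    """Split '<alias>' or '<alias>@<host_id>' into its parts."""
    alias, sep, host_id = target.partition("@")
    if not alias:
        raise ValueError(f"missing alias in {target!r}")
    if not sep:
        return Target(alias, None)
    if not HOST_ID_RE.fullmatch(host_id):
        raise ValueError(
            f"host_id must be 12 lowercase hex characters: {host_id!r}"
        )
    return Target(alias, host_id)


local = parse_target("codex")
remote = parse_target("codex@a1b2c3d4e5f6")
```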

Remote liveness should use leases:

Each connector heartbeats {host_id, session_id, alias, client_type}.
The relay treats entries as live until last_seen + ttl.
Local PIDs remain useful inside a host, but they are not a remote liveness primitive.
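
The lease rule above can be sketched as follows. The LeaseTable class is hypothetical, the current time is passed in explicitly so the behavior is easy to test, and treating the exact expiry instant as already dead is an assumption that the text does not settle.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    host_id: str
    session_id: str
    alias: str
    client_type: str
    last_seen: float  # relay clock, in seconds


class LeaseTable:
    """Hypothetical lease store keyed by (host_id, session_id)."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self._leases = {}

    def heartbeat(self, host_id, session_id, alias, client_type, now):
        # Each heartbeat replaces the lease and refreshes last_seen.
        self._leases[(host_id, session_id)] = Lease(
            host_id, session_id, alias, client_type, now
        )

    def is_live(self, host_id, session_id, now):
        # Live until last_seen + ttl; unknown sessions are never live.
        lease = self._leases.get((host_id, session_id))
        return lease is not None and now < lease.last_seen + self.ttl


table = LeaseTable(ttl=30.0)
table.heartbeat("a1b2c3d4e5f6", "s1", "codex", "codex", now=100.0)
live_early = table.is_live("a1b2c3d4e5f6", "s1", now=120.0)
live_late = table.is_live("a1b2c3d4e5f6", "s1", now=131.0)
```

In a running relay, the now argument would come from the relay's own clock, so connector clock skew does not affect liveness.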

Transport

Start with one transport contract and two implementations:

In-process fake transport for tests.
Localhost HTTP or JSON-RPC for integration tests and real use.

The API can stay small:

register
heartbeat
list
send
send_all
join_room
leave_room
send_room
room_history
poll_inbox
peek_inbox

For the first trusted deployment, run the relay behind one of:

ssh -L tunnel
Tailscale / WireGuard private IP
localhost-only relay on a shared development box

Use a bearer token or per-node shared secret from the start. Do not introduce a public listener without authentication.
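
A minimal sketch of a bearer-token check follows. It assumes the token arrives in an HTTP Authorization header using the Bearer scheme, which the text above does not specify, and is_authorized is a hypothetical helper rather than the relay's actual code. The comparison uses hmac.compare_digest so the check does not leak how many leading characters matched.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Return True only for 'Bearer <token>' with the expected token."""
    scheme, _, presented = (authorization_header or "").partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # Constant-time comparison of the presented and expected tokens.
    return hmac.compare_digest(presented.encode(), expected_token.encode())


ok = is_authorized("Bearer s3cret", "s3cret")
wrong = is_authorized("Bearer nope", "s3cret")
missing = is_authorized(None, "s3cret")
```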

Storage

The relay can initially store data using the existing JSON-file layout behind a single relay process:

relay-root/
  registry.json
  inboxes/<host_id>/<session_id>.json
  rooms/<room_id>/history.jsonl
  rooms/<room_id>/members.json
  dead-letter.jsonl

Because one process owns writes, remote correctness does not depend on cross-machine lockf. The relay can still use local lockf internally so CLI maintenance tools and tests behave like the current broker.

If traffic grows, the same API can move to SQLite. That should be a storage swap, not an agent-visible protocol change.
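
To make the single-writer argument concrete, here is a sketch of appending to rooms/<room_id>/history.jsonl with a relay-assigned sequence. The append_room_message function and the seq field name are assumptions for illustration. The sketch rescans the file to find the last sequence, which is only reasonable for small histories, and it relies on one process owning all writes.

```python
import json
import tempfile
from pathlib import Path


def append_room_message(relay_root: Path, room_id: str, message: dict) -> int:
    """Append one message to the room history and return its sequence."""
    history = relay_root / "rooms" / room_id / "history.jsonl"
    history.parent.mkdir(parents=True, exist_ok=True)
    # Single-writer assumption: nothing else appends between read and write.
    last_seq = 0
    if history.exists():
        with history.open() as f:
            for line in f:
                if line.strip():
                    last_seq = json.loads(line)["seq"]
    seq = last_seq + 1
    with history.open("a") as f:
        f.write(json.dumps({"seq": seq, **message}) + "\n")
    return seq


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = append_room_message(root, "dev", {"from": "alice", "body": "hi"})
    second = append_room_message(root, "dev", {"from": "bob", "body": "hello"})
    lines = (root / "rooms" / "dev" / "history.jsonl").read_text().splitlines()
```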

Failure Modes

Remote transport needs explicit behavior for the cases that local files mostly hide:

Relay offline: local sends either queue for retry or fail with a clear remote_unavailable error.
Connector offline: relay keeps undrained inbox messages until TTL / manual sweep.
Duplicate retry: every message gets a stable message_id; receivers and the relay treat retries idempotently.
Clock skew: relay sequence numbers define order. Client timestamps are metadata only.
Alias conflict: relay rejects the second alias or requires <alias>@<host_id> for disambiguation.
Partial room fanout: response reports delivered_to, skipped, and dead-letter entries per recipient.
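
The duplicate-retry and clock-skew rules can be combined in one small sketch. RelayLog is a hypothetical in-memory stand-in for the relay store: a retry with a known message_id returns the original record instead of appending again, and ordering comes from the relay-assigned sequence rather than the client timestamp.

```python
import itertools


class RelayLog:
    """Hypothetical store that dedups on message_id and assigns order."""

    def __init__(self):
        self._seq = itertools.count(1)
        self._by_id = {}  # message_id -> stored record
        self.inbox = []

    def send(self, message_id, body, client_ts=None):
        existing = self._by_id.get(message_id)
        if existing is not None:
            # Retry of a message already accepted: no second append.
            return existing
        record = {
            "message_id": message_id,
            "seq": next(self._seq),   # relay-defined order
            "body": body,
            "client_ts": client_ts,   # metadata only, never used to sort
        }
        self._by_id[message_id] = record
        self.inbox.append(record)
        return record


log = RelayLog()
a = log.send("m1", "first", client_ts=500.0)
a_retry = log.send("m1", "first", client_ts=500.0)
b = log.send("m2", "second", client_ts=10.0)  # skewed client clock
```

Even though the second message carries an earlier client timestamp, it is ordered after the first because the relay sequence decides.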

Implementation Phases (current shipped state)

✓ Contracts and fixtures: remote message/registry JSON shapes, host_id, lease semantics, error codes, and two-machine unit fixtures.
✓ Relay server: c2c relay serve with InMemoryRelay and SQLite storage, token auth, send + poll_inbox.
✓ Connector: c2c relay connect bridges the local broker to the relay. Localhost two-broker roundtrip proven.
✓ Rooms and broadcast: send_all, join_room, send_room, history backfill, room membership leases.
✓ Operator setup: c2c relay setup, docs for SSH/Tailscale, health checks, relay GC, saved config, and environment variable overrides.
✓ Hardening: exactly-once dedup keyed on a stable message_id, dead-letter inspection, relay GC daemon, recovery tests, and a persistent SQLite backend.
✓ Identity and privacy hardening: Ed25519 TOFU identity, optional --allowed-identities key pinning, proof-of-work support, and E2E encrypted DMs when both sides are keyed.
✓ Relay UX extensions: 4-level room visibility with invite/uninvite, WebSocket relay subscribe, multi-alias relay subscribe-daemon, mobile pairing, and SSH remote-broker polling.

Test Plan

Use temporary directories as “machine A”, “machine B”, and “relay”.
Run pure unit tests against an in-process fake relay before network tests.
Add localhost integration tests for relay server + two connectors.
Simulate relay restart and confirm queued messages are not lost.
Simulate duplicate send retry and confirm exactly-once drain semantics.
Simulate room fanout with one offline member and verify dead-letter reporting.
Verify existing local MCP/CLI tests still pass with remote code disabled.
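
As one example of the plan above, the sketch below tests room fan-out with an offline member against an in-process fake. FakeRelay is a hypothetical stand-in rather than the project's test fixture, and the lease_expired cause string is an illustrative assumption.

```python
class FakeRelay:
    """Hypothetical in-process fake relay for unit tests."""

    def __init__(self):
        self.members = {}      # room_id -> list of aliases
        self.live = set()      # aliases with a current lease
        self.inboxes = {}      # alias -> list of messages
        self.dead_letter = []  # undeliverable messages with a cause

    def join_room(self, room_id, alias, live=True):
        self.members.setdefault(room_id, []).append(alias)
        if live:
            self.live.add(alias)

    def send_room(self, room_id, sender, body):
        delivered, skipped = [], []
        for alias in self.members.get(room_id, []):
            if alias == sender:
                continue  # the sender never receives its own message
            if alias in self.live:
                self.inboxes.setdefault(alias, []).append(
                    {"from": sender, "body": body}
                )
                delivered.append(alias)
            else:
                # Offline members are recorded, not silently dropped.
                self.dead_letter.append(
                    {"to": alias, "body": body, "cause": "lease_expired"}
                )
                skipped.append(alias)
        return {"delivered_to": delivered, "skipped": skipped}


def test_room_fanout_with_offline_member():
    relay = FakeRelay()
    relay.join_room("dev", "alice")
    relay.join_room("dev", "bob")
    relay.join_room("dev", "carol", live=False)
    result = relay.send_room("dev", "alice", "hi")
    assert result == {"delivered_to": ["bob"], "skipped": ["carol"]}
    assert relay.dead_letter[0]["to"] == "carol"
    return relay, result


relay, result = test_room_fanout_with_offline_member()
```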

Product Shape

The eventual operator flow should feel like local c2c:

# On one trusted host
c2c relay serve --listen 127.0.0.1:7331 --token-file ~/.config/c2c/relay.token

# On each agent machine, usually through SSH or Tailscale
c2c relay setup --url http://127.0.0.1:7331 --token-file ~/.config/c2c/relay.token
c2c relay connect

# Cross-host send through the same local send surface:
c2c send codex@a1b2c3d4e5f6 "hello from another machine"

The <alias>@<host_id> target is the remote routing signal for both c2c send and mcp__c2c__send. The local broker writes the message to remote-outbox.jsonl; c2c relay connect forwards it to the relay, and the remote connector delivers it into the recipient’s local inbox. Without a running connector, transparent remote sends may remain locally queued; scripts that need to fail on that state can use c2c send --fail-if-queued.

That keeps the north-star contract intact: agents message each other through c2c, regardless of host client or machine, and remote transport remains an implementation detail rather than a new workflow.

See the Relay Quickstart for step-by-step operator instructions including localhost proof, SSH tunnel, and Tailscale setups.