Cross-Machine Broker
c2c is local-first: every client talks to a local MCP server, and that server
stores broker state under $HOME/.c2c/repos/<fp>/broker/ (the per-repo broker
root; see root CLAUDE.md “Key Architecture Notes” for the full resolution
order). The cross-machine relay layer extends this without changing the agent
tool surface.
Status: production-ready and live-proven. The relay was tested end-to-end
on 2026-04-14: Docker cross-machine test (separate Python runtime and filesystem
over TCP) and a true two-machine Tailscale test (x-game ↔ xsm, ~6–21 ms
RTT). DM in both directions, room join, and room fan-out all passed. See
Relay Quickstart for the full operator guide.
Remote Relay v1 (2026-04-23): The relay can now poll a remote broker’s
inbox directory over SSH. Start it with --remote-broker-ssh-target
user@remote-host --remote-broker-root /path/to/broker. Messages are fetched
every 5s, cached locally, and served via GET /remote_inbox/<session_id>.
See Remote Relay Transport for design.
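A minimal sketch of the poll-and-cache loop described above. It assumes the remote broker keeps one JSON array per session at `<broker_root>/<session_id>.inbox.json`. That file name, `ssh_fetch`, and `RemoteInboxCache` are hypothetical, and the real relay's SSH command and cache structure may differ. The fetcher is injectable, so the caching logic can be exercised without SSH. A failed fetch keeps the last good snapshot instead of blanking the cache.

```python
import json
import shlex
import subprocess
import threading
from typing import Callable, Dict, List, Optional

Fetch = Callable[[str], Optional[List[dict]]]


def ssh_fetch(target: str, broker_root: str, session_id: str) -> Optional[List[dict]]:
    """Read one session's inbox on the remote host over SSH.

    HYPOTHETICAL: the "<broker_root>/<session_id>.inbox.json" name is an assumption.
    Returns None on failure so the caller keeps its previous snapshot.
    """
    path = f"{broker_root}/{session_id}.inbox.json"
    try:
        result = subprocess.run(
            ["ssh", target, "cat", shlex.quote(path)],  # quote for the remote shell
            capture_output=True, text=True, timeout=10,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return json.loads(result.stdout or "[]")


class RemoteInboxCache:
    """HYPOTHETICAL cache that a GET /remote_inbox/<session_id> handler would read from."""

    def __init__(self, fetch: Fetch, interval_s: float = 5.0):
        self._fetch = fetch
        self._interval_s = interval_s
        self._cache: Dict[str, List[dict]] = {}
        self._lock = threading.Lock()

    def refresh(self, session_ids: List[str]) -> None:
        # One polling cycle: replace each session's snapshot if the fetch succeeded.
        for sid in session_ids:
            messages = self._fetch(sid)
            if messages is None:
                continue  # fetch failed: keep serving the last good snapshot
            with self._lock:
                self._cache[sid] = messages

    def get(self, session_id: str) -> List[dict]:
        with self._lock:
            return list(self._cache.get(session_id, []))

    def run_forever(self, session_ids: List[str], stop: threading.Event) -> None:
        # Poll every interval_s seconds (5 s in the relay described above) until stopped.
        while not stop.is_set():
            self.refresh(session_ids)
            stop.wait(self._interval_s)
```

Wired to the flags above, this would be `RemoteInboxCache(lambda sid: ssh_fetch("user@remote-host", "/path/to/broker", sid))`, with `run_forever` on a background thread.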
Agents keep using the same send, send_all, join_room, send_room,
poll_inbox, peek_inbox, and CLI fallback commands. Only the broker backend
changes — remote transport is an implementation detail, not a new workflow.
Goals
- Keep the local filesystem broker as the default zero-config path.
- Let trusted agents on different machines exchange 1:1, broadcast, and room messages with the same semantics they have locally.
- Preserve broker-native delivery: PTY wake daemons may nudge a client to poll, but message bodies stay in broker inboxes until drained through the MCP/CLI receive path.
- Avoid a design that depends on a particular host client. Claude Code, Codex, OpenCode, Kimi Code, and shell scripts should all keep the same API.
- Make the first remote version easy to test on localhost before it becomes an operator-facing network service.
Non-Goals for v1
- Public unauthenticated internet service.
- Fully distributed peer-to-peer consensus.
- Per-message end-to-end encryption beyond transport-level protection.
- Fine-grained room ACLs. v1 can assume a trusted swarm and add access control later.
- Replacing the local broker. Local mode remains the fastest and most reliable default.
Recommended Shape
Use a hub-and-spoke relay.
machine A                                   relay host                    machine B
---------                                   ----------                    ---------
agent -> local MCP server                   c2c relay serve               local MCP server <- agent
        |                                   durable broker store                  |
        v                                             ^                           v
c2c relay connect <---- authenticated ----> register / send / poll <----> c2c relay connect
Each machine still runs the normal local MCP server. A companion connector
process, c2c relay connect, bridges local broker operations to a remote relay:
- It registers local aliases and room memberships with the relay.
- It forwards outbound messages addressed to remote peers.
- It pulls inbound remote messages into local inboxes or proxies poll_inbox through the relay.
- It refreshes liveness using heartbeat leases instead of local PIDs.
The relay owns the durable remote store and serializes writes. That avoids cross-machine file-lock ambiguity while keeping the existing per-recipient inbox and room-history model.
Why Not Shared Filesystem First?
A shared broker root over NFS, SSHFS, Dropbox, Syncthing, or a git-synced directory is attractive because it appears to reuse the existing per-repo broker layout unchanged. It is also the path most likely to fail silently:
- POSIX locking behavior varies across remote filesystems and mount options.
- Filesystem watch events are often delayed, coalesced, or missing.
- Split-brain writes can corrupt the registry or lose inbox appends.
- Latency is poor for a chat-like UX.
- Liveness based on /proc/<pid> does not mean anything across machines.
Shared filesystem mode can still be a documented trusted-LAN experiment, but it should not be the default remote architecture.
Contracts to Preserve
Remote transport must preserve these local invariants:
| Contract | Remote version |
|---|---|
| Alias resolves to one current session | Alias resolves to {node_id, session_id} with a heartbeat lease |
| send appends to one recipient inbox | Relay appends one message under a transaction or equivalent lock |
| send_all skips sender and dead peers | Relay fans out to live leases and records skipped aliases |
| poll_inbox drains the caller’s inbox | Drain is atomic and returns each message at most once |
| peek_inbox does not consume | Read-only snapshot with the same shape as local peek |
| Room history is append-only | Relay assigns a monotonically increasing room sequence |
| Room members are explicit | Relay stores {room_id, alias, node_id, session_id} membership |
| Dead recipients are not silently lost | Messages go to dead-letter or retry queue with inspectable cause |
The MCP and CLI return shapes should stay source-compatible. When remote metadata is useful, add fields rather than changing existing ones.
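To make the send, send_all, poll_inbox, and peek_inbox rows of the table concrete, here is a minimal in-process sketch. `ContractRelay` and its method signatures are hypothetical. The source names an InMemoryRelay but does not show its API. Liveness is injected as a predicate so the sketch does not depend on any particular lease implementation. A single lock stands in for "one process owns writes".

```python
import copy
import threading
from typing import Callable, Dict, List


class ContractRelay:
    """HYPOTHETICAL in-process relay illustrating the contract table (not c2c's InMemoryRelay)."""

    def __init__(self, is_live: Callable[[str], bool]):
        self._is_live = is_live
        self._inboxes: Dict[str, List[dict]] = {}
        self._lock = threading.Lock()  # one writer at a time, like the single relay process

    def register(self, alias: str) -> None:
        with self._lock:
            self._inboxes.setdefault(alias, [])

    def send(self, from_alias: str, to_alias: str, content: str) -> None:
        # Append exactly one message to exactly one recipient inbox under the lock.
        with self._lock:
            if to_alias not in self._inboxes:
                raise KeyError(f"unknown alias: {to_alias}")
            self._inboxes[to_alias].append({"from": from_alias, "content": content})

    def send_all(self, from_alias: str, content: str) -> dict:
        # Fan out to live peers, never to the sender, and report who was skipped.
        delivered: List[str] = []
        skipped: List[str] = []
        with self._lock:
            for alias, inbox in self._inboxes.items():
                if alias == from_alias:
                    continue
                if self._is_live(alias):
                    inbox.append({"from": from_alias, "content": content})
                    delivered.append(alias)
                else:
                    skipped.append(alias)
        return {"delivered_to": delivered, "skipped": skipped}

    def poll_inbox(self, alias: str) -> List[dict]:
        # Atomic drain: swap in an empty list under the lock so each message is returned at most once.
        with self._lock:
            drained = self._inboxes.get(alias)
            if drained is None:
                return []
            self._inboxes[alias] = []
        return drained

    def peek_inbox(self, alias: str) -> List[dict]:
        # Read-only snapshot with the same shape as poll_inbox; nothing is consumed.
        with self._lock:
            return copy.deepcopy(self._inboxes.get(alias, []))
```

The swap-under-lock drain is what gives the at-most-once property: two concurrent pollers can never both see the same message.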
Identity and Addressing
Local aliases are human-friendly but not globally unique. The relay should add a
stable node_id per machine or workspace. Operator-facing names can then be:
- alias when unique in the connected swarm.
- alias@node when disambiguation is needed.
The first implementation can keep local aliases unique by convention and add
node_id to registry rows immediately. That avoids a later data migration when
two machines both register codex.
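A sketch of how a bare alias or an alias@node target could resolve against registry rows that already carry node_id. The row shape, `resolve`, and `AliasError` are hypothetical. The behavior on a clash (refuse the bare alias and list the alias@node forms) is one reasonable reading of the rule above, not c2c's confirmed error format.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]  # HYPOTHETICAL registry row: {"alias", "node_id", "session_id"}


class AliasError(Exception):
    """HYPOTHETICAL error for unknown or ambiguous targets."""


def resolve(target: str, rows: List[Row]) -> Tuple[str, str]:
    """Resolve 'alias' or 'alias@node' to (node_id, session_id)."""
    if "@" in target:
        alias, node = target.split("@", 1)
        matches = [r for r in rows if r["alias"] == alias and r["node_id"] == node]
    else:
        alias = target
        matches = [r for r in rows if r["alias"] == alias]
    if not matches:
        raise AliasError(f"unknown target: {target}")
    if len(matches) > 1:
        # The bare alias is registered on several nodes: the caller must pick one.
        options = ", ".join(f"{alias}@{r['node_id']}" for r in sorted(matches, key=lambda r: r["node_id"]))
        raise AliasError(f"ambiguous alias {alias!r}; use one of: {options}")
    row = matches[0]
    return row["node_id"], row["session_id"]
```

With two machines both registering codex, `resolve("codex", rows)` fails with a hint, while `resolve("codex@laptop", rows)` succeeds.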
Remote liveness should use leases:
- Each connector heartbeats {node_id, session_id, alias, client_type}.
- The relay treats entries as live until last_seen + ttl.
- Local PIDs remain useful inside a node, but they are not a remote liveness primitive.
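The lease rule fits in a few lines. In this sketch `LeaseTable`, `Lease`, and the 60-second default TTL are hypothetical. The payload fields come from the heartbeat list above. Because `last_seen` is stamped with the relay's own clock when the heartbeat arrives, client clock skew does not affect expiry.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Lease:
    node_id: str
    session_id: str
    alias: str
    client_type: str
    last_seen: float  # relay clock at the most recent heartbeat


class LeaseTable:
    """HYPOTHETICAL lease store keyed by (node_id, session_id)."""

    def __init__(self, ttl_s: float = 60.0):  # the 60 s default is an assumption
        self.ttl_s = ttl_s
        self._leases: Dict[Tuple[str, str], Lease] = {}

    def heartbeat(self, node_id: str, session_id: str, alias: str,
                  client_type: str, now: Optional[float] = None) -> None:
        # Each heartbeat overwrites last_seen, extending the lease by another ttl_s.
        now = time.time() if now is None else now
        self._leases[(node_id, session_id)] = Lease(node_id, session_id, alias, client_type, now)

    def is_live(self, node_id: str, session_id: str, now: Optional[float] = None) -> bool:
        # Live until last_seen + ttl, matching the rule above.
        now = time.time() if now is None else now
        lease = self._leases.get((node_id, session_id))
        return lease is not None and now < lease.last_seen + self.ttl_s

    def live(self, now: Optional[float] = None) -> List[Lease]:
        now = time.time() if now is None else now
        return [lease for lease in self._leases.values() if now < lease.last_seen + self.ttl_s]
```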
Transport
Start with one transport contract and two implementations:
- In-process fake transport for tests.
- Localhost HTTP or JSON-RPC for integration tests and real use.
The API can stay small:
- register
- heartbeat
- list
- send
- send_all
- join_room
- leave_room
- send_room
- room_history
- poll_inbox
- peek_inbox
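If the localhost transport is JSON-RPC, each operation in the list maps naturally onto a method name. The sketch below builds and checks JSON-RPC 2.0 envelopes for those names. The envelope format is the standard JSON-RPC 2.0 shape. The method strings, the `params` shapes, and the helper names are assumptions for illustration, not the relay's documented wire format.

```python
import itertools
import json
from typing import Any, Dict

# Operation names from the transport contract above.
RELAY_METHODS = frozenset({
    "register", "heartbeat", "list", "send", "send_all", "join_room",
    "leave_room", "send_room", "room_history", "poll_inbox", "peek_inbox",
})

_ids = itertools.count(1)  # request ids, unique per client process


def make_request(method: str, params: Dict[str, Any]) -> str:
    """HYPOTHETICAL: serialize one relay operation as a JSON-RPC 2.0 request."""
    if method not in RELAY_METHODS:
        raise ValueError(f"not a relay operation: {method}")
    return json.dumps({"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params})


def parse_response(raw: str) -> Any:
    """Return the result field, or raise with the server's error code and message."""
    msg = json.loads(raw)
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"{err.get('code')}: {err.get('message')}")
    return msg["result"]
```

An in-process fake transport can accept the same method names and dispatch them to a local object, which keeps unit tests and localhost integration tests on one contract.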
For the first trusted deployment, run the relay behind one of:
- ssh -L tunnel
- Tailscale / WireGuard private IP
- localhost-only relay on a shared development box
Use a bearer token or per-node shared secret from the start. Do not introduce a public listener without authentication.
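A minimal server-side bearer-token check. It assumes the token travels in a standard HTTP `Authorization: Bearer <token>` header and that the token file (as in `--token-file`) holds a single secret. Both are assumptions about c2c's relay, and `load_token`/`is_authorized` are hypothetical names. `hmac.compare_digest` avoids leaking, through timing, how many leading characters matched.

```python
import hmac
from pathlib import Path
from typing import Mapping, Optional


def load_token(path: str) -> str:
    """HYPOTHETICAL: read the shared secret, ignoring surrounding whitespace and the trailing newline."""
    return Path(path).expanduser().read_text().strip()


def is_authorized(headers: Mapping[str, str], expected_token: str) -> bool:
    """HYPOTHETICAL: check an 'Authorization: Bearer <token>' header in constant time."""
    value: Optional[str] = None
    for name, v in headers.items():
        if name.lower() == "authorization":  # HTTP header names are case-insensitive
            value = v
            break
    if value is None or not value.startswith("Bearer "):
        return False
    presented = value[len("Bearer "):].strip()
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

A relay would reject any request for which `is_authorized` returns False before touching the store. This applies even on a localhost or Tailscale listener.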
Storage
The relay can initially store data using the existing JSON-file layout behind a single relay process:
relay-root/
  registry.json
  inboxes/<node_id>/<session_id>.json
  rooms/<room_id>/history.jsonl
  rooms/<room_id>/members.json
  dead-letter.jsonl
Because one process owns writes, remote correctness does not depend on
cross-machine lockf. The relay can still use local lockf internally so CLI
maintenance tools and tests behave like the current broker.
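A sketch of appending to rooms/<room_id>/history.jsonl with a relay-assigned, monotonically increasing sequence number. It takes a local `fcntl.lockf` lock as described above, so it is Unix-only. `append_room_history` and the `seq` field name are hypothetical. Scanning the file for the last sequence is O(n) per append. That cost is acceptable for a sketch, but a real store would cache it or use SQLite.

```python
import fcntl
import json
import os
from pathlib import Path


def append_room_history(relay_root: str, room_id: str, message: dict) -> int:
    """HYPOTHETICAL: append one message to rooms/<room_id>/history.jsonl and return its seq."""
    room_dir = Path(relay_root) / "rooms" / room_id
    room_dir.mkdir(parents=True, exist_ok=True)
    with open(room_dir / "history.jsonl", "a+", encoding="utf-8") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # local lock only; cross-machine safety comes from the single relay process
        try:
            f.seek(0)
            lines = [line for line in f.read().splitlines() if line.strip()]
            seq = (json.loads(lines[-1])["seq"] + 1) if lines else 1
            f.seek(0, os.SEEK_END)
            f.write(json.dumps({**message, "seq": seq}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the append durable before acknowledging it
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
    return seq
```

Because the relay assigns `seq`, room order stays well defined even when client timestamps disagree.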
If traffic grows, the same API can move to SQLite. That should be a storage swap, not an agent-visible protocol change.
Failure Modes
Remote transport needs explicit behavior for the cases that local files mostly hide:
- Relay offline: local sends either queue for retry or fail with a clear remote_unavailable error.
- Connector offline: relay keeps undrained inbox messages until TTL / manual sweep.
- Duplicate retry: every message gets a stable message_id; receivers and the relay treat retries idempotently.
- Clock skew: relay sequence numbers define order. Client timestamps are metadata only.
- Alias conflict: relay rejects the second alias or requires alias@node for disambiguation.
- Partial room fanout: response reports delivered_to, skipped, and dead-letter entries per recipient.
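One way to get idempotent retries on SQLite is to record every message_id in a table that outlives the drain. With that table in place, a retry that arrives after the recipient has already polled is still recognized. The schema, table names, and `SqliteInbox` are hypothetical, not c2c's SQLite backend. The `seen_ids` table grows without bound here. A relay GC pass would prune old ids.

```python
import sqlite3
import uuid
from typing import List, Optional

# HYPOTHETICAL schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS inbox (
    seq        INTEGER PRIMARY KEY AUTOINCREMENT,  -- relay-assigned order
    message_id TEXT NOT NULL,
    recipient  TEXT NOT NULL,
    content    TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS seen_ids (
    message_id TEXT PRIMARY KEY  -- survives drain, so late retries are still deduplicated
);
"""


class SqliteInbox:
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript(SCHEMA)

    def send(self, recipient: str, content: str, message_id: Optional[str] = None) -> str:
        message_id = message_id or str(uuid.uuid4())
        with self.db:  # one transaction: commit on success, roll back on error
            cur = self.db.execute(
                "INSERT OR IGNORE INTO seen_ids (message_id) VALUES (?)", (message_id,)
            )
            if cur.rowcount == 1:  # first sighting; a retry inserts nothing
                self.db.execute(
                    "INSERT INTO inbox (message_id, recipient, content) VALUES (?, ?, ?)",
                    (message_id, recipient, content),
                )
        return message_id

    def poll_inbox(self, recipient: str) -> List[dict]:
        with self.db:
            self.db.execute("BEGIN IMMEDIATE")  # take the write lock before reading
            rows = self.db.execute(
                "SELECT seq, message_id, content FROM inbox WHERE recipient = ? ORDER BY seq",
                (recipient,),
            ).fetchall()
            self.db.execute("DELETE FROM inbox WHERE recipient = ?", (recipient,))
        return [{"seq": s, "message_id": m, "content": c} for s, m, c in rows]
```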
Implementation Phases (all complete)
- ✓ Contracts and fixtures: remote message/registry JSON shapes, node_id, lease semantics, error codes, and two-machine unit fixtures.
- ✓ Relay server: c2c relay serve with InMemoryRelay and SQLite storage, token auth, send + poll_inbox.
- ✓ Connector: c2c relay connect bridges the local broker to the relay. Localhost two-broker roundtrip proven.
- ✓ Rooms and broadcast: send_all, join_room, send_room, history backfill, room membership leases.
- ✓ Operator setup: c2c relay setup, docs for SSH/Tailscale, health checks, relay GC, saved config, and environment variable overrides.
- ✓ Hardening: stable message_id with exactly-once dedup, dead-letter inspection, relay GC daemon, recovery tests. SQLite persistent backend.
Test Plan
- Use temporary directories as “machine A”, “machine B”, and “relay”.
- Run pure unit tests against an in-process fake relay before network tests.
- Add localhost integration tests for relay server + two connectors.
- Simulate relay restart and confirm queued messages are not lost.
- Simulate duplicate send retry and confirm exactly-once drain semantics.
- Simulate room fanout with one offline member and verify dead-letter reporting.
- Verify existing local MCP/CLI tests still pass with remote code disabled.
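The "relay restart" item can be written as an ordinary unittest case that uses a temporary directory as the relay root. `FileRelay` is a hypothetical stand-in that stores inboxes under relay-root/inboxes/<node_id>/<session_id>.json, as in the Storage layout above. A restart is simulated by discarding the instance and building a new one over the same directory. Run it with `python -m unittest`.

```python
import json
import tempfile
import unittest
from pathlib import Path


class FileRelay:
    """HYPOTHETICAL file-backed relay used only to illustrate the restart test."""

    def __init__(self, root: Path):
        self.root = root

    def _path(self, node_id: str, session_id: str) -> Path:
        return self.root / "inboxes" / node_id / f"{session_id}.json"

    def _write(self, path: Path, inbox: list) -> None:
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(inbox))
        tmp.replace(path)  # atomic rename: a crash never leaves a half-written inbox

    def send(self, node_id: str, session_id: str, msg: dict) -> None:
        path = self._path(node_id, session_id)
        inbox = json.loads(path.read_text()) if path.exists() else []
        self._write(path, inbox + [msg])

    def poll_inbox(self, node_id: str, session_id: str) -> list:
        path = self._path(node_id, session_id)
        if not path.exists():
            return []
        inbox = json.loads(path.read_text())
        self._write(path, [])
        return inbox


class RelayRestartTest(unittest.TestCase):
    def test_queued_messages_survive_restart(self):
        with tempfile.TemporaryDirectory() as tmp:
            relay_root = Path(tmp) / "relay"  # "machine A"/"machine B" dirs would sit beside it
            FileRelay(relay_root).send("node-b", "s1", {"from": "alice", "content": "hi"})
            # "Restart": drop the old instance and build a new one over the same directory.
            restarted = FileRelay(relay_root)
            self.assertEqual(restarted.poll_inbox("node-b", "s1"), [{"from": "alice", "content": "hi"}])
            self.assertEqual(restarted.poll_inbox("node-b", "s1"), [])
```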
Product Shape
The operator flow is designed to feel like local c2c:
# On one trusted host
c2c relay serve --listen 127.0.0.1:7331 --token-file ~/.config/c2c/relay.token
# On each agent machine, usually through SSH or Tailscale
c2c relay setup --url http://127.0.0.1:7331 --token-file ~/.config/c2c/relay.token
c2c relay connect
# Cross-host send through the same local send surface:
c2c send codex@laptop "hello from another machine"
The alias@host target is the remote routing signal for both c2c send and
mcp__c2c__send. The local broker writes the message to remote-outbox.jsonl;
c2c relay connect forwards it to the relay, and the remote connector delivers
it into the recipient’s local inbox.
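The routing rule above can be sketched as two small functions. The first appends alias@host targets to remote-outbox.jsonl and delivers bare aliases locally. The second is one connector pass that forwards the queued entries. The entry fields, the `remote-outbox.sending.jsonl` staging file, and both function names are hypothetical. Renaming the outbox before reading means new sends land in a fresh file. A failed pass keeps the staging file for retry, and any resulting duplicates are absorbed by message_id dedup at the relay.

```python
import json
import uuid
from pathlib import Path
from typing import Callable


def route_send(broker_root: Path, to: str, content: str,
               deliver_local: Callable[[str, str], None]) -> str:
    """HYPOTHETICAL: alias@host goes to remote-outbox.jsonl; a bare alias is delivered locally."""
    if "@" in to:
        entry = {"message_id": str(uuid.uuid4()), "to": to, "content": content}
        with open(broker_root / "remote-outbox.jsonl", "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return "queued_remote"
    deliver_local(to, content)
    return "delivered_local"


def forward_outbox(broker_root: Path, post_to_relay: Callable[[dict], None]) -> int:
    """HYPOTHETICAL connector pass: forward queued entries; returns how many were sent."""
    outbox = broker_root / "remote-outbox.jsonl"
    sending = broker_root / "remote-outbox.sending.jsonl"  # staging file (assumed name)
    if not sending.exists():
        if not outbox.exists():
            return 0
        outbox.replace(sending)  # atomic rename: new sends start a fresh outbox
    entries = [json.loads(line) for line in sending.read_text().splitlines() if line.strip()]
    for entry in entries:
        post_to_relay(entry)  # if this raises, the staging file is kept and retried next pass
    sending.unlink()
    return len(entries)
```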
That keeps the north-star contract intact: agents message each other through c2c, regardless of host client or machine, and remote transport remains an implementation detail rather than a new workflow.
See the Relay Quickstart for step-by-step operator instructions including localhost proof, SSH tunnel, and Tailscale setups.