01 Issue
Feature · Open · Swamp CLI
Assignees: None

Relationships

#535 Remote execution: orchestrator/worker fan-out (replaces execution drivers)

Opened by adam · 6/3/2026

Problem

swamp execution is single-host and in-process. There is no way to fan a workflow or method run out across many machines, and the existing execution-driver abstraction (raw / docker) only chooses how a method runs locally — not which machine it runs on. We want first-class remote execution: a central orchestrator distributing work across a fleet of disposable workers.

Proposed solution

A simple, brutalist design — bidirectional libswamp over a worker-initiated connection — that replaces execution drivers entirely.

Core shape

  • A single orchestrator owns the durable world: DAG/run state, datastore, vaults, catalog, definitions, extension bundles, locks, scheduler, tokens, audit.
  • Workers are disposable swamp processes that carry no repository, datastore, vault, or extension state — just a binary, a token, and a URL. They dial home (outbound, NAT-friendly), enroll, and run whatever is dispatched.
  • Every side-effecting capability a running method touches (read/write data, resolve a secret, load a definition) is proxied back to the orchestrator, which is the single durable authority. This drops cleanly onto the existing MethodContext / libswamp *Deps injection seam: remote execution swaps the leaves of the dependency tree for proxy adapters. Method and operation code are unchanged.
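The "swap the leaves" idea above can be sketched as follows. This is a minimal illustration, not the real libswamp interfaces: `DataDeps`, `OrchestratorTransport`, `LocalDataDeps`, `ProxyDataDeps`, and the verb strings `data.read` / `data.write` are all hypothetical names chosen for the example.

```typescript
// Hypothetical names throughout; the real MethodContext / *Deps shapes may differ.

/** The capability a method sees. Method code depends only on this interface. */
interface DataDeps {
  read(dataId: string, version: number): Promise<Uint8Array>;
  write(dataId: string, bytes: Uint8Array): Promise<number>; // resolves to the new version
}

/** Anything that can carry a request to the orchestrator and return its reply. */
type OrchestratorTransport = (verb: string, payload: unknown) => Promise<unknown>;

/** Orchestrator-side leaf: talks to storage directly (an in-memory stand-in here). */
class LocalDataDeps implements DataDeps {
  private store = new Map<string, Uint8Array[]>();

  async read(dataId: string, version: number): Promise<Uint8Array> {
    const bytes = this.store.get(dataId)?.[version - 1];
    if (!bytes) throw new Error(`no such data: ${dataId}@${version}`);
    return bytes;
  }

  async write(dataId: string, bytes: Uint8Array): Promise<number> {
    const versions = this.store.get(dataId) ?? [];
    versions.push(bytes);
    this.store.set(dataId, versions);
    return versions.length;
  }
}

/** Worker-side leaf: same interface, but every call is forwarded to the orchestrator. */
class ProxyDataDeps implements DataDeps {
  private send: OrchestratorTransport;

  constructor(send: OrchestratorTransport) {
    this.send = send;
  }

  async read(dataId: string, version: number): Promise<Uint8Array> {
    return (await this.send("data.read", { dataId, version })) as Uint8Array;
  }

  async write(dataId: string, bytes: Uint8Array): Promise<number> {
    return (await this.send("data.write", { dataId, bytes })) as number;
  }
}

/** Method code is identical in both modes; only the injected leaf differs. */
async function exampleMethod(deps: DataDeps): Promise<number> {
  return deps.write("report", new TextEncoder().encode("hello"));
}
```

Calling `exampleMethod(new LocalDataDeps())` exercises the single-host path, while `exampleMethod(new ProxyDataDeps(send))` exercises the remote path without any change to the method body.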

Drivers removed. No ExecutionDriver, no raw/docker/custom selection, no driver: fields. Execution takes one of two paths: in-process on the orchestrator's loopback executor (single-host, unchanged), or dispatch to a worker, where the method also runs in-process. Isolation and environment become a deployment property of the worker (run a containerized worker; label it) rather than per-step config.

Two transports (both worker-initiated):

  • Control plane — WebSocket: enrollment, dispatch, cancel, run events, and the metadata capability verbs. A symmetric two-handler-registry protocol built by splitting src/serve/'s server-role from its executor-role.
  • Data plane — HTTP/2: byte-heavy ops only (GET /data/{dataId}/{version}, PUT /data/..., GET /bundle/{fingerprint}). HTTP/2 gives multiplexing + flow control natively, so no hand-rolled chunking/framing. (Single-connection ws-over-h2 per RFC 8441 is the ideal but Deno doesn't support it yet.)
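One way to picture the symmetric two-handler-registry control plane is the sketch below: each side holds its own verb registry and can also call verbs on the other side over the same connection. The names `ControlFrame`, `ControlPeer`, `connectInMemory`, and the verb strings are assumptions made for illustration, and the WebSocket is replaced by direct in-memory delivery so the example stays self-contained.

```typescript
// Illustrative only: frame shape, class names, and verbs are hypothetical.

type ControlFrame =
  | { kind: "request"; id: number; verb: string; payload: unknown }
  | { kind: "response"; id: number; ok: boolean; result: unknown };

type ControlReply = Extract<ControlFrame, { kind: "response" }>;
type VerbHandler = (payload: unknown) => unknown;

class ControlPeer {
  private handlers = new Map<string, VerbHandler>();
  private waiting = new Map<number, (reply: ControlReply) => void>();
  private nextId = 1;

  /** Set by whoever owns the socket; a real peer would write to the WebSocket. */
  send: (frame: ControlFrame) => void = () => {};

  /** Register a verb this side is willing to serve. */
  register(verb: string, handler: VerbHandler): void {
    this.handlers.set(verb, handler);
  }

  /** Call a verb served by the other side. */
  call(verb: string, payload: unknown, onReply: (reply: ControlReply) => void): void {
    const id = this.nextId++;
    this.waiting.set(id, onReply);
    this.send({ kind: "request", id, verb, payload });
  }

  /** Feed in a frame that arrived from the other side. */
  receive(frame: ControlFrame): void {
    if (frame.kind === "response") {
      const done = this.waiting.get(frame.id);
      this.waiting.delete(frame.id);
      done?.(frame);
      return;
    }
    let reply: ControlReply;
    try {
      const handler = this.handlers.get(frame.verb);
      if (!handler) throw new Error(`unknown verb: ${frame.verb}`);
      reply = { kind: "response", id: frame.id, ok: true, result: handler(frame.payload) };
    } catch (err) {
      reply = { kind: "response", id: frame.id, ok: false, result: String(err) };
    }
    this.send(reply);
  }
}

/** Wire two peers together without a network, for demonstration. */
function connectInMemory(a: ControlPeer, b: ControlPeer): void {
  a.send = (frame) => b.receive(frame);
  b.send = (frame) => a.receive(frame);
}
```

In this shape the orchestrator would register the capability verbs and the worker would register dispatch and cancel, but the class itself does not care which role it plays, which is the point of the symmetry.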

Enrollment & tokens. A named, time-boxed enrollment token enrolls exactly one worker, once, and is bound to that worker's per-instance UUID. After that it only reconnects the same instance, for the instance's lifetime, so the worker survives socket blips. Enrollment issues a short-lived bearer session credential (sliding-window refresh) for the data plane. CLI surface: swamp worker token create/list/revoke and swamp worker connect.
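The one-token, one-instance rule can be sketched as a small admission check. This is a hedged illustration: `EnrollmentTokens`, `TokenRecord`, and `Admission` are hypothetical names, storage is in memory rather than the built-in enrollment-token model, and the sketch assumes the time box gates only the first enrollment, not later reconnects by the bound instance.

```typescript
// Hypothetical sketch; the real enrollment-token model may differ.

type Admission = "enrolled" | "reconnected" | "rejected";

interface TokenRecord {
  name: string;
  expiresAt: number; // ms since epoch; ASSUMPTION: gates first enrollment only
  boundTo: string | null; // per-instance UUID once a worker has enrolled
  revoked: boolean;
}

class EnrollmentTokens {
  private records = new Map<string, TokenRecord>();

  create(secret: string, name: string, ttlMs: number, now: number): void {
    this.records.set(secret, { name, expiresAt: now + ttlMs, boundTo: null, revoked: false });
  }

  revoke(secret: string): void {
    const record = this.records.get(secret);
    if (record) record.revoked = true;
  }

  /** Decide whether a connecting worker instance may use this token. */
  admit(secret: string, instanceUuid: string, now: number): Admission {
    const record = this.records.get(secret);
    if (!record || record.revoked) return "rejected";

    // Already bound: only the same instance may come back.
    if (record.boundTo !== null) {
      return record.boundTo === instanceUuid ? "reconnected" : "rejected";
    }

    // Unbound: first enrollment must happen inside the time box.
    if (now > record.expiresAt) return "rejected";
    record.boundTo = instanceUuid;
    return "enrolled";
  }
}
```

A second worker presenting the same token is rejected even inside the time box, which is what makes a leaked token useless once its intended worker has enrolled.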

Worker state is swamp data. The pool, token lifecycle, and step leases are persisted by first-class built-in models (worker, enrollment-token, step-lease) through the normal datastore/catalog — so provisioning/autoscaling are just workflows that data.query the pool, lifecycle history is free, and reports/CLI see worker state with no bespoke surface.

Scheduling. The orchestrator owns the DAG and dispatches ready steps. It matches on direct target (worker name/uuid) → labels → platform, with a least-loaded tiebreak. Compute and state location are fully decoupled.
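A minimal sketch of that matching order follows. It reads "direct target → labels → platform" as successive filters over the pool, which is one interpretation of the text and not confirmed by the design. `PoolWorker`, `StepPlacement`, `pickWorker`, and the `activeSteps` load measure are hypothetical.

```typescript
// Hypothetical shapes; the real worker model and step spec may differ.

interface PoolWorker {
  name: string;
  uuid: string;
  labels: string[];
  platform: string;
  activeSteps: number; // ASSUMPTION: load is measured as steps currently running
}

interface StepPlacement {
  target?: string; // worker name or uuid
  labels?: string[]; // every listed label must be present on the worker
  platform?: string;
}

function pickWorker(step: StepPlacement, pool: PoolWorker[]): PoolWorker | undefined {
  let candidates = pool;

  // 1. Direct target: narrow to the named worker, if one was requested.
  if (step.target !== undefined) {
    candidates = candidates.filter((w) => w.name === step.target || w.uuid === step.target);
  }

  // 2. Labels: keep workers that carry all requested labels.
  const wanted = step.labels ?? [];
  candidates = candidates.filter((w) => wanted.every((label) => w.labels.includes(label)));

  // 3. Platform: keep workers on the requested platform.
  if (step.platform !== undefined) {
    candidates = candidates.filter((w) => w.platform === step.platform);
  }

  // 4. Least-loaded tiebreak; on equal load the earlier pool entry wins.
  return candidates.reduce<PoolWorker | undefined>(
    (best, w) => (best === undefined || w.activeSteps < best.activeSteps ? w : best),
    undefined,
  );
}
```

Returning `undefined` when nothing matches leaves the "no eligible worker" policy (queue, fail, or launch a host) to the caller.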

Host launching is a swamp workflow. A built-in mint model writes the token to a vault and returns a reference; user-authored launch models (k8s/cloud) read it via ${{ vault.get(...) }} and boot a worker. Bootstrapped on the loopback executor, so no chicken-and-egg.

Data semantics (verified against current code). Writes are immediately durable (no staging); a PUT completes only once repo.save() persists. Write-scoping is the existing declared-spec enforcement (no new authz). Data is versioned-immutable, so workers cache artifact bytes by (dataId, version) and latest/queries stay live.
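Because a (dataId, version) pair never changes, a worker-side cache needs no invalidation, while "latest" must be re-resolved on every call. The sketch below shows that split. `ArtifactCache`, `FetchVersion`, and `ResolveLatest` are hypothetical names, and the two callbacks stand in for the data-plane GET and a control-plane lookup respectively.

```typescript
// Hypothetical sketch of a worker-side artifact cache.

type FetchVersion = (dataId: string, version: number) => Promise<Uint8Array>;
type ResolveLatest = (dataId: string) => Promise<number>;

class ArtifactCache {
  private entries = new Map<string, Promise<Uint8Array>>();
  private fetchVersion: FetchVersion;
  private resolveLatest: ResolveLatest;

  constructor(fetchVersion: FetchVersion, resolveLatest: ResolveLatest) {
    this.fetchVersion = fetchVersion;
    this.resolveLatest = resolveLatest;
  }

  /** Immutable bytes: fetch once per (dataId, version), then serve from memory. */
  get(dataId: string, version: number): Promise<Uint8Array> {
    const key = `${dataId}@${version}`;
    let entry = this.entries.get(key);
    if (entry === undefined) {
      // Caching the promise also deduplicates concurrent requests for the same key.
      entry = this.fetchVersion(dataId, version);
      this.entries.set(key, entry);
      // Do not cache failures: drop the entry so a later call retries.
      entry.catch(() => this.entries.delete(key));
    }
    return entry;
  }

  /** "latest" stays live: always ask the orchestrator which version is current. */
  async getLatest(dataId: string): Promise<Uint8Array> {
    const version = await this.resolveLatest(dataId);
    return this.get(dataId, version);
  }
}
```

Only the version lookup crosses the wire on repeated `getLatest` calls; the bytes are transferred again only when the resolved version has not been seen before.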

Failure semantics. Control socket = liveness. A drop opens a reconnection grace window so reconnect and re-dispatch never double-execute. Reads resume; an in-flight write is the ambiguity case → fail the run. No transparent retry of write-bearing steps in v1 (needs write idempotency first).
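The grace-window rule can be sketched as a small state machine over a step lease. This is an illustration under assumptions: the `StepLease` fields, the state names, and the function names are hypothetical, and what the orchestrator does with an expired lease is left to the caller.

```typescript
// Hypothetical sketch; the real step-lease model may carry different fields.

type LeaseState = "running" | "grace" | "expired" | "failed";

interface StepLease {
  stepId: string;
  workerUuid: string;
  state: LeaseState;
  writeInFlight: boolean;
  graceUntil: number | null; // ms since epoch while in "grace", otherwise null
}

/** Control socket dropped. An in-flight write is ambiguous, so the run fails. */
function onDisconnect(lease: StepLease, now: number, graceMs: number): StepLease {
  if (lease.state !== "running") return lease;
  if (lease.writeInFlight) return { ...lease, state: "failed" };
  return { ...lease, state: "grace", graceUntil: now + graceMs };
}

/** Time passes: once the window closes the lease is expired, never silently resumed. */
function onTick(lease: StepLease, now: number): StepLease {
  if (lease.state === "grace" && lease.graceUntil !== null && now > lease.graceUntil) {
    return { ...lease, state: "expired", graceUntil: null };
  }
  return lease;
}

/** Same worker instance returns inside the window: the step simply continues. */
function onReconnect(lease: StepLease, workerUuid: string, now: number): StepLease {
  const current = onTick(lease, now);
  if (current.state !== "grace" || workerUuid !== current.workerUuid) return current;
  return { ...current, state: "running", graceUntil: null };
}

/**
 * Re-dispatch is only considered once the lease has expired, so a reconnecting
 * worker and a replacement worker can never both hold the step.
 */
function mayRedispatch(lease: StepLease): boolean {
  return lease.state === "expired";
}
```

Whether an expired step is actually re-dispatched remains a separate policy decision; the issue rules out transparent retry of write-bearing steps in v1.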

Scope / non-goals

In scope: dial-home enrollment, drivers removed, extension code shipped, remote MethodContext, ws control + h2 data plane, worker state as swamp data, label + direct scheduling, reconnection grace window, host-launch-as-workflow.

Non-goals (v1): remote datastore config for workers (one proxied plane only); data-locality scheduler affinity; shipping cloud/k8s launch integrations.

Full design

The complete, converged design — ubiquitous language, protocol, capability inventory, reuse-vs-new map, known limits — lives in the repo at design/remote-execution.md. This issue tracks building it.

02 Bog Flow
OPEN · TRIAGED · IN PROGRESS · SHIPPED

Open

6/3/2026, 12:40:40 AM

