#260 Configurable concurrency limits for workflow fan-out (forEach, parallel jobs/steps)
Opened by john · 5/6/2026 · Shipped 5/6/2026
Problem statement
Swamp's workflow engine has no concurrency controls anywhere in the parallelism path. Independent jobs at the same topological level, parallel steps within a job, and forEach expansions are all funnelled through merge() (src/infrastructure/stream/merge.ts) without any semaphore, pool, or cap. The design doc explicitly calls this out as a goal — "maximum parallelism through the job" (design/workflow.md:14) — and the only hardcoded bound I could find anywhere in the engine is MAX_WORKFLOW_NESTING_DEPTH = 10 in src/.../execution_service.ts, which is about nesting depth, not width.
In practice, unbounded fan-out runs into two real ceilings that swamp itself is in the best position to enforce:
- Downstream API rate limits. A forEach over N items where each step calls an external API (LLM provider, GitHub, Slack, AWS, any rate-limited SaaS) sends N concurrent requests the instant the level becomes runnable. For a 100-item fan-out against an API with a cap of 50 concurrent requests, half the runs immediately fail with 429s. This isn't a programming error — it's the natural behaviour of a workflow engine that has no notion of "how many of these am I allowed to run at once."
- Local execution capacity. Under the docker driver, every parallel step spawns its own container — there is no cap in DockerExecutionDriver. A 200-item fan-out tries to spin up 200 simultaneous containers; the docker daemon, host kernel (file descriptors, PID limits), and host memory will fail well before swamp does. Under the raw driver, the same applies to in-process resource pressure.
Today every user solves this themselves, badly:
- Outer shell loops that batch invocations of the workflow (defeats the point of forEach)
- In-model sleep(index * delay) staggering (crude, brittle, model-author burden)
- Host-mounted flock lockfiles in the docker image (works, but completely outside swamp's awareness)
These all push concurrency policy out of the workflow definition and into ad-hoc user code, where it's reinvented per workflow with different tradeoffs.
Proposed solution
Introduce a concurrency: field on the workflow schema, configurable at three levels with the existing "first non-undefined wins" resolution (matching how driver: already works):
```yaml
# workflow level — applies to the whole DAG
concurrency: 10
jobs:
  - name: fan-out
    # job level — applies to all steps in this job
    concurrency: 5
    steps:
      - name: per-item
        forEach:
          item: target
          in: ${{ inputs.targets }}
          # forEach level — caps the expansion specifically
          concurrency: 3
        task: { ... }
```

Semantics:
- A non-negative integer is a hard cap on simultaneously executing units at that level (jobs / steps / forEach iterations).
- 0 or unset = current unbounded behaviour (so it's a pure additive change, no migration).
- A semaphore acquired before each merge() task starts and released on completion handles this cleanly without restructuring the topological executor (see the sketch below).
- Resolution order: forEach > step > job > workflow > unbounded. The most-local cap wins.
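A minimal sketch of the semaphore idea in TypeScript (Semaphore and withLimit are hypothetical names for illustration, not swamp's actual internals):

```ts
// Hypothetical sketch, not swamp's merge() implementation: a counting
// semaphore that hands slots to queued waiters on release.
class Semaphore {
  private queue: Array<() => void> = [];
  private active = 0;

  constructor(private readonly limit: number) {}

  async acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++;
      return;
    }
    await new Promise<void>((resolve) => this.queue.push(resolve));
  }

  release(): void {
    const next = this.queue.shift();
    if (next) {
      next(); // hand the slot directly to the next waiter; active is unchanged
    } else {
      this.active--;
    }
  }
}

// Acquire before each task starts, release on completion; an undefined
// semaphore preserves today's unbounded behaviour.
async function withLimit<T>(
  sem: Semaphore | undefined,
  task: () => Promise<T>,
): Promise<T> {
  if (!sem) return task();
  await sem.acquire();
  try {
    return await task();
  } finally {
    sem.release();
  }
}
```

Wrapping each task handed to merge() in withLimit caps width without touching the topological executor, which is what makes this a small change rather than a scheduler rewrite.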
A complementary global setting would also be useful for daemon-wide protection of the host:
```sh
SWAMP_MAX_CONCURRENT_STEPS=20 swamp serve
```

This is a defence-in-depth setting for shared-host scenarios (CI runners, dev laptops) where individual workflow authors may not know the host's real ceiling.
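For the resolution order, a sketch of the "first non-undefined wins" lookup with the env var applied as a daemon-wide clamp (the field names and the clamping behaviour are assumptions, not settled design):

```ts
// Hypothetical resolution helper mirroring the proposed precedence:
// forEach > step > job > workflow > unbounded.
interface ConcurrencyScopes {
  forEach?: number;
  step?: number;
  job?: number;
  workflow?: number;
}

function resolveConcurrency(scopes: ConcurrencyScopes): number {
  // First non-undefined value wins; an explicit 0 means unbounded, so a
  // local `concurrency: 0` can deliberately lift an outer cap.
  const local =
    scopes.forEach ?? scopes.step ?? scopes.job ?? scopes.workflow ?? 0;

  // Defence-in-depth: never exceed the daemon-wide ceiling if one is set.
  const globalCap = Number(process.env.SWAMP_MAX_CONCURRENT_STEPS ?? 0);
  if (globalCap > 0) {
    return local > 0 ? Math.min(local, globalCap) : globalCap;
  }
  return local; // 0 = today's unbounded behaviour
}
```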
Alternatives considered
- Document the absence and tell users to batch externally. Status quo. Fine for trivial fan-outs; falls apart the moment a workflow does anything against a real API. Pushes a load-shedding concern from the engine into every workflow author's brain.
- Per-driver concurrency caps only (e.g., on docker:). Helps with container exhaustion but doesn't help with the more common API-rate-limit case, since you can absolutely overwhelm an HTTP API with the raw driver.
- Dynamic backpressure based on observed failure rates. Interesting but a much bigger feature; explicit caps are the 80% case and a prerequisite for any adaptive scheme later.
Why this belongs in swamp
The workflow engine is the only layer that knows the topology — which steps will run in parallel, which are blocked behind dependencies, when a level becomes runnable. Asking each model author to invent their own concurrency control means they don't have that view; they only see one invocation at a time. Putting the cap in the workflow schema keeps the policy where the topology is.
This is also the cleanest path to making forEach actually usable for real fan-out workloads. Right now, anyone running forEach over more than a handful of items has to either accept downstream failures or implement workarounds that defeat the declarative spirit of swamp.