Issue

Feature · Shipped · Swamp CLI
Assignees: stack72

#260 Configurable concurrency limits for workflow fan-out (forEach, parallel jobs/steps)

Opened by john · 5/6/2026 · Shipped 5/6/2026

Problem statement

Swamp's workflow engine has no concurrency controls anywhere in the parallelism path. Independent jobs at the same topological level, parallel steps within a job, and forEach expansions are all funnelled through merge() (src/infrastructure/stream/merge.ts) without any semaphore, pool, or cap. The design doc explicitly calls this out as a goal — "maximum parallelism through the job" (design/workflow.md:14) — and the only hardcoded bound I could find anywhere in the engine is MAX_WORKFLOW_NESTING_DEPTH = 10 in src/.../execution_service.ts, which is about nesting depth, not width.

In practice, unbounded fan-out runs into two real ceilings that swamp itself is in the best position to enforce:

  1. Downstream API rate limits. A forEach over N items where each step calls an external API (LLM provider, GitHub, Slack, AWS, any rate-limited SaaS) sends N concurrent requests the instant the level becomes runnable. For a 100-item fan-out against an API that allows at most 50 concurrent requests, half the runs immediately fail with 429s. This isn't a programming error — it's the natural behaviour of a workflow engine that has no notion of "how many of these am I allowed to run at once."

  2. Local execution capacity. Under the docker driver, every parallel step spawns its own container — there is no cap in DockerExecutionDriver. A 200-item fan-out tries to spin up 200 simultaneous containers; the docker daemon, host kernel (file descriptors, PID limits), and host memory will fail well before swamp does. Under the raw driver, the same applies to in-process resource pressure.

Today every user solves this themselves, badly:

  • Outer shell loops that batch invocations of the workflow (defeats the point of forEach)
  • In-model sleep(index * delay) staggering (crude, brittle, model-author burden)
  • Host-mounted flock lockfiles in the docker image (works, completely outside swamp's awareness)

These all push concurrency policy out of the workflow definition and into ad-hoc user code, where it's reinvented per workflow with different tradeoffs.

Proposed solution

Introduce a concurrency: field on the workflow schema, configurable at three levels with the existing "first non-undefined wins" resolution (matching how driver: already works):

# workflow level — applies to the whole DAG
concurrency: 10

jobs:
  - name: fan-out
    # job level — applies to all steps in this job
    concurrency: 5

    steps:
      - name: per-item
        forEach:
          item: target
          in: ${{ inputs.targets }}
        # forEach level — caps the expansion specifically
        concurrency: 3
        task: { ... }

Semantics:

  • A positive integer is a hard cap on simultaneously executing units at that level (jobs / steps / forEach iterations).
  • 0 or unset = current unbounded behaviour (so it's a purely additive change, no migration).
  • A semaphore acquired before each merge() task starts and released on completion handles this cleanly without restructuring the topological executor (sketched after this list).
  • Resolution order: forEach > step > job > workflow > unbounded. The most-local cap wins.
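
As a concrete sketch of that semaphore-around-merge() idea, here is roughly what it could look like in TypeScript. The Semaphore, resolveConcurrency, and withCap names are invented for this issue and are not existing swamp code; the only thing assumed about the engine is that every parallel unit (job, step, forEach iteration) reaches merge() as an async task.

class Semaphore {
  private readonly waiters: Array<() => void> = [];
  private inFlight = 0;

  constructor(private readonly limit: number) {}

  async acquire(): Promise<void> {
    if (this.limit <= 0) return;                 // 0 / unset = unbounded, today's behaviour
    if (this.inFlight < this.limit) {
      this.inFlight++;
      return;
    }
    // at capacity: park until a running unit finishes and hands its slot over
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    if (this.limit <= 0) return;
    const next = this.waiters.shift();
    if (next) next();                            // pass the slot straight to the next waiter
    else this.inFlight--;
  }
}

// "First non-undefined wins", most-local first: forEach > step > job > workflow.
function resolveConcurrency(...caps: Array<number | undefined>): number {
  return caps.find((c) => c !== undefined) ?? 0; // 0 = unbounded
}

// Wrap a unit of work before handing it to merge(): acquire before start, release on completion.
function withCap<T>(sem: Semaphore, task: () => Promise<T>): () => Promise<T> {
  return async () => {
    await sem.acquire();
    try {
      return await task();
    } finally {
      sem.release();
    }
  };
}

For a forEach expansion, the executor would build one semaphore per expansion, e.g. new Semaphore(resolveConcurrency(forEachCap, stepCap, jobCap, workflowCap)), and wrap every iteration with withCap before handing it to merge(). Nothing about the topological ordering changes; at-capacity units simply wait their turn.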

A complementary global setting would also be useful for daemon-wide protection of the host:

SWAMP_MAX_CONCURRENT_STEPS=20 swamp serve

This is a defence-in-depth setting for shared-host scenarios (CI runners, dev laptops) where individual workflow authors may not know the host's real ceiling.
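
A rough sketch of how that daemon-wide ceiling could compose with a workflow-level cap, assuming the stricter limit always wins; effectiveCap and the exact parsing are hypothetical, only the SWAMP_MAX_CONCURRENT_STEPS name comes from the proposal above.

// Hypothetical composition rule: the smaller of the daemon ceiling and the workflow cap applies.
function effectiveCap(workflowCap: number | undefined): number {
  const raw = process.env.SWAMP_MAX_CONCURRENT_STEPS;
  const daemonCap = raw !== undefined ? Number.parseInt(raw, 10) : undefined;

  const caps = [workflowCap, daemonCap].filter(
    (c): c is number => c !== undefined && Number.isFinite(c) && c > 0,
  );
  return caps.length > 0 ? Math.min(...caps) : 0; // 0 = unbounded, as elsewhere
}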

Alternatives considered

  • Document the absence and tell users to batch externally. Status quo. Fine for trivial fan-outs; falls apart the moment a workflow does anything against a real API. Pushes a load-shedding concern from the engine into every workflow author's brain.
  • Per-driver concurrency caps only (e.g., on docker:). Helps with container exhaustion but doesn't help with the more common API-rate-limit case, since you can absolutely overwhelm an HTTP API with the raw driver.
  • Dynamic backpressure based on observed failure rates. Interesting but a much bigger feature; explicit caps are the 80% case and a prerequisite for any adaptive scheme later.

Why this belongs in swamp

The workflow engine is the only layer that knows the topology — which steps will run in parallel, which are blocked behind dependencies, when a level becomes runnable. Asking each model author to invent their own concurrency control pushes the policy into a layer that doesn't have that view; a model only sees one invocation at a time. Putting the cap in the workflow schema keeps the policy where the topology is.

This is also the cleanest path to making forEach actually usable for real fan-out workloads. Right now, anyone running forEach over more than a handful of items has to either accept downstream failures or implement workarounds that defeat the declarative spirit of swamp.

Bog Flow

Shipped

5/6/2026, 6:19:59 PM


Sludge Pulse
stack72 assigned stack72 · 5/6/2026, 4:30:31 PM
