Per-model LockTimeoutError at 60s causes cascading failures under concurrent access

Under concurrent access (25 parallel subagents hitting the same model instance), a single long-running model method or workflow step holds the per-model datastore lock for >60s, causing all other concurrent callers to receive LockTimeoutError. This cascades — once one process times out, the next queued caller also hits the 60s limit, creating a chain of failures.

Environment

swamp version: 20260530.005533.0-sha.1c117111
OS: Linux (WSL2)
Single model instance of type @webframp/gitlab with ~45 extension methods
Local filesystem datastore (no Postgres/S3)

Steps to Reproduce

Create a model instance and register ~45 extension methods on it.
Spawn 10+ concurrent swamp model method run <model> <method> calls targeting the same model with varying methods (some fast at ~1s, some slow at ~3.5s).
Meanwhile, spawn 3+ concurrent swamp workflow run calls that also target the same model across multiple steps.
Observe lock contention within 1-2 minutes.
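The steps above can be driven from a small harness. This is a minimal sketch, not the script used for the original test: the model name passed in is a placeholder, and treating a non-zero exit code as a failed call is an assumption about the CLI. Only the `swamp model method run <model> <method>` command shape comes from this report.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def build_commands(model: str, methods: Sequence[str], count: int) -> list[list[str]]:
    """Build `count` CLI invocations, cycling through the given method names."""
    return [
        ["swamp", "model", "method", "run", model, methods[i % len(methods)]]
        for i in range(count)
    ]


def run_cli(cmd: list[str]) -> int:
    """Run one command and return its exit code (assumed non-zero on failure)."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def run_concurrently(
    commands: list[list[str]],
    runner: Callable[[list[str]], int] = run_cli,
) -> list[int]:
    """Start every command at once; return exit codes in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(runner, commands))
```

For example, `run_concurrently(build_commands("<model>", ["get_current_user", "list_todos", "dashboard"], 12))` would launch 12 overlapping calls against one model. The `runner` argument exists so the harness can be exercised without the CLI installed.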

Observed Behavior

WRN datastore·lock Waiting for lock "data/@webframp/gitlab/.../.lock"
  held by "user@host" (pid XXXXX, acquired XXXms ago)

FTL error LockTimeoutError: Lock "data/@webframp/gitlab/.../.lock"
  held by user@host (pid XXXXX) — timed out after 60130ms

A single workflow process (pid 4130092 in our test) held the lock for several minutes across consecutive lock acquisitions, so every other concurrent process timed out at 60s. The holder released after ~3 rounds of lock acquisition (~3 minutes), at which point another queued process took over and repeated the pattern.
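To make the arithmetic of that pattern concrete, here is a simplified timeline model. It treats the holder's consecutive acquisitions as one continuous ~180s hold, and the 20s arrival spacing is an illustrative assumption, not a measured value.

```python
def timed_out_waiters(
    hold_s: float, arrivals_s: list[float], timeout_s: float = 60.0
) -> list[float]:
    """Return arrival times of callers that give up before the holder releases.

    The holder acquires at t=0 and releases at t=hold_s. A caller arriving at
    time `a` times out if the lock is still held at `a + timeout_s`.
    """
    return [a for a in arrivals_s if a + timeout_s < hold_s]


# Holder keeps the lock ~180s (about three 60s rounds, per the report).
# Ten callers arrive 20s apart: 0, 20, ..., 180 (assumed spacing).
arrivals = [i * 20.0 for i in range(10)]
failed = timed_out_waiters(180.0, arrivals)
print(f"{len(failed)} of {len(arrivals)} callers time out")
# prints "6 of 10 callers time out"
```

Under these assumptions every caller that arrives in the first two minutes fails, and only the late arrivals get the lock.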

Workflow runs wrapped this as:

FTL error Error: 'Workflow execution failed: Lock ".../.lock" held by user@host (pid XXXXX) — timed out after 60135ms'
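Lock waits and timeouts can be tallied from captured output with a small script along these lines. It matches only the message fragments shown above ("Waiting for lock" and "timed out after Nms"); any other log shapes the CLI may emit are not covered.

```python
import re

WAIT_RE = re.compile(r"Waiting for lock")
TIMEOUT_RE = re.compile(r"timed out after (\d+)ms")


def summarize(log_text: str) -> dict:
    """Count lock waits and lock timeouts in captured CLI output."""
    timeouts_ms = [int(ms) for ms in TIMEOUT_RE.findall(log_text)]
    return {
        "lock_waits": len(WAIT_RE.findall(log_text)),
        "lock_timeouts": len(timeouts_ms),
        "max_timeout_ms": max(timeouts_ms, default=0),
    }


# Sample copied from the log excerpts in this report.
sample = '''WRN datastore·lock Waiting for lock "data/@webframp/gitlab/.../.lock"
  held by "user@host" (pid XXXXX, acquired XXXms ago)
FTL error LockTimeoutError: Lock "data/@webframp/gitlab/.../.lock"
  held by user@host (pid XXXXX) — timed out after 60130ms
'''
print(summarize(sample))
# prints {'lock_waits': 1, 'lock_timeouts': 1, 'max_timeout_ms': 60130}
```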

Impact

~80% of concurrent model method calls failed during contention periods
Entire workflow runs failed because any single step couldn't acquire the lock
No automatic recovery — the contention only resolved when the lock-holding process naturally completed its cycle

Expected Behavior

Lock acquisition should have a fair queue (FIFO) rather than everyone contending and timing out
Workflow steps should share or release the lock between steps, not hold it across the entire workflow job
A configurable timeout or retry mechanism would help
Consider a read/write lock pattern for read-only methods
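As an illustration of the FIFO behavior requested in the first point, here is a minimal in-process ticket lock. It is only a sketch of the queueing idea: swamp's lock is shared across processes, so a real fix would need a cross-process equivalent, and the class and method names here are hypothetical.

```python
import threading


class FifoLock:
    """Ticket lock: callers enter the critical section in ticket order."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._next_ticket = 0   # next ticket number to hand out
        self._now_serving = 0   # ticket currently allowed to hold the lock

    def take_ticket(self) -> int:
        """Join the queue and return this caller's position."""
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            return ticket

    def wait_turn(self, ticket: int) -> None:
        """Block until every earlier ticket has released."""
        with self._cond:
            self._cond.wait_for(lambda: self._now_serving == ticket)

    def release(self) -> None:
        """Hand the lock to the next ticket in line."""
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()


lock = FifoLock()
order: list[int] = []


def worker(ticket: int) -> None:
    lock.wait_turn(ticket)
    order.append(ticket)  # critical section
    lock.release()


# Tickets are taken in order, but threads are started in reverse
# to show that start order does not affect service order.
tickets = [lock.take_ticket() for _ in range(5)]
threads = [threading.Thread(target=worker, args=(t,)) for t in reversed(tickets)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(order)  # prints [0, 1, 2, 3, 4]
```

With a queue like this, a caller's wait is bounded by the work ahead of it rather than by how often it happens to lose the race for the lock.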

02Bog Flow

Triaged 6/1/2026, 9:35:00 PM

03Sludge Pulse

stack72 assigned stack72 6/1/2026, 9:33:02 PM

webframp commented 6/1/2026, 6:44:42 PM

A second stress test (1 hour, 15 parallel subagents) confirms this is severe and reproducible.

Results

Pattern	Subagents	Lock Waits	Lock Timeouts (60s)
Rapid dashboard (1-4s sleep between calls)	3	~90 each	~40 each
Mixed methods (get_current_user, list_all_projects, list_todos, 2-6s sleep)	3	69 each	19-32 each
daily-summary workflow only	3	~63 each	22-33 each
health-check + mr-triage workflows	2	~59 each	~32 each
Zero-sleep rapid fire (get_current_user, list_runners, list_todos in tight loop)	1	110	45

All 9 model-method subagents hit lock waits and timeouts within the first few minutes. The zero-sleep agent was worst at 110 waits and 45 timeouts in 1 hour.
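As a rough reading of the table, the share of lock waits that ended in a timeout can be computed for the rows that report single values. The "~" figures are approximate, so the percentages below are indicative only.

```python
# (pattern, lock waits, lock timeouts) per subagent, taken from the table above.
# Rows reported as ranges (e.g. "19-32") are left out.
rows = [
    ("rapid dashboard", 90, 40),            # reported as ~90 / ~40
    ("health-check + mr-triage", 59, 32),   # reported as ~59 / ~32
    ("zero-sleep rapid fire", 110, 45),
]


def timeout_share(waits: int, timeouts: int) -> float:
    """Fraction of lock waits that ended in a 60s timeout."""
    return timeouts / waits


for name, waits, timeouts in rows:
    print(f"{name}: {timeout_share(waits, timeouts):.0%}")
# prints:
# rapid dashboard: 44%
# health-check + mr-triage: 54%
# zero-sleep rapid fire: 41%
```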

Key observation

Data operations (swamp data list, data query, data search) ran concurrently with zero lock contention — they use a different lock resource. This suggests the bottleneck is specifically the per-model datastore lock, not a global lock.

Impact quantification

~30-40% of rapid-fire model method calls failed with LockTimeoutError during contention periods
7-16 workflow runs failed entirely per subagent because individual steps could not acquire the lock
Workflow lock failures cascade: one step timeout fails the entire workflow
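A caller-side retry is one possible shape for the retry mechanism requested in the original report. The sketch below is an assumption-laden illustration: `LockTimeout` is a hypothetical Python stand-in for the CLI's LockTimeoutError, and retrying adds load to an already contended lock, so it may only soften the cascade rather than remove it.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LockTimeout(Exception):
    """Hypothetical stand-in for the CLI's LockTimeoutError."""


def retry_on_lock_timeout(
    step: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Run `step`, retrying on LockTimeout with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return step()
        except LockTimeout:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            # Jitter in [0.5, 1.5) keeps retries from re-colliding in lockstep.
            sleep(base_delay_s * (2 ** attempt) * (0.5 + rng()))
    raise AssertionError("unreachable")
```

Wrapping a single workflow step this way would let one timed-out step be retried instead of failing the whole run.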

Additional note

The lock holder is tracked by PID, but there is no evidence of stale locks (a process dying while holding the lock). The contention comes purely from multiple processes queuing behind a slow method call such as dashboard (~3.5s execution).

webframp commented 6/1/2026, 9:02:31 PM

Retested on version 20260601.163824.0-sha.c2872a24 (15 subagents, 1 hour). Results are essentially unchanged:

Metric	First test	Retest
Lock waits (rapid-fire, 3 agents)	~90 each	~80 each
Lock timeouts (rapid-fire, 3 agents)	~40 each	~35 each
Lock waits (zero-sleep, 1 agent)	110	99
Lock timeouts (zero-sleep, 1 agent)	45	38

The per-model lock is still a hard bottleneck. Data operations (swamp data list/query/search) confirmed zero contention — they use a different lock domain.

Swamp version 20260601.163824.0-sha.c2872a24 does not appear to contain a fix for this issue.
