swamp extension install: datastore push hangs ~8.5m then crashes with Deno TLS panic (tls_wrap.rs:1918 unwrap on None)
Opened by swamp_lord · 4/23/2026 · Shipped 4/23/2026
Summary
swamp extension install hangs for ~8.5 minutes during its datastore·sync Pushing changes to "@swamp/s3-datastore" phase, then panics inside Deno's TLS layer and aborts with exit code 134. Reproduces on every CI run (GitHub Actions ubuntu-latest) once the datastore has any contention history. The panic is not related to how workflows or model methods are structured — it happens in the extension-catalog bootstrap step, before any user-authored workflow or method executes.
Environment
- swamp CLI: 20260417.220420.0-sha.7181443f
- Datastore: @swamp/s3-datastore@2026.04.23.1, backed by DigitalOcean Spaces (S3-compatible)
- Runner: GitHub Actions ubuntu-latest, running swamp installed from https://swamp.club/install.sh
- Repo: systeminit/giga-swamp (swamp-managed repo with 4034 files in the datastore), cloned fresh into each runner
- Installed extensions at time of crash: @swamp/s3-datastore + @swamp/digitalocean + @swamp/issue-lifecycle
Exact panic signature
thread 'main' (pid) panicked at ext/node/ops/tls_wrap.rs:1918:31:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'main' (pid) panicked at library/core/src/panicking.rs:230:5:
panic in a function that cannot unwind
stack backtrace:
0: 0x5653702ea182 - <unknown>
1: 0x5653702e8bef - <unknown>
... (frames all <unknown> — stripped release build) ...
thread caused non-unwinding panic. aborting.
Aborted (core dumped) swamp extension install
exit code 134
Reproduction (observed in CI 4+ times today)
From the GitHub Actions log for a Deploy discord-bot matrix child (run 24858167352, job id 72776548723):
20:50:34.336 INF datastore·sync Syncing from "@swamp/s3-datastore"...
20:50:59.857 INF datastore·sync Synced 4034 file(s) from "@swamp/s3-datastore"
20:50:59.876 WRN extension·install 1 extension(s) pending migration. Run 'swamp repo upgrade' to complete.
20:50:59.877 INF extension·install Reading lockfile...
20:50:59.878 INF extension·install Installing "@swamp/digitalocean"@"2026.04.08.1"...
20:51:02.663 INF extension·install Installing "@swamp/issue-lifecycle"@"2026.04.20.1"...
20:51:04.540 INF extension·install Installed 2 extension(s).
20:51:04.541 INF extension·install 1 extension(s) already up to date.
20:51:04.541 INF datastore·sync Pushing changes to "@swamp/s3-datastore"...
← 8m 25s of silence ←
20:59:29.335 thread 'main' (2225) panicked at ext/node/ops/tls_wrap.rs:1918:31:
20:59:29.335 called `Option::unwrap()` on a `None` value
20:59:29.336 panic in a function that cannot unwind
20:59:32.462 ##[error]Process completed with exit code 134.
Steps to reproduce
- Set up a swamp repo against DO Spaces with the S3 datastore (SWAMP_DATASTORE=s3:<bucket>/<prefix>).
- Run swamp extension install once to populate the catalog.
- Run swamp extension install a second time from a different host/process while the first one's datastore push is in flight — or run several back-to-back CI jobs that each call swamp extension install at the start.
- Observe: one invocation wins; subsequent invocations hang on datastore·sync Pushing changes for ~8.5 minutes and then panic in tls_wrap.rs:1918:31 with exit 134.
This hits 3 of the 3 most recent GitHub Actions matrix deploys even with max-parallel: 1 on the job matrix (i.e. strictly sequential runs, not parallel) — suggesting the S3 datastore retains state across invocations (an orphan lock, or a stuck conditional-write race) that produces the long retry loop, which in turn trips the TLS panic.
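If that hypothesis is right, the failure mode can be modeled with a small in-memory sketch of the lock protocol the logs imply: a create-if-absent write plus a TTL-based steal. Everything here (function names, the unconditional delete, the lock key) is an assumption inferred from this issue, not swamp's actual implementation.

```python
def try_acquire(store, key, holder, ttl_ms, now_ms):
    """Create-if-absent write, standing in for an S3 conditional PUT
    (If-None-Match: *). Succeeds only when no lock object exists."""
    if key not in store:
        store[key] = {"holder": holder, "acquiredAt": now_ms, "ttlMs": ttl_ms}
        return True
    return False

def steal_if_stale(store, key, now_ms):
    """TTL-based auto-steal. In this single-threaded model the steal always
    works; against real S3, the gap between the staleness check and the
    delete lets another writer recreate the lock first, so a loop of
    'detected stale -> steal -> lost the re-acquire' can spin for minutes,
    consistent with the ~8.5-minute silent hang seen in CI."""
    lock = store.get(key)
    if lock is not None and now_ms - lock["acquiredAt"] >= lock["ttlMs"]:
        del store[key]  # unconditional delete: the racy step
        return True
    return False

# An orphaned lock from a dead CI runner: past its TTL, holder never returns.
store = {"extensions.lock": {"holder": "ci-1", "acquiredAt": 0, "ttlMs": 60_000}}
assert not try_acquire(store, "extensions.lock", "ci-2", 60_000, now_ms=61_000)
assert steal_if_stale(store, "extensions.lock", now_ms=61_000)
assert try_acquire(store, "extensions.lock", "ci-2", 60_000, now_ms=61_000)
```

In the single-process model the steal succeeds; the CI symptom only appears when a second writer races the check-then-delete window, which is exactly what max-parallel: 1 should have prevented — hence the suspicion of an orphan lock rather than live contention.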
What I've ruled out
- AWS / DO Spaces credentials: GET on the lock object succeeds and returns valid holder metadata (hostname, pid, nonce, acquiredAt, ttlMs). Bad creds produce InvalidAccessKeyId/AccessDenied, not a lock-retry loop.
- GitHub Actions concurrency config: serialized jobs with a single shared concurrency.group, cancel-in-progress: false, and max-parallel: 1 on the matrix. Only one swamp CLI invocation runs at a time. Bug still reproduces.
- User-authored workflow or method shape: this is in the Populate extension catalog step, which is swamp extension install && swamp model type describe …. It never reaches the user workflow. See "Factory pattern context" below.
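For completeness, the staleness test implied by the holder metadata in the first bullet (acquiredAt plus ttlMs against the current clock) is easy to state. The field names and millisecond units are taken from this issue's observations, not from any documented @swamp/s3-datastore schema.

```python
import time

def lock_is_stale(holder_meta, now_ms=None):
    """Return True when a lock's TTL has elapsed, given the holder metadata
    a GET on the lock object returns. Assumes acquiredAt and ttlMs are
    both milliseconds, per the fields observed in this issue."""
    if now_ms is None:
        now_ms = time.time() * 1000
    return now_ms - holder_meta["acquiredAt"] >= holder_meta["ttlMs"]
```

The lock observed here passed this check (the CLI itself logged it as stale), which is why the failure looks like a broken steal path rather than a broken staleness check.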
Factory pattern context (this bug is not about it — included so the unrelatedness is clear)
This repo's deploy flow follows rule 6 of the swamp-extension-model skill — a single factory method on @swamp/digitalocean/app-platform called deploy_components does docker login + build + push + spec PUT + deployment poll + per-component log tail in one invocation, so the app-platform model lock is acquired exactly once per deploy. The CI workflow is swamp workflow run deploy-<service> --input tag=<sha>, which dispatches a one-step workflow whose single model_method step calls deploy_components(deployments: [{component, tag, buildContext, ...}]).
Confirmed working: a local swamp workflow run deploy-swamp-serve --input tag=<...> against the same datastore went fully green end-to-end — docker login → build → push → spec PUT → deployment ACTIVE in ~45 seconds → log tail captured the new container's startup lines. No issues reaching or executing the factory method.
The panic in CI happens strictly before the factory method is reached, inside swamp extension install's datastore·sync Pushing changes phase. So the factory refactor did the right thing at the user layer; it's the CLI's catalog-install bootstrap that's fragile under even mild datastore contention.
What I'd like
- Bounded retry / timeout on the datastore·sync Pushing changes path — it currently appears to retry silently for 8+ minutes before tripping the TLS panic. A clean exit with LockTimeoutError (or similar) after, say, 60-120s would turn this from a Deno core dump into a normal error we can react to and retry idempotently.
- Root-cause the tls_wrap.rs:1918:31 Option::unwrap on None — long-lived TLS connections shouldn't be reaching an Option::None unconditionally. Likely some state-machine transition invariant is being broken by connection reuse under long retry loops.
- If the real culprit is an S3 conditional-write retry loop against an orphaned or stale lock, document the swamp datastore lock release --force escape hatch more prominently in operational docs — it's the only way out when this state is hit, and finding it from the error message alone required grepping swamp help.
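A minimal sketch of the bounded-retry behavior the first bullet asks for, assuming a hypothetical LockTimeoutError and an injectable push callback — none of this is swamp's real API:

```python
import time

class LockTimeoutError(RuntimeError):
    """Hypothetical error type: a clean, catchable failure instead of a
    silent 8-minute retry loop ending in a TLS panic and exit 134."""

def push_with_deadline(try_push, deadline_s=90.0, base_delay_s=0.5,
                       sleep=time.sleep, clock=time.monotonic):
    """Retry the datastore push with capped exponential backoff, giving up
    with LockTimeoutError once the deadline passes. `try_push` returns
    True on success, False while the lock is still held elsewhere."""
    start = clock()
    delay = base_delay_s
    attempt = 0
    while True:
        attempt += 1
        if try_push():
            return attempt
        if clock() - start >= deadline_s:
            raise LockTimeoutError(
                f"datastore push still lock-blocked after {deadline_s:.0f}s "
                f"({attempt} attempts)")
        sleep(delay)
        delay = min(delay * 2, 10.0)
```

A CI step could then catch LockTimeoutError (or match its exit code) and re-run idempotently, instead of decoding a core dump.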
Related runs (public — systeminit/swamp-club)
- Deploy telemetry in run 24856445353: hangs at datastore push → TLS panic (~8.5 min)
- Deploy discord-bot in run 24858167352 (just now): same pattern, ~8.5 min hang → panic
- Earlier in the day a GitHub-Actions-held orphan lock needed swamp datastore lock release --force to clear — the CLI's own "stale TTL" auto-steal logic repeatedly detected the lock as stale but could not actually steal it, livelocking on Global lock acquired ... — releasing and retrying.
Happy to provide raw full job logs if useful.