Bug · Shipped · Swamp CLI
Assignee: stack72

swamp extension install: datastore push hangs ~8.5m then crashes with Deno TLS panic (tls_wrap.rs:1918 unwrap on None)

Opened by swamp_lord · 4/23/2026 · Shipped 4/23/2026

Summary

swamp extension install hangs for ~8.5 minutes during its datastore·sync Pushing changes to "@swamp/s3-datastore" phase, then panics inside Deno's TLS layer and aborts with exit code 134. Reproduces on every CI run (GitHub Actions ubuntu-latest) once the datastore has any contention history. The panic is not related to how workflows or model methods are structured — it happens in the extension-catalog bootstrap step, before any user-authored workflow or method executes.

Environment

  • swamp CLI: 20260417.220420.0-sha.7181443f
  • Datastore: @swamp/s3-datastore@2026.04.23.1, backed by DigitalOcean Spaces (S3-compatible)
  • Runner: GitHub Actions ubuntu-latest, running swamp installed from https://swamp.club/install.sh
  • Repo: systeminit/giga-swamp (swamp-managed repo with 4034 files in the datastore), cloned fresh into each runner
  • Installed extensions at time of crash: @swamp/s3-datastore + @swamp/digitalocean + @swamp/issue-lifecycle

Exact panic signature

thread 'main' (pid) panicked at ext/node/ops/tls_wrap.rs:1918:31:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread 'main' (pid) panicked at library/core/src/panicking.rs:230:5:
panic in a function that cannot unwind
stack backtrace:
   0:     0x5653702ea182 - <unknown>
   1:     0x5653702e8bef - <unknown>
   ... (frames all <unknown> — stripped release build) ...
thread caused non-unwinding panic. aborting.
Aborted (core dumped) swamp extension install
exit code 134

Reproduction (observed in CI 4+ times today)

From the GitHub Actions log for a Deploy discord-bot matrix child (run 24858167352, job id 72776548723):

20:50:34.336 INF datastore·sync Syncing from "@swamp/s3-datastore"...
20:50:59.857 INF datastore·sync Synced 4034 file(s) from "@swamp/s3-datastore"
20:50:59.876 WRN extension·install 1 extension(s) pending migration. Run 'swamp repo upgrade' to complete.
20:50:59.877 INF extension·install Reading lockfile...
20:50:59.878 INF extension·install Installing "@swamp/digitalocean"@"2026.04.08.1"...
20:51:02.663 INF extension·install Installing "@swamp/issue-lifecycle"@"2026.04.20.1"...
20:51:04.540 INF extension·install Installed 2 extension(s).
20:51:04.541 INF extension·install 1 extension(s) already up to date.
20:51:04.541 INF datastore·sync Pushing changes to "@swamp/s3-datastore"...
                                                              ← 8m 25s of silence ←
20:59:29.335 thread 'main' (2225) panicked at ext/node/ops/tls_wrap.rs:1918:31:
20:59:29.335 called `Option::unwrap()` on a `None` value
20:59:29.336 panic in a function that cannot unwind
20:59:32.462 ##[error]Process completed with exit code 134.

Steps to reproduce

  1. Set up a swamp repo against DO Spaces with the S3 datastore (SWAMP_DATASTORE=s3:<bucket>/<prefix>).
  2. Run swamp extension install once to populate the catalog.
  3. Run swamp extension install a second time from a different host/process while the first one's datastore push is in flight — or run several back-to-back CI jobs that each call swamp extension install at the start.
  4. Observe: one invocation wins; subsequent invocations hang on datastore·sync Pushing changes for ~8.5 minutes and then panic in tls_wrap.rs:1918:31 with exit 134.

This has hit 3 of the 3 most recent GitHub Actions matrix deploys, even with max-parallel: 1 on the job matrix (i.e. strictly sequential runs, not parallel). That suggests the S3 datastore retains some state across invocations (an orphan lock, or a stuck conditional-write race) that produces the long retry loop, which in turn trips the TLS panic.
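
A minimal shell sketch of that contention scenario, for reference. The SWAMP_DATASTORE value and the 2-second stagger are placeholders of ours, not values from the failing CI config; in our runs the second invocation is the one that hangs and then panics.

#!/usr/bin/env bash
# Hypothetical repro harness. The datastore value and the stagger below are
# illustrative only.
export SWAMP_DATASTORE="s3:<bucket>/<prefix>"

# First invocation populates the catalog and starts its datastore push.
swamp extension install &
first_pid=$!

sleep 2   # give the first push time to be in flight

# Second invocation against the same datastore: this is the one that hangs
# on 'Pushing changes ...' for ~8.5 minutes and then aborts with the
# tls_wrap.rs panic (exit code 134) in our environment.
swamp extension install
echo "second install exited with $?"

wait "$first_pid"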

What I've ruled out

  • AWS / DO Spaces credentials: GET on the lock object succeeds and returns valid holder metadata (hostname, pid, nonce, acquiredAt, ttlMs); see the inspection sketch after this list. Bad credentials produce InvalidAccessKeyId / AccessDenied, not a lock-retry loop.
  • GitHub Actions concurrency config: serialized jobs with a single shared concurrency.group, cancel-in-progress: false, max-parallel: 1 on the matrix. Only one swamp CLI invocation runs at a time. Bug still reproduces.
  • User-authored workflow or method shape: this is in the Populate extension catalog step, which is swamp extension install && swamp model type describe …. It never reaches the user workflow. See "Factory pattern context" below.
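
For reference, this is roughly how we pull the lock object to check that metadata. The object key, the region endpoint, and the assumption that the holder metadata is JSON are guesses on our part; only the field names come from what the GET actually returns.

# Hypothetical lock inspection against DO Spaces (key and region are guesses).
aws s3 cp "s3://<bucket>/<prefix>/.swamp/lock.json" - \
  --endpoint-url "https://nyc3.digitaloceanspaces.com" \
  | jq '{hostname, pid, nonce, acquiredAt, ttlMs}'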

Factory pattern context (this bug is not about it — included so the unrelatedness is clear)

This repo's deploy flow follows rule 6 of the swamp-extension-model skill — a single factory method on @swamp/digitalocean/app-platform called deploy_components does docker login + build + push + spec PUT + deployment poll + per-component log tail in one invocation, so the app-platform model lock is acquired exactly once per deploy. The CI workflow is swamp workflow run deploy-<service> --input tag=<sha>, which dispatches a one-step workflow whose single model_method step calls deploy_components(deployments: [{component, tag, buildContext, ...}]).
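
Concretely, the CI step for one matrix child boils down to a single CLI call. discord-bot is the service from the failing run; wiring the tag from GITHUB_SHA is shown only as an illustration of the <sha> input.

# Per-matrix-child deploy step (tag wiring illustrative).
swamp workflow run deploy-discord-bot --input tag="${GITHUB_SHA}"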

Confirmed working: a local swamp workflow run deploy-swamp-serve --input tag=<...> against the same datastore went fully green end-to-end — docker login → build → push → spec PUT → deployment ACTIVE in ~45 seconds → log tail captured the new container's startup lines. No issues reaching or executing the factory method.

The panic in CI happens strictly before the factory method is reached, inside swamp extension install's datastore·sync Pushing changes phase. So the factory refactor did the right thing at the user layer; it's the CLI's catalog-install bootstrap that's fragile under even mild datastore contention.

What I'd like

  1. Bounded retry / timeout on the datastore·sync Pushing changes path — it currently appears to retry silently for 8+ minutes before tripping the TLS panic. A clean exit with LockTimeoutError (or similar) after, say, 60-120s would turn this from a Deno core dump into a normal error we can react to and retry idempotently.
  2. Root-cause the tls_wrap.rs:1918:31 Option::unwrap on None. A long-lived TLS connection shouldn't be able to reach an unguarded None there; most likely some state-machine invariant is being violated by connection reuse under long retry loops.
  3. If the real culprit is an S3 conditional-write retry loop against an orphaned or stale lock, document the swamp datastore lock release --force escape hatch more prominently in the operational docs. It's the only way out when this state is hit, and finding it from the error message alone required grepping swamp help. (A sketch of the stopgap we're considering follows this list.)
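
In the spirit of items 1 and 3, the CI stopgap we're considering looks roughly like this. The 150-second bound, the single retry, and the assumption that a force-release is safe when no other job holds the lock are all ours, not documented behavior.

# Hypothetical CI stopgap: bound the install, clear a stuck lock, retry once.
if ! timeout 150 swamp extension install; then
  echo "extension install hung or crashed; force-releasing the datastore lock"
  swamp datastore lock release --force
  swamp extension install
fi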

Related occurrences

  • Deploy telemetry in run 24856445353: hangs at datastore push → TLS panic (~8.5 min).
  • Deploy discord-bot in run 24858167352 (just now): same pattern, ~8.5 min hang → panic.
  • Earlier in the day a GitHub-Actions-held orphan lock needed swamp datastore lock release --force to clear; the CLI's own "stale TTL" auto-steal logic repeatedly detected the lock as stale but could not actually steal it, livelocking on Global lock acquired ... and then releasing and retrying.

Happy to provide raw full job logs if useful.


Sign in to post a ripple.