swamp datastore sync is minutes-slow at 4k-file scale even with zero-diff; outer 300s timeout fires
Opened by swamp_lord · 4/24/2026 · Shipped 4/24/2026
Summary
After today's release (CLI `20260424.000956.0-sha.25668e08`, `@swamp/s3-datastore` `2026.04.24.1`), `swamp datastore sync` is unusably slow on a real-size repo (~4k files tracked) even when local cache and remote index are identical (zero-diff). The initial pull/walk phase runs for ~5 minutes before push has a chance to do anything, and the outer 300s timeout then fires. An isolation-test repo with 14 files syncs in ~1 second under the same CLI + datastore version.
The Deno TLS panic from lab/157 is fixed — thank you. The failure mode is now a clean `Datastore push timed out after 300000ms` message instead of `tls_wrap.rs:1918 Option::unwrap on None` + exit 134. But the sync itself can't complete at our production scale, so we can't actually use the repo.
Environment
- swamp CLI: `20260424.000956.0-sha.25668e08`
- `@swamp/s3-datastore`: `2026.04.24.1`
- `@swamp/digitalocean`: `2026.04.08.1`
- Datastore: DO Spaces, bucket `giga-swamp`, prefix `swamp-club`, region `sfo3`, endpoint `https://sfo3.digitaloceanspaces.com`
- Local: macOS, repoId `b1b78ea0-7a12-47a4-888e-e99db597d49c`
- Scale: 4034 entries in `.datastore-index.json` (~1.37 MB index), 4034 files in local cache (post-cleanup, identical)
Reproduction
- Any swamp repo with a large-ish S3-backed datastore (tested at 4034 entries).
- Ensure the local cache is byte-identical to the remote index (I did this by deleting 192 local-only "cruft" files so the diff went to zero).
- Run `swamp datastore sync`.
- Expected: fast no-op, since there is nothing to push or pull.
- Actual: sync sits in pull/walk for ~5 min, finally logs `Pushing changes to "@swamp/s3-datastore"...` at ~t=295s, then the outer 300s timeout fires and errors out:

```
Still syncing... (295s)
Still syncing... (300s)
16:59:51.734 INF datastore·sync Pushing changes to "@swamp/s3-datastore"...
Still syncing... (305s)
Still syncing... (310s)
16:54:03.673 FTL error Error: Datastore push to "@swamp/s3-datastore" timed out after 300000ms...
```

Also repros with `swamp extension install` (which implicitly syncs) and `swamp workflow run <anything>` (which pushes the workflow's output at the end). Also repros against a freshly-migrated `swamp-club-v2` prefix (server-side copied from the original, with an identical index and identical local state) — so it is not tied to the prefix or to any corrupt state specific to `swamp-club`.
Isolation control (same CLI + datastore versions) on a fresh repo with 14 files: `Pushed 14 file(s) to datastore` in ~1 second. So the perf cliff is scale-related, not a client misconfiguration.
What I've already ruled out
- Corrupt or format-mismatched index: the `.datastore-index.json` is `{ version: 1, lastPulled, entries: {…} }` — same shape as the fresh isolation repo's. No entries reference anything weird.
- Stuck lock: `swamp datastore lock status` returns `null` between attempts; releasing it does not change the behavior.
- Local cache vs index drift: confirmed via a Python script that `local_files - index_keys == set()` and `index_keys - local_files == set()` before the run.
- DO Spaces endpoint health: parallel `aws s3 ls` / `head-object` calls against the same bucket/prefix complete in <1s.
- Prefix-specific state: migrated to a brand-new `swamp-club-v2` prefix via server-side copy; same symptom.
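For reference, the drift check above can be sketched roughly like this. The `cache_index_diff` helper and the cache/index layout are hypothetical (the real script just compared the two key sets); the demo builds a throwaway repo so the sketch runs end-to-end:

```python
import json
import tempfile
from pathlib import Path

def cache_index_diff(index_path: Path, cache_dir: Path):
    """Compare local cache files against the index's entry keys.

    Returns (local_only, index_only); zero-diff means both are empty.
    """
    index = json.loads(index_path.read_text())
    index_keys = set(index["entries"])  # entries is a {relpath: meta} map
    local_files = {
        str(p.relative_to(cache_dir))
        for p in cache_dir.rglob("*")
        if p.is_file()
    }
    return local_files - index_keys, index_keys - local_files

# Throwaway demo repo (hypothetical layout, not swamp's actual on-disk paths).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cache = root / "cache"
    cache.mkdir()
    for name in ("a.txt", "b.txt"):
        (cache / name).write_text("x")
    index_file = root / ".datastore-index.json"
    index_file.write_text(json.dumps(
        {"version": 1, "lastPulled": None,
         "entries": {"a.txt": {}, "b.txt": {}}}))
    result = cache_index_diff(index_file, cache)

print(result)  # zero-diff: (set(), set())
```

On the real repo, both sets came back empty before every run.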
Hypothesis
The slow phase appears to be a serial per-entry HEAD (or some equivalent) against the 4034-entry index during the pull/walk step. That would explain the ~5-min wall clock (≈75 ms/entry × 4000 entries ≈ 300 s). A batched list, or a short-circuit "local matches index, nothing to do" path, would fix it.
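For what it's worth, the arithmetic lands almost exactly on the observed wall clock. A quick sanity check — the 75 ms round trip is the issue's rough estimate, not a measurement, and the 1000-key page size is the standard S3 ListObjectsV2 limit, assumed to apply to DO Spaces:

```python
# Back-of-envelope check of the serial-HEAD hypothesis.
ENTRIES = 4034    # from .datastore-index.json
HEAD_MS = 75      # assumed per-request round trip to DO Spaces
PAGE_SIZE = 1000  # standard S3 ListObjectsV2 max keys per page (assumed)

serial_s = ENTRIES * HEAD_MS // 1000   # one HEAD per entry
pages = -(-ENTRIES // PAGE_SIZE)       # ceil(4034 / 1000) = 5 list pages
batched_ms = pages * HEAD_MS           # one list call per page

print(f"serial HEADs: ~{serial_s}s")   # ~302s -- right at the 300s timeout
print(f"batched list: ~{batched_ms}ms")
```

Five paginated list calls instead of 4034 HEADs would turn the ~5-minute walk into well under a second.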
Related
- lab/157 — original TLS panic, now fixed; this is the next layer uncovered by that fix.
- Factory pattern landed successfully: systeminit/giga-swamp#9. End-to-end deploys work on DO once a sync completes; they just can't complete locally at production repo size.