Issue
Bug · Shipped · Extensions
Assignees: None

datastore sync --push fails on _catalog.db-wal: catalog SQLite DB lives inside the S3 sync cache

Opened by stack72 · 4/8/2026 · Shipped 4/8/2026

Symptom

Running swamp datastore sync --push against an @swamp/s3-datastore datastore fails fatally with:

Error: Failed to push 1 file(s) to S3: data/_catalog.db-wal
    at S3CacheSyncService.pushChanged (.../s3.js:55427:13)

The [pushChanged] log shows the walker queueing all three SQLite files for upload: data/_catalog.db, data/_catalog.db-shm, data/_catalog.db-wal. The trace also shows pushChanged being called twice on a single --push invocation (see secondary bug below).

Root cause

The catalog SQLite store is documented as local-only. src/infrastructure/persistence/catalog_store.ts states:

The catalog is local-only and excluded from datastore sync. It self-heals by triggering a backfill when missing or not yet populated.

But it isn't actually excluded. Two things break that promise:

  1. The catalog DB is created inside the sync cache. src/infrastructure/persistence/repository_factory.ts builds catalogDbPath as join(dsPath(SWAMP_SUBDIRS.data), "_catalog.db"). For an S3 datastore, that resolves to <cachePath>/data/_catalog.db, squarely inside the tree the S3 sync walker enumerates. Compare _extension_catalog.db, which correctly lives at swampPath(repoDir, "_extension_catalog.db"), outside the sync cache. The two catalogs should follow the same pattern.

  2. The S3 sync skip list is incomplete. S3CacheSyncService.pushChanged() in the @swamp/s3-datastore extension only filters .datastore-index.json, .push-queue.json, and .datastore.lock. Anything else under the cache directory (including _catalog.db*) is walked, queued, and uploaded.
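To make the two gaps concrete, here is a minimal TypeScript model of what the walker sees today. It assumes the skip check compares each file's basename against the three exact names listed in item 2; CACHE_SKIP_EXACT, isSkippedByCurrentFilter, catalogPathToday and the cache path are hypothetical stand-ins, not the real swamp or @swamp/s3-datastore symbols.

```typescript
// Hypothetical model of the two gaps above; these names are illustrative,
// not the real swamp or @swamp/s3-datastore symbols.

// Exact-match skip list: the three filenames the current check covers.
const CACHE_SKIP_EXACT = new Set<string>([
  ".datastore-index.json",
  ".push-queue.json",
  ".datastore.lock",
]);

// Assumption: the check is applied to each file's basename.
function isSkippedByCurrentFilter(relPath: string): boolean {
  const name = relPath.split("/").pop() ?? relPath;
  return CACHE_SKIP_EXACT.has(name);
}

// Gap 1: the catalog path is derived from the datastore path, so for an S3
// datastore it lands at <cachePath>/data/_catalog.db.
function catalogPathToday(cachePath: string): string {
  return `${cachePath}/data/_catalog.db`;
}

const cachePath = "/tmp/swamp-s3-cache"; // hypothetical cache root
const catalog = catalogPathToday(cachePath);

// Files the walker would find, expressed relative to the cache root.
const walked = [
  catalog,
  `${catalog}-shm`,
  `${catalog}-wal`,
  `${cachePath}/.push-queue.json`,
].map((p) => p.slice(cachePath.length + 1));

// Gap 2: only the exact names are filtered, so all three SQLite files are queued.
const queued = walked.filter((p) => !isSkippedByCurrentFilter(p));
console.log(queued); // the three data/_catalog.db* paths; .push-queue.json is filtered out
```

Under these assumptions the queued list matches the three files in the [pushChanged] log.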

Why the failure surfaces specifically on _catalog.db-wal

catalog_store.ts opens the DB with PRAGMA journal_mode=WAL. The -wal and -shm sidecars are mutable, transient SQLite files that the open DB connection can checkpoint or rewrite at any moment. Between walk() enumerating them and pushFile() reading/uploading them, SQLite can change the file out from under the uploader, and the S3 PUT fails. These files were never meant to leave the local machine — that's the whole point of the "local-only" comment.
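The race can be sketched with an in-memory stand-in for the cache directory. This is a hypothetical model: walk, pushFile and rewriteWal are illustrative names, and treating a size change or a vanished file as the upload failure is an assumption about one plausible mechanism, not a confirmed account of what the S3 PUT rejects.

```typescript
// Hypothetical in-memory model of the enumerate-then-upload race. walk,
// pushFile and rewriteWal are illustrative names, not the extension's code.
type FakeFs = Map<string, Uint8Array>;

interface WalkEntry {
  path: string;
  size: number; // size observed when the file was enumerated
}

// Enumeration snapshots each file's size, as a walker building an upload queue might.
function walk(files: FakeFs): WalkEntry[] {
  return Array.from(files, ([path, data]) => ({ path, size: data.length }));
}

// The uploader reads the file later; if it vanished or changed size in the
// meantime, the upload is rejected (standing in for the failed S3 PUT).
function pushFile(files: FakeFs, entry: WalkEntry): void {
  const data = files.get(entry.path);
  if (data === undefined) throw new Error(`${entry.path}: removed before upload`);
  if (data.length !== entry.size) throw new Error(`${entry.path}: changed during upload`);
}

// Stand-in for the open SQLite connection checkpointing or rewriting the WAL.
function rewriteWal(files: FakeFs): void {
  files.set("data/_catalog.db-wal", new Uint8Array(0));
}

const files: FakeFs = new Map([
  ["data/_catalog.db", new Uint8Array(4096)],
  ["data/_catalog.db-wal", new Uint8Array(8192)],
]);

const queue = walk(files);
rewriteWal(files); // happens between walk() and pushFile()

const failed: string[] = [];
for (const entry of queue) {
  try {
    pushFile(files, entry);
  } catch {
    failed.push(entry.path);
  }
}
console.log(failed); // only the WAL fails; the main .db was not touched
```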

Secondary bug: --push runs pushChanged() twice

This explains the two [pushChanged] blocks in the failure trace. They are not retries — they are independent calls.

src/cli/commands/datastore_sync.ts uses requireInitializedRepo (not the read-only variant) for --push, which registers the global datastore sync coordinator. The command then runs datastoreSync(...) with mode: "push", which calls deps.pushSync() → syncService.pushChanged() (first call). After cli.parse(args) returns, runCli in src/cli/mod.ts calls flushDatastoreSync(), which calls pushChanged() again via the coordinator (second call).

On the failure path, the first push throws on _catalog.db-wal. Control reaches runCli's catch block, which calls flushDatastoreSync() a second time. Flush calls pushChanged() a third time — the WAL has since been checkpointed, so only the main .db is left. That push succeeds, "Pushed 1 file(s)" prints, and only then does the original error finally surface as the fatal. This is what makes the log ordering so confusing.

Even on the success path, swamp datastore sync --push is doing two full passes over the cache and two S3 push rounds, which is wasted work.
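A stripped-down model of the success path makes the double push easy to count. The function names echo the ones in this section, but their bodies are simplified, hypothetical stand-ins; in particular, the coordinator registration is reduced to a boolean flag.

```typescript
// Hypothetical model of the success path for `swamp datastore sync --push`.
// Names echo the issue; bodies are simplified stand-ins.
let pushChangedCalls = 0;

const syncService = {
  pushChanged(): void {
    pushChangedCalls += 1; // in reality: walk the cache and push changes to S3
  },
};

// requireInitializedRepo registers the global sync coordinator; modelled as a flag.
let coordinatorRegistered = false;

function datastoreSync(mode: "push" | "pull"): void {
  // deps.pushSync() -> syncService.pushChanged(): first call
  if (mode === "push") syncService.pushChanged();
}

function flushDatastoreSync(): void {
  // The coordinator pushes again at teardown: second call
  if (coordinatorRegistered) syncService.pushChanged();
}

function runCli(): void {
  coordinatorRegistered = true; // requireInitializedRepo (not the read-only variant)
  datastoreSync("push"); // runs inside cli.parse(args)
  flushDatastoreSync(); // runs after cli.parse(args) returns
}

runCli();
console.log(pushChangedCalls); // 2
```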

Suggested fix approach

Primary (architectural): Move _catalog.db out of the sync cache. Build its path with swampPath(repoDir, "_catalog.db") in the repository factory, mirroring how _extension_catalog.db is already handled. After this change the catalog cannot enter any sync provider's walk regardless of the datastore type, and the "local-only" guarantee becomes structural rather than dependent on every datastore extension remembering to filter it.
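Assuming, for illustration only, that swampPath(repoDir, name) resolves to <repoDir>/.swamp/<name> and that the S3 cache lives elsewhere, the effect of the move can be checked with a simple containment test. The directory layout, the cache path, and the isInside and dirOf helpers are hypothetical.

```typescript
// Hypothetical sketch: the ".swamp" directory and the cache root are assumed
// layouts for illustration, not swamp's confirmed paths.
function swampPath(repoDir: string, name: string): string {
  return `${repoDir}/.swamp/${name}`; // assumed implementation
}

// True when `child` is `parent` itself or sits somewhere beneath it.
function isInside(parent: string, child: string): boolean {
  return child === parent || child.startsWith(`${parent}/`);
}

// Directory portion of a slash-separated path.
function dirOf(p: string): string {
  return p.slice(0, p.lastIndexOf("/"));
}

const repoDir = "/work/repo";
const cachePath = "/home/user/.cache/swamp-s3"; // hypothetical S3 sync cache root

// Before: derived from the datastore path, so it sits in the walked tree.
const catalogBefore = `${cachePath}/data/_catalog.db`;
// After: derived from the repo, the same way _extension_catalog.db is.
const catalogAfter = swampPath(repoDir, "_catalog.db");
const extensionCatalog = swampPath(repoDir, "_extension_catalog.db");

console.log(isInside(cachePath, catalogBefore)); // true
console.log(isInside(cachePath, catalogAfter)); // false
console.log(dirOf(catalogAfter) === dirOf(extensionCatalog)); // true
```

The point of the design is that the walker only enumerates paths under the cache root, so a path built from the repo directory can never be queued, whatever filter a given datastore extension applies.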

Defense-in-depth (S3 extension): Broaden the skip list in the @swamp/s3-datastore extension's cache sync from a hardcoded equality check on three filenames to a prefix/glob match that also excludes _catalog.db* (and any future sidecar pattern). Even with the architectural fix, this protects third-party datastores that copy this template.
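One way the broadened filter could look, sketched in TypeScript. It keeps the three exact names from the current check, adds a _catalog.db prefix rule, and adds a regular expression for SQLite sidecar suffixes (-wal, -shm, and the rollback -journal) as a catch-all for future sidecars. shouldSkip and the constant names are hypothetical, and the real extension may match on relative paths rather than basenames.

```typescript
// Hypothetical broadened skip filter for the S3 cache walk. The constant and
// function names are illustrative, not the extension's real API.
const SKIP_EXACT = new Set<string>([
  ".datastore-index.json",
  ".push-queue.json",
  ".datastore.lock",
]);
// Prefix rule: _catalog.db plus every sidecar that starts with the same name.
const SKIP_PREFIXES: readonly string[] = ["_catalog.db"];
// Suffix rule for any SQLite sidecar, whatever the database is called.
const SQLITE_SIDECAR = /\.db-(wal|shm|journal)$/;

function shouldSkip(relPath: string): boolean {
  const name = relPath.split("/").pop() ?? relPath; // assumption: match on basename
  return (
    SKIP_EXACT.has(name) ||
    SKIP_PREFIXES.some((prefix) => name.startsWith(prefix)) ||
    SQLITE_SIDECAR.test(name)
  );
}

const candidates = [
  "data/_catalog.db",
  "data/_catalog.db-wal",
  "data/_catalog.db-shm",
  ".push-queue.json",
  "data/other.db-wal", // hypothetical sidecar of some future database
  "data/models/example.yaml", // hypothetical ordinary data file
];
const kept = candidates.filter((p) => !shouldSkip(p));
console.log(kept); // only the .yaml file is kept
```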

Coordinator dedup: Reconcile the explicit deps.pushSync() inside datastoreSync with flushDatastoreSync()'s push at runCli teardown so a single --push invocation only pushes once. Either the command-level path skips the explicit push and lets the coordinator's flush handle it, or the coordinator skips its flush-time push when an explicit push has already happened in the same invocation.
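A minimal sketch of the second option, where the coordinator remembers an explicit push and skips its flush-time push. SyncCoordinator, pushNow and flush are hypothetical names standing in for deps.pushSync() and flushDatastoreSync().

```typescript
// Hypothetical sketch of coordinator-side dedup. SyncCoordinator, pushNow and
// flush are illustrative stand-ins for deps.pushSync() and flushDatastoreSync().
class SyncCoordinator {
  private explicitPushDone = false;
  pushChangedCalls = 0;

  private pushChanged(): void {
    this.pushChangedCalls += 1; // stands in for walking the cache and pushing to S3
  }

  // Command path: the explicit push requested by `--push`.
  pushNow(): void {
    this.pushChanged();
    this.explicitPushDone = true; // only set if the push did not throw
  }

  // Teardown path: skip the push if one already ran in this invocation.
  // Assumes nothing writes to the cache between pushNow() and flush().
  flush(): void {
    if (this.explicitPushDone) return;
    this.pushChanged();
  }
}

const coordinator = new SyncCoordinator();
coordinator.pushNow();
coordinator.flush();
console.log(coordinator.pushChangedCalls); // 1
```

If writes can land after the explicit push, a dirty flag that every push clears would be safer than this one-shot boolean. Also note that a failed explicit push leaves the flag unset, so teardown would still retry; whether that retry is desirable is a separate decision.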

Environment

  • swamp CLI, installed as a compiled binary
  • Datastore: @swamp/s3-datastore
  • Triggered first time setting up an S3 datastore against an existing repo with a populated _catalog.db
Bog Flow

Shipped

4/8/2026, 8:08:09 PM
