01 Issue
Bug · Closed · Swamp CLI
Assignees: None

datastore sync: clean up zombie _catalog.db* entries from remote index and S3 bucket

Opened by stack72 · 4/8/2026

Summary

After swamp-club #29 is fixed (exclude _catalog.db* from the sync walker), users who hit the bug before the fix will have two kinds of leftover pollution in their remote S3/GCS state:

  1. Orphan objects in the bucket — physical S3/GCS objects at keys like data/_catalog.db, data/_catalog.db-shm, data/_catalog.db-wal from the broken push attempts.
  2. Zombie entries in the remote .datastore-index.json — index records pointing at those orphan objects.

The #29 fix handles these passively — the pull-side filter skips them on read so they cause no new harm — but it does not clean them up. That's intentional to keep the #29 blast radius small. This issue tracks the cleanup as a separable follow-up.

Why the passive skip is not enough long-term

pushChanged() in s3_cache_sync.ts (and the identical logic in gcs_cache_sync.ts) rewrites .datastore-index.json on every successful push that uploads at least one file. Because the #29 fix only skips the zombie entries at iteration time — it doesn't remove them from this.index.entries — the zombies get re-serialized into the uploaded index indefinitely. They will outlive the bug forever unless explicitly cleaned.
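The mechanics can be seen in a minimal sketch (the entry shape and helper below are hypothetical stand-ins; the real structures in s3_cache_sync.ts may differ):

```typescript
// Hypothetical entry shape; the real index record has more fields.
type IndexEntry = { path: string; hash: string };

// Mirrors the _catalog.db* pattern the #29 helper is described as matching.
const isInternalCacheFile = (p: string) =>
  /(^|\/)_catalog\.db(-shm|-wal)?$/.test(p);

const entries: IndexEntry[] = [
  { path: "data/records.bin", hash: "aaa" },
  { path: "data/_catalog.db-wal", hash: "bbb" }, // zombie from a pre-fix push
];

// The #29 fix filters at iteration time, so the zombie is never pulled...
const pulled = entries.filter((e) => !isInternalCacheFile(e.path));
console.log(pulled.map((e) => e.path)); // only data/records.bin

// ...but the write-back serializes the unmodified entries array, so the
// zombie survives in the uploaded .datastore-index.json indefinitely.
const uploadedIndex = JSON.stringify({ entries });
console.log(uploadedIndex.includes("_catalog.db-wal")); // true
```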

The orphan S3/GCS objects themselves are cosmetically unpleasant but functionally harmless once nothing references them.

Suggested approaches

Passive index cleanup (low risk, recommended first step):

  1. In loadIndex(), strip any entry whose relative path matches the _catalog.db* pattern (reuse the isInternalCacheFile helper that #29 introduces).
  2. Add an indexMutated flag on the service instance; set it to true whenever loadIndex() removes zombies.
  3. Change the pushChanged() write-back guard from if (pushed > 0 && this.index) to if ((pushed > 0 || this.indexMutated) && this.index) so the cleanup propagates on a no-op push too.

Effect: on any --push or --sync against a polluted bucket, the index gets rewritten without the zombies. Within one sync cycle, any client pulling from the bucket sees a clean index. Pure bookkeeping — no destructive operations.
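A minimal sketch of this mechanism, under stated assumptions: the IndexEntry/Index shapes and the shouldWriteIndex helper are hypothetical stand-ins, and only isInternalCacheFile mirrors the helper #29 introduces:

```typescript
type IndexEntry = { path: string; hash: string };
type Index = { entries: IndexEntry[] };

const isInternalCacheFile = (p: string) =>
  /(^|\/)_catalog\.db(-shm|-wal)?$/.test(p);

class CacheSync {
  index: Index | null = null;
  indexMutated = false; // true once loadIndex has stripped zombies

  loadIndex(raw: string): void {
    const parsed: Index = JSON.parse(raw);
    const clean = parsed.entries.filter((e) => !isInternalCacheFile(e.path));
    if (clean.length !== parsed.entries.length) this.indexMutated = true;
    this.index = { entries: clean };
  }

  // Stands in for the old `pushed > 0 && this.index` guard; a no-op push
  // still writes the scrubbed index back when zombies were removed.
  shouldWriteIndex(pushed: number): boolean {
    return (pushed > 0 || this.indexMutated) && this.index !== null;
  }
}

const sync = new CacheSync();
sync.loadIndex(
  JSON.stringify({
    entries: [
      { path: "data/records.bin", hash: "aaa" },
      { path: "data/_catalog.db-wal", hash: "bbb" }, // zombie
    ],
  }),
);
console.log(sync.index?.entries.map((e) => e.path), sync.indexMutated);
```

Note that the flag is only ever set when a removal actually happens, so repeated loads of an already-clean index never trigger a spurious write-back.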

Active remote cleanup (separate, higher-risk follow-up):

Add a swamp datastore sync --clean-orphans flag that walks the remote bucket and issues deleteObject for any key matching the internal-file pattern. Destructive, so it should be opt-in and log every deletion. This actually reclaims S3/GCS storage, whereas passive cleanup only fixes the index.
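A sketch of the selection-and-audit step only; the bucket listing and the actual deleteObject call are elided, and selectOrphanKeys plus the sample key list are hypothetical:

```typescript
// Same pattern the passive cleanup would use for index entries.
const isInternalCacheFile = (p: string) =>
  /(^|\/)_catalog\.db(-shm|-wal)?$/.test(p);

// Given keys returned by a remote listing, pick the internal-file orphans.
function selectOrphanKeys(listedKeys: string[]): string[] {
  return listedKeys.filter(isInternalCacheFile);
}

const listedKeys = [
  "data/records.bin",
  "data/_catalog.db",
  "data/_catalog.db-shm",
  "data/_catalog.db-wal",
  ".datastore-index.json",
];

for (const key of selectOrphanKeys(listedKeys)) {
  // Opt-in and audited: log each key before issuing the (elided) delete.
  console.log(`deleting orphan: ${key}`);
}
```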

Alternative: do the delete automatically in loadIndex alongside the passive cleanup. Simpler but removes the user's ability to audit what gets deleted. Not recommended — the swamp convention is to avoid destructive ops without explicit user intent.

Documentation-only:

Add a note to the @swamp/s3-datastore and @swamp/gcs-datastore manifest descriptions and/or release notes telling affected users how to delete the orphan objects manually with aws s3 rm / gsutil rm. Zero code risk, but it leaves previously affected users to do the cleanup themselves.

Scope

Applies identically to both @swamp/s3-datastore and @swamp/gcs-datastore (same code pattern in s3_cache_sync.ts and gcs_cache_sync.ts).

Testing requirements

  1. Unit test: load an index containing both legitimate entries and _catalog.db* zombies; assert this.index.entries no longer contains the zombies after loadIndex().
  2. Unit test: pushChanged() on a cache with no new files but a mutated index writes the cleaned index back to S3/GCS (asserts the indexMutated flag path).
  3. Unit test: multiple loadIndex() calls are idempotent (don't re-set indexMutated if the index is already clean).
  4. If implementing --clean-orphans, add a CLI UAT test in swamp-uat.
Related

  • swamp-club issue #29 (primary bug: `_catalog.db-wal` push failure)
  • swamp-club issue #30 (double `pushChanged` call in --push)
02 Bog Flow

Closed

4/8/2026, 7:43:09 PM

No activity in this phase yet.

03 Sludge Pulse

stack72 commented 4/8/2026, 7:43:07 PM

Partially resolved by #29 (passive index cleanup folded into the scrubIndex + indexMutated mechanism). Physical orphan object cleanup is accepted as a documented trade-off per release notes — not worth a dedicated --clean-orphans command. Closing as superseded.
