datastore sync: clean up zombie _catalog.db* entries from remote index and S3 bucket
Opened by stack72 · 4/8/2026
Summary
After swamp-club #29 is fixed (exclude _catalog.db* from the sync walker), users who hit the bug before the fix will have two kinds of leftover pollution in their remote S3/GCS state:
- Orphan objects in the bucket: physical S3/GCS objects at keys like `data/_catalog.db`, `data/_catalog.db-shm`, and `data/_catalog.db-wal` left behind by the broken push attempts.
- Zombie entries in the remote `.datastore-index.json`: index records pointing at those orphan objects.
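A minimal sketch of a matcher for the affected keys, assuming `_catalog.db*` means "the final path segment starts with `_catalog.db`". The name mirrors the `isInternalCacheFile` helper that #29 introduces, but this body is hypothetical and may not match what #29 actually ships.

```typescript
// Hypothetical stand-in for the isInternalCacheFile helper from #29; the
// real helper's signature and matching rules may differ.
function isInternalCacheFile(relPath: string): boolean {
  // Normalize backslashes so Windows-style relative paths match too.
  const base = relPath.replace(/\\/g, "/").split("/").pop() ?? "";
  // "_catalog.db*": the SQLite file itself plus its -shm and -wal sidecars.
  return base.startsWith("_catalog.db");
}

// The three orphan keys from the broken pushes match; ordinary data does not.
const sampleKeys = [
  "data/_catalog.db",
  "data/_catalog.db-shm",
  "data/_catalog.db-wal",
  "data/models/foo.yaml",
];
const zombieKeys = sampleKeys.filter(isInternalCacheFile);
```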
The #29 fix handles these passively — the pull-side filter skips them on read so they cause no new harm — but it does not clean them up. That's intentional to keep the #29 blast radius small. This issue tracks the cleanup as a separable follow-up.
Why the passive skip is not enough long-term
`pushChanged()` in `s3_cache_sync.ts` (and the identical logic in `gcs_cache_sync.ts`) rewrites `.datastore-index.json` on every successful push that uploads at least one file. Because the #29 fix only skips the zombie entries at iteration time and never removes them from `this.index.entries`, the zombies are re-serialized into every uploaded index. They will outlive the bug indefinitely unless explicitly cleaned up.
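To make the failure mode concrete, here is a hypothetical sketch (the index schema and the inline matcher are assumptions, not the real `s3_cache_sync.ts` code) showing that a skip applied while iterating leaves the in-memory index untouched, so serializing it for upload carries the zombie along.

```typescript
// Hypothetical index schema: entries keyed by relative path.
interface IndexEntry {
  size: number;
}
interface DatastoreIndex {
  entries: Record<string, IndexEntry>;
}

// Hypothetical matcher for "_catalog.db*" (see #29's isInternalCacheFile).
function isInternalCacheFile(relPath: string): boolean {
  return (relPath.split("/").pop() ?? "").startsWith("_catalog.db");
}

// A polluted index as it might look after the pre-#29 push attempts.
const index: DatastoreIndex = {
  entries: {
    "data/models/foo.yaml": { size: 120 },
    "data/_catalog.db-wal": { size: 4096 },
  },
};

// Passive skip: zombies are ignored while iterating...
const pulled: string[] = [];
for (const relPath of Object.keys(index.entries)) {
  if (isInternalCacheFile(relPath)) continue;
  pulled.push(relPath);
}

// ...but the in-memory index is unchanged, so the write-back that
// pushChanged() performs serializes the zombie into the upload again.
const uploadedBody = JSON.stringify(index);
const zombieReuploaded = uploadedBody.includes("_catalog.db-wal");
```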
The orphan S3/GCS objects themselves are cosmetically unpleasant but functionally harmless once nothing references them.
Suggested approaches
Passive index cleanup (low risk, recommended first step):
- In `loadIndex()`, strip any entry whose relative path matches the `_catalog.db*` pattern (reuse the `isInternalCacheFile` helper that #29 introduces).
- Add an `indexMutated` flag on the service instance; set it to `true` whenever `loadIndex()` removes zombies.
- Change the `pushChanged()` write-back guard from `if (pushed > 0 && this.index)` to `if ((pushed > 0 || this.indexMutated) && this.index)` so the cleanup propagates on a no-op push too.
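A minimal sketch of the three changes, under stated assumptions: the index schema, the `RemoteStore` interface and the `CacheSyncSketch` class are hypothetical, and the real `loadIndex()`/`pushChanged()` are async calls against S3/GCS rather than a synchronous in-memory map. Clearing `indexMutated` after a successful write-back is a choice made for this sketch, not something the issue specifies.

```typescript
// Hypothetical index schema and remote-store interface; the real service is
// async and talks to S3/GCS, while this sketch uses a synchronous stand-in.
interface IndexEntry {
  size: number;
}
interface DatastoreIndex {
  entries: Record<string, IndexEntry>;
}
interface RemoteStore {
  get(key: string): string | undefined;
  put(key: string, body: string): void;
}

const INDEX_KEY = ".datastore-index.json";

// Hypothetical matcher for "_catalog.db*" (see #29's isInternalCacheFile).
function isInternalCacheFile(relPath: string): boolean {
  return (relPath.split("/").pop() ?? "").startsWith("_catalog.db");
}

class CacheSyncSketch {
  index: DatastoreIndex | null = null;
  indexMutated = false;
  private readonly remote: RemoteStore;

  constructor(remote: RemoteStore) {
    this.remote = remote;
  }

  loadIndex(): void {
    const raw = this.remote.get(INDEX_KEY);
    const index: DatastoreIndex = raw ? JSON.parse(raw) : { entries: {} };
    // Strip zombies; the flag is only set when something is actually
    // removed, so loading an already-clean index leaves it untouched.
    for (const relPath of Object.keys(index.entries)) {
      if (isInternalCacheFile(relPath)) {
        delete index.entries[relPath];
        this.indexMutated = true;
      }
    }
    this.index = index;
  }

  pushChanged(changed: Record<string, IndexEntry>): number {
    let pushed = 0;
    for (const [relPath, entry] of Object.entries(changed)) {
      if (isInternalCacheFile(relPath)) continue; // never push cache files
      // (The real method uploads the local file here before recording it.)
      if (this.index) this.index.entries[relPath] = entry;
      pushed++;
    }
    // Widened guard: also write back when loadIndex() scrubbed zombies.
    if ((pushed > 0 || this.indexMutated) && this.index) {
      this.remote.put(INDEX_KEY, JSON.stringify(this.index));
      this.indexMutated = false; // sketch choice: the remote is now clean
    }
    return pushed;
  }
}

// Usage: a polluted remote index and a push with nothing new to upload.
const store = new Map<string, string>();
const remote: RemoteStore = {
  get: (key) => store.get(key),
  put: (key, body) => {
    store.set(key, body);
  },
};
store.set(
  INDEX_KEY,
  JSON.stringify({
    entries: {
      "data/models/foo.yaml": { size: 120 },
      "data/_catalog.db-wal": { size: 4096 },
    },
  }),
);
const sync = new CacheSyncSketch(remote);
sync.loadIndex();
const pushedCount = sync.pushChanged({});
```

The usage at the bottom exercises the no-op push path: nothing is uploaded, yet the scrubbed index is still written back, which is the whole point of widening the guard.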
Effect: on any --push or --sync against a polluted bucket, the index gets rewritten without the zombies. Within one sync cycle, any client pulling from the bucket sees a clean index. Pure bookkeeping — no destructive operations.
Active remote cleanup (separate, higher-risk follow-up):
Add a swamp datastore sync --clean-orphans flag that walks the remote bucket and issues deleteObject for any key matching the internal-file pattern. Destructive, so it should be opt-in and log every deletion. This actually reclaims S3/GCS storage, whereas passive cleanup only fixes the index.
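A sketch of what a `--clean-orphans` pass might look like. The `Bucket` interface and `cleanOrphans` function are hypothetical; the real S3 and GCS clients list objects with pagination and have their own delete APIs, which this in-memory stand-in deliberately ignores.

```typescript
// Hypothetical bucket abstraction. Real S3/GCS clients paginate listings
// and expose their own delete calls; this stand-in hides those details.
interface Bucket {
  listKeys(prefix: string): string[];
  deleteObject(key: string): void;
}

// Hypothetical matcher for "_catalog.db*" (see #29's isInternalCacheFile).
function isInternalCacheFile(key: string): boolean {
  return (key.split("/").pop() ?? "").startsWith("_catalog.db");
}

// Sketch of an opt-in --clean-orphans pass: walk the bucket, delete every
// internal-file key, and log each deletion so the user can audit it.
function cleanOrphans(
  bucket: Bucket,
  prefix: string,
  log: (msg: string) => void,
): string[] {
  const deleted: string[] = [];
  for (const key of bucket.listKeys(prefix)) {
    if (!isInternalCacheFile(key)) continue;
    log(`deleting orphan object ${key}`);
    bucket.deleteObject(key);
    deleted.push(key);
  }
  return deleted;
}

// Usage against an in-memory bucket.
const objects = new Set<string>([
  ".datastore-index.json",
  "data/models/foo.yaml",
  "data/_catalog.db",
  "data/_catalog.db-shm",
  "data/_catalog.db-wal",
]);
const memBucket: Bucket = {
  listKeys: (prefix) => Array.from(objects).filter((k) => k.startsWith(prefix)),
  deleteObject: (key) => {
    objects.delete(key);
  },
};
const auditLog: string[] = [];
const removed = cleanOrphans(memBucket, "data/", (msg) => auditLog.push(msg));
```

Returning the deleted keys alongside the log keeps the pass auditable, which is the property the issue uses to argue against deleting silently inside `loadIndex()`.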
Alternative: do the delete automatically in loadIndex alongside the passive cleanup. Simpler but removes the user's ability to audit what gets deleted. Not recommended — the swamp convention is to avoid destructive ops without explicit user intent.
Documentation-only:
Add a note to the @swamp/s3-datastore and @swamp/gcs-datastore manifest descriptions and/or release notes telling affected users how to manually delete the orphan objects with aws s3 rm / gsutil rm. Zero code risk but leaves historical users to figure it out.
Scope
Applies identically to both @swamp/s3-datastore and @swamp/gcs-datastore (same code pattern in s3_cache_sync.ts and gcs_cache_sync.ts).
Testing requirements
- Unit test: load an index containing both legitimate entries and `_catalog.db*` zombies; assert `this.index.entries` no longer contains the zombies after `loadIndex()`.
- Unit test: `pushChanged()` on a cache with no new files but a mutated index writes the cleaned index back to S3/GCS (asserts the `indexMutated` flag path).
- Unit test: multiple `loadIndex()` calls are idempotent (don't re-set `indexMutated` if the index is already clean).
- If implementing `--clean-orphans`, add a CLI UAT test in swamp-uat.
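A self-contained sketch of the three unit tests, written against hypothetical pure helpers (`scrubZombies`, `shouldWriteIndex`) rather than the real service class so it runs without an S3/GCS fake. The names are illustrative; a real test would drive `loadIndex()` and `pushChanged()` directly.

```typescript
// Hypothetical schema and helpers; a real test would drive the service's
// loadIndex()/pushChanged() against an S3/GCS fake instead.
interface DatastoreIndex {
  entries: Record<string, { size: number }>;
}

function isInternalCacheFile(relPath: string): boolean {
  return (relPath.split("/").pop() ?? "").startsWith("_catalog.db");
}

// Removes zombie entries in place; returns true if anything was removed,
// i.e. whether loadIndex() should set indexMutated.
function scrubZombies(index: DatastoreIndex): boolean {
  let removed = false;
  for (const relPath of Object.keys(index.entries)) {
    if (isInternalCacheFile(relPath)) {
      delete index.entries[relPath];
      removed = true;
    }
  }
  return removed;
}

// Mirrors the proposed pushChanged() write-back guard.
function shouldWriteIndex(pushed: number, indexMutated: boolean, hasIndex: boolean): boolean {
  return (pushed > 0 || indexMutated) && hasIndex;
}

function check(cond: boolean, msg: string): void {
  if (!cond) throw new Error(msg);
}

// Test 1: zombies are stripped, legitimate entries survive.
const index: DatastoreIndex = {
  entries: {
    "data/models/foo.yaml": { size: 120 },
    "data/_catalog.db": { size: 8192 },
    "data/_catalog.db-shm": { size: 32768 },
    "data/_catalog.db-wal": { size: 4096 },
  },
};
check(scrubZombies(index), "first load should report a mutation");
check(
  Object.keys(index.entries).join(",") === "data/models/foo.yaml",
  "only the legitimate entry should remain",
);

// Test 2: a push with no new files still writes back a mutated index.
check(shouldWriteIndex(0, true, true), "no-op push must write a mutated index");
check(!shouldWriteIndex(0, false, true), "no-op push on a clean index must not write");

// Test 3: loading an already-clean index is idempotent.
check(!scrubZombies(index), "a clean index must not set indexMutated again");
```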
Related
- swamp-club issue #29 (primary bug: `_catalog.db-wal` push failure)
- swamp-club issue #30 (double `pushChanged` call in --push)
Closed
stack72 commented 4/8/2026, 7:43:07 PM
Partially resolved by #29 (passive index cleanup folded into the scrubIndex + indexMutated mechanism). Physical orphan object cleanup is accepted as a documented trade-off per release notes — not worth a dedicated --clean-orphans command. Closing as superseded.