#267 Extension layer garbage collection: prune catalog rows + evict orphaned bundles
Opened by stack72 · 5/6/2026
Problem
After W3 ships (#252), reconcile transitions Sources to terminal states (Tombstoned, OrphanedBundleOnly) but does not delete artifacts. Two classes of debris accumulate:
- Catalog rows in Tombstoned state: extensions removed via extension rm, locals deleted from disk, or pre-rearchitecture legacy state. These rows take up space in bundle_types, are invisible to type resolution, and are never deleted.
- Bundle files in .swamp/<kind>-bundles/ with no catalog reference: left behind when sources are removed, replaced, or rebundled with a new fingerprint. Disk usage grows over the lifetime of the repo.
Today there is no user-facing way to clean up either class. As a result, the extension layer progressively leaks disk space and metadata over the lifetime of a swamp repo.
This issue supersedes swamp-club#251, which was scoped only to catalog-only orphans. The two cleanup classes share the user surface, the mental model, the safety story, and the pre-work decisions; folding them together avoids near-duplicate sibling issues.
Scope
Cleanup operations
- Prune catalog rows. Delete rows in Tombstoned state (or in any terminal state where neither the source nor the bundle exists on disk). Operates on the bundle_types table.
- Evict orphaned bundle files. Delete files in .swamp/<kind>-bundles/ whose paths are not referenced by any bundle_path in the catalog.
- (Conditional) Evict stale fingerprint versions. Depends on pre-work decision #1 (bundle file naming convention). If deno bundle produces content-addressed names, every fingerprint change leaves a stale artifact, and these need eviction too. If names are overwritten on rebundle, this case doesn't exist.
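The two unconditional cleanup classes can be planned in one pure pass before anything is deleted. The sketch below is TypeScript, to match the Deno tooling referenced in this issue, and is only a sketch under assumptions: CatalogRow, CleanupItem, and planCleanup are hypothetical names, the row shape is a simplified stand-in for the real bundle_types schema, and bundle paths are assumed to be already normalized so that directory listings and catalog values compare as plain strings. It prunes only Tombstoned rows and leaves out the broader "any terminal state with neither source nor bundle on disk" rule.

```typescript
// Hypothetical, simplified catalog shape; the real bundle_types schema may differ.
type SourceState = "Indexed" | "Tombstoned" | "OrphanedBundleOnly";

interface CatalogRow {
  id: string;
  state: SourceState;
  // Normalized bundle file path, or null if the row records none.
  bundlePath: string | null;
}

type CleanupItem =
  | { kind: "prune_row"; id: string; reason: string }
  | { kind: "evict_file"; path: string; reason: string };

// Pure planning step: decides what would be removed without touching anything.
function planCleanup(rows: CatalogRow[], bundleFiles: string[]): CleanupItem[] {
  const plan: CleanupItem[] = [];
  const referenced = new Set<string>();

  for (const row of rows) {
    if (row.state === "Tombstoned") {
      plan.push({ kind: "prune_row", id: row.id, reason: "state=Tombstoned" });
    } else if (row.bundlePath !== null) {
      // Only rows that survive the prune keep their bundle file alive.
      referenced.add(row.bundlePath);
    }
  }

  for (const path of bundleFiles) {
    if (!referenced.has(path)) {
      plan.push({ kind: "evict_file", path, reason: "no catalog reference" });
    }
  }
  return plan;
}
```

One design choice worth noting: a file referenced only by a row pruned in the same pass is treated as orphaned immediately. Computing references from the surviving rows is what lets a second --apply be a no-op; computing them from all rows would leave those files behind for the next run.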
User surface
swamp doctor extensions repair (or a similar subcommand). Default mode is dry-run; the user must pass --apply explicitly to execute.
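A minimal sketch of the dry-run-by-default contract, assuming hypothetical names (RepairOptions, CleanupTarget, parseRepairArgs, runRepair) and an injected remover callback in place of real deletion. It also assumes a policy the issue does not state: if both --apply and --dry-run are passed, dry-run wins.

```typescript
// Hypothetical types: the real CLI's option parsing and removal calls will differ.
interface RepairOptions {
  apply: boolean;
}

interface CleanupTarget {
  target: string; // catalog row id or bundle file path
  reason: string;
}

// Deletion is enabled only by an explicit --apply. If --dry-run is also
// present, this sketch lets dry-run win, erring on the safe side.
function parseRepairArgs(args: string[]): RepairOptions {
  return { apply: args.includes("--apply") && !args.includes("--dry-run") };
}

function runRepair(
  plan: CleanupTarget[],
  options: RepairOptions,
  remove: (item: CleanupTarget) => void,
): string[] {
  const report: string[] = [];
  for (const item of plan) {
    if (options.apply) {
      remove(item);
      report.push(`removed ${item.target} (${item.reason})`);
    } else {
      // Dry-run: describe the change, never perform it.
      report.push(`would remove ${item.target} (${item.reason})`);
    }
  }
  return report;
}
```

Injecting the remover keeps the apply path testable with assertions on exactly what was removed, which the auto-ship constraints below ask for.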
Logging
Every eviction logs what was removed and why, with structured fields for scripting.
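One way to make the logs scriptable is one JSON object per line (JSON Lines). The record shape below is hypothetical: EvictionLogRecord, its field names, and formatEvictionLog are illustrative, not swamp's actual logging schema.

```typescript
// Hypothetical record shape; field names are illustrative, not swamp's schema.
interface EvictionLogRecord {
  event: "extension_gc";
  action: "prune_row" | "evict_file";
  target: string; // catalog row id or bundle file path
  reason: string; // why the item was eligible, e.g. "state=Tombstoned"
  applied: boolean; // false in dry-run mode: nothing was actually removed
}

// One JSON object per line keeps the output easy to grep or pipe into jq.
function formatEvictionLog(records: EvictionLogRecord[]): string {
  return records.map((r) => JSON.stringify(r)).join("\n");
}
```

Carrying an applied flag on every record lets the same format serve both dry-run listings and --apply output.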
Pre-work decisions to pin in the PR description
- Bundle file naming convention. Verify whether deno bundle produces overwriting names or content-addressed names. The eviction policy depends on the answer:
  - Overwrite-on-rebundle → only true orphans (no catalog reference) accumulate.
  - Content-addressed → also evict stale fingerprint versions whose fingerprint no longer matches any current source_fingerprint.
- Eviction trigger. Three options:
  - Explicit user command (swamp doctor extensions repair). Safest.
  - Synchronous on reconcile (fires on the transition to OrphanedBundleOnly). Atomic but coupled.
  - Background sweep (every Nth cold start). Hands-off but harder to debug.
  Recommendation: explicit user command for v1; revisit synchronous eviction after reconcile is proven stable.
- Hard delete vs trash directory. Recommendation: hard delete, since recovery is "the next reconcile rebundles from source." A trash directory adds complexity for marginal recovery benefit.
- Default mode is dry-run. The user must explicitly pass --apply to delete, mirroring git clean -n semantics. Safety first for a destructive operation.
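If decision #1 finds content-addressed names, stale-version eviction reduces to a set-membership check. This sketch assumes a purely hypothetical naming scheme, <name>.<fingerprint>.js with a lowercase hex fingerprint of at least 8 characters; the real convention is exactly what decision #1 has to establish. staleFingerprintFiles is an illustrative name, and files that don't match the pattern are skipped and left to orphan eviction.

```typescript
// Hypothetical naming scheme: "<name>.<fingerprint>.js" with a lowercase hex
// fingerprint of 8+ characters. Replace once decision #1 pins the real convention.
const CONTENT_ADDRESSED = /^(.+)\.([0-9a-f]{8,})\.js$/;

// Returns bundle file names whose embedded fingerprint matches no current
// source_fingerprint in the catalog.
function staleFingerprintFiles(
  fileNames: string[],
  currentFingerprints: ReadonlySet<string>,
): string[] {
  const stale: string[] = [];
  for (const file of fileNames) {
    const match = CONTENT_ADDRESSED.exec(file);
    if (match === null) continue; // not content-addressed; orphan eviction covers it
    if (!currentFingerprints.has(match[2])) stale.push(file);
  }
  return stale;
}
```

Matching against any current fingerprint, rather than only the fingerprint of the same named source, follows the wording of the decision above; a per-source check would be stricter.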
Out of scope
- Cache size reporting (swamp doctor extensions showing cache size): a diagnostic, separate concern.
- Compression of bundle files.
- Auto-eviction triggered by disk space pressure.
- Migration of pre-W3 catalog state to new states (W3 already handles this via reconcile).
Success criteria
- swamp doctor extensions repair --dry-run lists every cleanup-eligible row and file with a reason.
- swamp doctor extensions repair --apply performs the cleanup atomically.
- Cleanup operations are logged with structured fields.
- Test coverage for each cleanup class:
  - Tombstoned row pruning
  - Orphaned bundle file eviction
  - Stale fingerprint version eviction (if applicable per pre-work decision #1)
- Idempotent: running --apply twice → the second run is a no-op.
- All existing tests pass on Linux + macOS (Windows is not a merge gate, per W-series precedent).
Suggested test additions
- Seed the catalog with 5 Tombstoned rows + 3 Indexed rows; --apply; assert 5 deleted, 3 remain.
- Seed .swamp/model-bundles/ with 10 files (6 referenced by the catalog, 4 not); --apply; assert 6 remain, 4 deleted.
- Mixed corrupt case: catalog rows whose bundle_path references files that don't exist (distinct from the orphan case). Pin the behavior.
- Dry-run safety: --dry-run followed by a state check; assert nothing changed.
- Idempotence: --apply twice; the second run produces an empty cleanup list.
- Cross-platform: cleanup paths use @std/path, never hand-rolled path concatenation.
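The first two seeding scenarios and the idempotence check can be expressed against an in-memory model before wiring them to a real catalog. This is a hypothetical sketch: Row, seedRows, pruneTombstoned, and evictOrphans stand in for seeding bundle_types and running swamp doctor extensions repair --apply, and bundle files are plain strings rather than paths built with @std/path.

```typescript
// In-memory stand-ins for the catalog and bundle directory (all names hypothetical).
type State = "Indexed" | "Tombstoned";
interface Row {
  id: string;
  state: State;
}

function assertEq(actual: number, expected: number, label: string): void {
  if (actual !== expected) throw new Error(`${label}: expected ${expected}, got ${actual}`);
}

function seedRows(tombstoned: number, indexed: number): Row[] {
  const rows: Row[] = [];
  for (let i = 0; i < tombstoned; i++) rows.push({ id: `t${i}`, state: "Tombstoned" });
  for (let i = 0; i < indexed; i++) rows.push({ id: `i${i}`, state: "Indexed" });
  return rows;
}

// Simulated --apply for row pruning.
function pruneTombstoned(rows: Row[]): { kept: Row[]; deleted: Row[] } {
  return {
    kept: rows.filter((r) => r.state !== "Tombstoned"),
    deleted: rows.filter((r) => r.state === "Tombstoned"),
  };
}

// Simulated --apply for orphaned bundle eviction.
function evictOrphans(
  files: string[],
  referenced: ReadonlySet<string>,
): { kept: string[]; deleted: string[] } {
  return {
    kept: files.filter((f) => referenced.has(f)),
    deleted: files.filter((f) => !referenced.has(f)),
  };
}

// 5 Tombstoned + 3 Indexed: 5 deleted, 3 remain; a second run deletes nothing.
const rowsRun1 = pruneTombstoned(seedRows(5, 3));
assertEq(rowsRun1.deleted.length, 5, "rows deleted (run 1)");
assertEq(rowsRun1.kept.length, 3, "rows kept (run 1)");
assertEq(pruneTombstoned(rowsRun1.kept).deleted.length, 0, "rows deleted (run 2)");

// 10 bundle files, 6 referenced by the catalog: 6 remain, 4 deleted; idempotent.
const files = Array.from({ length: 10 }, (_, i) => `model-bundles/b${i}.js`);
const referenced = new Set(files.slice(0, 6));
const filesRun1 = evictOrphans(files, referenced);
assertEq(filesRun1.kept.length, 6, "files kept (run 1)");
assertEq(filesRun1.deleted.length, 4, "files deleted (run 1)");
assertEq(evictOrphans(filesRun1.kept, referenced).deleted.length, 0, "files deleted (run 2)");
```

The real tests would replace these helpers with catalog seeding and command invocation while keeping the same assertions on what survives.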
Auto-ship-on-merge constraint
Standard W-series gates: CI green, author smoke on real repo, reviewer smoke on different repo, diversity-matrix soak. Specific concerns for this issue:
- Destructive operation: dry-run-by-default is a hard requirement. The --apply path must be tested explicitly, with assertions on what survives.
- Disk-state recovery: confirm that "delete catalog db + reconcile rebuilds" still works post-eviction. Cleanup should never produce a state that reconcile can't recover from.
When to pick this up
After W4 merges. W4's KindAdapter unification might affect bundle file naming or locations, so waiting for W4 gives this work a stable target.
If W6 takes the lead on this surface (doctor extensions aggregate-state rendering), this issue could fold into W6's scope rather than being a standalone workstream. Decision deferred to whoever drafts W6.
References
- Supersedes: swamp-club#251 (catalog-only orphan repair — broader scope here)
- Predecessors: #211 / #223 / #231 / #252 (W1+W2+W3 introduced and operationalized the Tombstoned and OrphanedBundleOnly states)
- May be affected by: W4 (loader unification, bundle naming) and W5 (per-fingerprint URLs, bundle naming)
- Design doc: design/extension-rearchitecture.md (open question: bundle cache eviction)
Closed