#252 Implement W3: ReconcileFromDisk + freshness-as-aggregate-query (extension catalog rearchitecture)
Opened by stack72 · 5/5/2026 · Shipped 5/6/2026
Problem
W1a (#1292), W1b (#1295), the LockfileRepository prequel (#1298), and W2 (#231 / closing swamp-club#201) put the structural foundation in place: domain aggregate, ExtensionRepository, lifecycle services, asymmetric unit-of-work, atomic upgrade pattern. W2 ships the first user-visible behavior changes.
But the bundling-decision-correctness woes remain. The freshness contract is still implicit and additive — five-and-a-half states in flight (fresh, stale-fingerprint-differs, stale-no-row, validation_failed-but-fresh, plus UNREADABLE_DEP_SENTINEL-encoded "broken deps stable" from PR #208) — and findStaleFiles is doing duty for both deletion-sweep and stale-file detection. Each new bundling bug has produced a column or sentinel patch (#208, #209, #212 — the audit's "bug class signature"). The structural fix lands in W3.
W3 introduces the ReconcileFromDisk service and rewrites the freshness contract as a pure function of aggregate state. The 5+ implicit states collapse into the explicit RowState discriminant queries (already enumerated in W1b). UNREADABLE_DEP_SENTINEL disappears.
Full architectural context: design/extension-rearchitecture.md ("W3 — ReconcileFromDisk service + freshness as aggregate query" section) — referenced from #211.
Scope
Phase 1 — ReconcileFromDisk.execute(repo)
New application service in src/libswamp/extensions/ that:
- Walks the on-disk source tree (locals + pulled-extensions)
- Loads current aggregate state via ExtensionRepository.loadAll()
- Diffs the two and emits transitions:
  - On-disk source present, aggregate has Indexed Source → no-op
  - On-disk source present, aggregate has no Source → install path (delegate to InstallExtensionService or the loader's bundleAndIndexOne)
  - On-disk source absent, aggregate has Indexed Source → markSourceMissing (→ OrphanedBundleOnly if bundle present, else Tombstoned)
  - Entry-point unreadable → recordEntryPointUnreadable
  - Bundle build failure → recordBundleBuildFailed
- Saves resulting aggregate state via repository.saveAll([...])
This replaces the implicit "next buildIndex pass will reap orphans" contract that W2's lifecycle services currently rely on as a fallback safety net.
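The diff step above can be sketched as a pure planning function. This is only an illustration under stated assumptions: the DiskSource, CatalogRow and Transition shapes and the planReconcile name are hypothetical, RowState is modeled as a plain string union, and the bundle-build-failure case is left out because it requires actually running a build.

```typescript
// Hypothetical shapes for illustration; the real aggregate types live in the codebase.
type RowState =
  | "Indexed"
  | "Bundled"
  | "ValidationFailed"
  | "BundleBuildFailed"
  | "EntryPointUnreadable"
  | "OrphanedBundleOnly"
  | "Tombstoned";

interface DiskSource {
  path: string;
  readable: boolean; // false when the entry point cannot be fingerprinted
}

interface CatalogRow {
  path: string;
  state: RowState;
  bundlePresent: boolean;
}

type Transition =
  | { kind: "install"; path: string }
  | { kind: "recordEntryPointUnreadable"; path: string }
  | {
    kind: "markSourceMissing";
    path: string;
    next: "OrphanedBundleOnly" | "Tombstoned";
  };

// Pure diff of the on-disk view against the aggregate view.
function planReconcile(disk: DiskSource[], rows: CatalogRow[]): Transition[] {
  const known = new Map<string, CatalogRow>();
  for (const r of rows) known.set(r.path, r);
  const onDisk = new Set(disk.map((d) => d.path));
  const plan: Transition[] = [];

  for (const d of disk) {
    const row = known.get(d.path);
    if (!d.readable) {
      // Only record the failure once, so a second pass emits nothing.
      if (row?.state !== "EntryPointUnreadable") {
        plan.push({ kind: "recordEntryPointUnreadable", path: d.path });
      }
    } else if (row === undefined) {
      plan.push({ kind: "install", path: d.path });
    }
    // Readable source with an existing row: no-op in this sketch
    // (recovery from non-Indexed states is left out).
  }

  for (const r of rows) {
    if (r.state === "Indexed" && !onDisk.has(r.path)) {
      plan.push({
        kind: "markSourceMissing",
        path: r.path,
        next: r.bundlePresent ? "OrphanedBundleOnly" : "Tombstoned",
      });
    }
  }
  return plan;
}
```

Keeping the plan separate from the save makes the idempotence criterion easy to check: a second pass over unchanged inputs should return an empty plan.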
Phase 2 — Freshness contract as aggregate query
Today's src/domain/extensions/bundle_freshness.ts reads source fingerprints + state + bundle paths, returns "fresh" / "stale". W3 replaces this with a pure function of RowState:
- Indexed → fresh; type resolution returns this Source
- Bundled | ValidationFailed | BundleBuildFailed | EntryPointUnreadable | OrphanedBundleOnly | Tombstoned → not visible to type resolution; reconcile may transition
The 5+ implicit states all become explicit RowState tags, set by the appropriate transition methods on the aggregate.
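As a rough sketch of what "pure function of RowState" could look like, assuming RowState is a string-tagged union (the real discriminant enumerated in W1b may carry more data) and using a hypothetical freshnessOf name:

```typescript
// Assumed string-union model of the RowState tags named in this issue.
type RowState =
  | "Indexed"
  | "Bundled"
  | "ValidationFailed"
  | "BundleBuildFailed"
  | "EntryPointUnreadable"
  | "OrphanedBundleOnly"
  | "Tombstoned";

type Freshness = "fresh" | "not-visible";

// No fingerprints, paths, or I/O: the answer depends on the state tag alone.
function freshnessOf(state: RowState): Freshness {
  switch (state) {
    case "Indexed":
      return "fresh";
    case "Bundled":
    case "ValidationFailed":
    case "BundleBuildFailed":
    case "EntryPointUnreadable":
    case "OrphanedBundleOnly":
    case "Tombstoned":
      return "not-visible";
    default: {
      // Compile-time exhaustiveness check: adding a new tag without
      // handling it here becomes a type error.
      const unreachable: never = state;
      throw new Error(`unhandled RowState: ${String(unreachable)}`);
    }
  }
}
```

The exhaustive switch is the point of the design: a new freshness state has to be added as a tag and handled explicitly rather than patched in through another column or sentinel.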
Phase 3 — UNREADABLE_DEP_SENTINEL removal
The sentinel was added in PR #208/#1282 to break a rebundle loop on broken transitive deps. Underlying behavior ("if a transitive dep is unreadable, don't rebundle until something changes") gets absorbed into EntryPointUnreadable and dependency-walking transitions. Sentinel disappears; behavior preserved by regression test.
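One possible way to preserve "don't rebundle until something changes" without a sentinel value is to store what reconcile observed when the failure was recorded and compare it with the current observation. The sketch below is an assumption, not the design doc's mechanism: the Observation shape and the sameObservation / shouldRetry helpers are hypothetical.

```typescript
// Hypothetical snapshot of what reconcile saw when it recorded a failure.
interface Observation {
  // dep path → fingerprint, or null when the dep could not be read
  deps: Record<string, string | null>;
}

// Two observations match when they cover the same deps with the same results.
function sameObservation(a: Observation, b: Observation): boolean {
  const ka = Object.keys(a.deps).sort();
  const kb = Object.keys(b.deps).sort();
  if (ka.length !== kb.length) return false;
  return ka.every((k, i) => k === kb[i] && a.deps[k] === b.deps[k]);
}

// Retry bundling only when something changed since the failure was recorded.
// An unreadable dep that is still unreadable compares equal (null === null),
// so the broken state stays stable without a magic fingerprint value.
function shouldRetry(recorded: Observation, current: Observation): boolean {
  return !sameObservation(recorded, current);
}
```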
Pre-work decisions to pin in the PR description
- Service location. Recommend src/libswamp/extensions/ (alongside W2 services). Application service orchestrating domain + infrastructure.
- Reconcile trigger points. When does reconcile run? Recommend: cold-start (when invalidationGuards fire) + explicit swamp doctor extensions call (when W6 lands). NOT on every command; that would dominate hot-path performance. Pin trigger conditions explicitly.
- findStaleFiles migration. Delete entirely, or keep as thin shim for W2's fallback safety net? Recommend: keep as a deletion-sweep helper for W2's crash-recovery path; bulk of stale-file detection moves to ReconcileFromDisk. Shim becomes ~20 LOC.
- Orphan bundle file deletion. Reconcile detects orphans (OrphanedBundleOnly state). Does W3 also delete the orphaned bundle file from disk? Recommend: W3 only transitions state, defers actual file eviction to a follow-up tracking issue (bundle cache eviction policy is currently unowned). Keeps W3 scope bounded.
- Reconcile interaction with W2's lifecycle services. W2's services use findStaleFiles as a fallback for crashed mid-flight installs. Boundary: W2's services own the unit-of-work; reconcile owns post-hoc state repair. Pin explicitly so neither workstream's safety claims regress.
Out of scope (deferred to later workstreams)
- Loader unification (KindAdapter) + swamp-club#214 ENOENT-fallback parity → W4
- legacyStore escape hatch removal → W4
- Per-fingerprint import URLs + subprocess test harness → W5
- swamp doctor extensions aggregate-state rendering → W6
- Bundle cache file eviction (the actual Deno.remove of orphaned bundle files): file as a separate tracking issue; W3 detects orphans only
Success criteria
- UNREADABLE_DEP_SENTINEL removed, broken-transitive-dep behavior preserved (regression test reproducing the original swamp-club#208 case).
- Schema-invalid extension behavior preserved (regression test for the original swamp-club#209 case: extensions with safeParse failures stay in ValidationFailed, no rebundle loop).
- Cached-bundle-missing rebundle preserved (regression test for swamp-club#212 / #1288).
- OrphanedBundleOnly fires correctly: source file deleted while bundle remains → state transitions; type resolution stops returning this Source.
- EntryPointUnreadable fires correctly: entry-point fingerprint throws (permission denied, missing) → state transitions; restore + reconcile → recovers to Indexed.
- Reconcile is idempotent: running twice in succession produces no UPDATE/DELETE/INSERT.
- Cold-start performance not dominated by reconcile: profile cold-start time before/after on a repo with ≥ 50 extensions; pin a regression threshold (e.g. ≤ 1.2x).
- W2's lifecycle services' fallback safety net still works — crashed mid-flight install gets reconciled correctly either via W2's existing path or W3's strategic reconcile (per pre-work decision 5).
- All existing tests pass on Linux + macOS (Windows not a merge gate per W-series precedent).
- Auto-ship-on-merge readiness verified via diversity-matrix soak.
Suggested test additions
- Regression for #208: broken transitive dep → reconcile → state stays at the new equivalent of "transitive dep unreadable"; no rebundle loop.
- Regression for #209: schema-invalid extension → reconcile → state stays at ValidationFailed; no rebundle loop.
- Regression for #212: cached bundle missing → reconcile → rebundle fires once, not in a loop.
- OrphanedBundleOnly transition: source deleted, bundle present → reconcile → state transitions; type resolution returns nothing.
- Tombstoned transition: source AND bundle deleted → reconcile → state transitions.
- EntryPointUnreadable transition: chmod 000 entry point → reconcile → state transitions; restore permissions → reconcile → recovers to Indexed.
- Idempotence: run reconcile twice; second run produces zero catalog mutations.
- Concurrent reconcile + InstallExtensionService: ensure no race window where reconcile sees an in-flight install and produces a spurious transition (the lockfile race window deferred from W1b's ADV-8; pin behavior here).
- Performance: cold-start time on a repo with ≥ 50 extensions; compare against pre-W3 baseline; assert ≤ 1.2x regression.
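The idempotence test could take roughly the following shape: wrap the repository in a fake that counts every write, run reconcile twice, and assert the second run adds nothing. Everything here is a stand-in: CountingRepo, the two-state RowState, and the toy reconcile function are hypothetical and far simpler than the real service.

```typescript
// Toy two-state model; the real RowState has more tags.
type RowState = "Indexed" | "Tombstoned";

// In-memory fake that counts every row written through saveAll.
class CountingRepo {
  mutations = 0;
  private rows = new Map<string, RowState>();

  constructor(initial: Record<string, RowState>) {
    for (const [path, state] of Object.entries(initial)) {
      this.rows.set(path, state);
    }
  }

  loadAll(): Map<string, RowState> {
    return new Map(this.rows); // defensive copy
  }

  saveAll(changed: Array<[string, RowState]>): void {
    for (const [path, state] of changed) {
      this.rows.set(path, state);
      this.mutations++; // count writes even if the value is unchanged
    }
  }
}

// Toy reconcile: tombstone Indexed rows missing from disk, index new sources.
function reconcile(repo: CountingRepo, onDisk: Set<string>): void {
  const rows = repo.loadAll();
  const changed: Array<[string, RowState]> = [];
  for (const [path, state] of rows) {
    if (state === "Indexed" && !onDisk.has(path)) {
      changed.push([path, "Tombstoned"]);
    }
  }
  for (const path of onDisk) {
    if (!rows.has(path)) changed.push([path, "Indexed"]);
  }
  repo.saveAll(changed);
}
```

Counting every write, rather than only writes that change a value, keeps the fake from hiding a reconcile that re-saves unchanged rows on each pass.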
Auto-ship-on-merge constraint
Same gates as W2:
- CI green (all new + existing tests + type-check + lint + fmt)
- Author smoke on real repo: cold-start works, reconcile produces expected transitions, no spurious rebundles
- Reviewer smoke on a different real repo
- Diversity-matrix soak (multiple machines × OS × install shape × workflow × repo size)
- Specifically watch for: cold-start perf regressions, unexplained state transitions, rebundle loops on edge-case extensions
- No UNREADABLE_DEP_SENTINEL-class regression (the original #208 case stays fixed)
- Forward-only revert posture documented
Cost of escape is "every user on next pull" — bar is "would I let this go to all users tomorrow."
Push-back encouraged
If the design doesn't fit the ground, surface it before implementation. Specific watch list:
- The freshness contract claim that 5+ implicit states collapse cleanly into RowState may have edge cases the audit didn't enumerate. If the agent finds a freshness state without a corresponding RowState transition, surface before implementation; design doc may need revision before W3 lands.
- UNREADABLE_DEP_SENTINEL removal might surface unexpected callers. Verify all readers/writers before locking in the removal.
- W2's findStaleFiles fallback is real load-bearing code today (used by W2's crashed-install recovery). Confirm the W3-or-shim decision matches actual usage, not the design doc's idealized version.
- Reconcile trigger points affect performance materially. If "cold-start + explicit doctor call" misses a real use case (e.g., long-running daemon needs reconcile too), surface before locking in the trigger model.
References
- Predecessors: #211 (W1 tracking), #223 (W1b), #231 (W2)
- Related bugs (the rebundle-loop class W3 structurally fixes): #208 / #1282, #209 / #1286, #212 / #1288
- Design doc: design/extension-rearchitecture.md