Skip to main content
← Back to list
01Issue
BugShippedSwamp CLI
Assigneesstack72

#334 Fix invalidate-then-reconcile sequencing in doctor extensions; failure-mode RowStates unreachable in sourceDetails

Opened by stack72 · 5/12/2026· Shipped 5/12/2026

Summary

swamp doctor extensions is designed to be the canonical way to inspect the true on-disk state of extension catalog rows. After follow-up work in W3 and W6, it should reliably surface every RowState (Indexed, BundleBuildFailed, ValidationFailed, EntryPointUnreadable, OrphanedBundleOnly, Tombstoned) in aggregateState.sourceDetails[].

In practice, BundleBuildFailed, ValidationFailed, and EntryPointUnreadable are unreachable in sourceDetails[] for the most common user-observable scenario (a previously-Indexed local extension whose on-disk content is replaced with broken content). The row stays Indexed indefinitely. Failures surface only in registries.<kind>.failures[] — a different output shape with different field names — defeating the W6 unified aggregate-state surface for the cases users care about most.

Root cause is a sequencing bug in how doctor extensions invalidates the catalog vs. when reconcile decides whether to fire. Details below.

Reproduction

# Setup: fresh repo with a valid local model extension
swamp repo init
mkdir -p extensions/models/test
cat > extensions/models/test/entry.ts <<'TS'
export const model = {
  type: "@test/example",
  attributes: { foo: { type: "string" } }
};
TS

# First doctor: row reaches Indexed
swamp doctor extensions --json --verbose | jq '.aggregateState.sourceDetails'
# → [{ sourcePath: ".../entry.ts", stateTag: "Indexed", ... }]

# Replace source with regex-matching but schema-invalid content
cat > extensions/models/test/entry.ts <<'TS'
export const model = "not an object";
TS

# Second doctor: expected BundleBuildFailed in sourceDetails. Observed:
# stays Indexed. Failure (if any) appears in registries.model.failures[].
swamp doctor extensions --json --verbose | jq '.aggregateState.sourceDetails'
# → [{ sourcePath: ".../entry.ts", stateTag: "Indexed", ... }]   ❌

Same symptom with unresolvable imports (import { x } from './nonexistent.ts'; export const model = x;) and with deleted source files for local origin.

Root cause analysis

The reconcile path (ReconcileFromDiskService) and the legacy buildIndex path coexist after W3 shipped. Reconcile is the canonical failure-recording mechanism (writes through the Extension aggregate, hits I-Repo-1, surfaces in sourceDetails[]). Legacy buildIndex is the fallback (writes to result.failed → registries.<kind>.failures[]).

Which one runs is gated by repository.anyKindNeedsInvalidation() evaluated at process startup in src/cli/mod.ts:291-303:

// mod.ts startup — runs once, before any command handler
if (
  repository.anyKindNeedsInvalidation() ||
  repository.manifestIdentityChanged(localManifestIdentity)
) {
  const reconciler = new ReconcileFromDiskService({...});
  await reconciler.execute();   // ← reconcile fires here, or not at all
}

doctor extensions then runs its own invalidation in the handler at src/cli/commands/doctor_extensions.ts:163-180:

// AFTER startup. Too late.
const rescanRepo = new ExtensionRepository({...});
try {
  rescanRepo.invalidateAll();   // clears isPopulated flags
} finally {
  rescanRepo.close();
}
// Then registries.ensureLoaded() → triggers buildIndex

The sequence within a single swamp doctor extensions invocation:

  1. Process startup (mod.ts:291): catalog is populated from prior run. anyKindNeedsInvalidation() returns false → reconcile is skipped.
  2. Doctor handler (doctor_extensions.ts:174): calls invalidateAll() on a separate rescanRepo. Clears isPopulated flags after reconcile has already decided not to run.
  3. ensureLoaded per registry: enters ExtensionLoader.buildIndex (extension_loader.ts:237+). Sees isPopulated === false, falls through to the full-bootstrap path (this.load(...) at line 318), which uses findStaleFiles — a different freshness check from reconcile's fingerprint comparison.
  4. buildIndex then re-marks the catalog populated during its run.

Net effect: invalidate within a single doctor command has no effect on the reconcile-write path. It only swaps in the legacy buildIndex write-path within the current invocation, and the re-population at step 4 even defeats the intended next-invocation effect (subject to verification — see acceptance criteria below).

The user-visible result: failure modes that should produce BundleBuildFailed rows go through buildIndex instead, land in registries.<kind>.failures[], and the source row in sourceDetails[] stays at Indexed (its last successful state).

Proposed fix

The intent is clearly: "doctor extensions should reflect on-disk reality, recomputing fingerprints and re-evaluating bundle validity for every source." Three implementation options, in increasing order of architectural ambition:

Option A — Drive reconcile synchronously from the doctor handler

In doctor_extensions.ts, after invalidateAll(), explicitly construct and run a ReconcileFromDiskService instance against the same repository the loaders are using:

rescanRepo.invalidateAll();
const reconciler = new ReconcileFromDiskService({
  denoRuntime, repository, lockfileRepository, repoDir, localManifestIdentity
});
await reconciler.execute();
// THEN registries.ensureLoaded() — which will now see the post-reconcile catalog

Pros: smallest change. Doesn't touch the legacy buildIndex path. Doctor becomes the one command that authoritatively re-evaluates everything.

Cons: doctor is now subtly slower (always reconciles). The reconcile + buildIndex combination needs careful ordering to avoid double-bundling.

Option B — Move the invalidate before startup

Restructure so doctor extensions (and any future commands that want forced rescan) sets a marker before the swamp process initializes its repository, so mod.ts:291 sees anyKindNeedsInvalidation() === true and fires reconcile at startup.

Pros: aligns with the existing single-trigger-point design. No new synchronization between handlers and startup.

Cons: requires either pre-startup state (a flag file, an env var) or restructuring the CLI to defer repository setup until after argument parsing. Likely a bigger blast radius.

Option C — Eliminate the dual path entirely (W7)

Migrate buildIndex's freshness/rebundle/failure-recording entirely to ReconcileFromDiskService. buildIndex becomes a thin wrapper that just calls reconcile and registers types from the resulting catalog. Failures land in one place (sourceDetails[]); registries.<kind>.failures[] is either deprecated or kept as a denormalized view.

Pros: closes the architectural debt class entirely. Hugely simplifies the mental model. Makes the UAT matrix tractable for failure-mode states.

Cons: large refactor. Should be a workstream of its own ("W7 — unify failure recording through the aggregate"), not a tactical fix.

Recommendation

Ship Option A now, file Option C as a follow-up workstream.

Option A unblocks the UAT failure-mode matrix immediately, is a small focused change with clear acceptance criteria, and doesn't preclude Option C later. Option B is more invasive than Option A without buying anything extra. Option C is the right destination but shouldn't gate the UAT work.

Acceptance criteria

  1. The reproduction above produces stateTag: \"BundleBuildFailed\" for the test source row in the second doctor extensions --json --verbose output.
  2. An equivalent reproduction with regex-matching unresolvable-import content (import { x } from './nonexistent.ts'; export const model = x;) produces BundleBuildFailed in sourceDetails[].
  3. An equivalent reproduction with a deleted-source-file on a pulled extension (lockfile entry preserved, file removed) produces EntryPointUnreadable in sourceDetails[].
  4. An equivalent reproduction with a deleted-source-file on a local extension (bundle preserved) produces OrphanedBundleOnly.
  5. The dual-write concern is addressed: any failure recorded in sourceDetails[] MUST NOT also appear in registries.<kind>.failures[] for the same source path. If both paths fire today, the fix must ensure reconcile's write "wins" and the legacy path either short-circuits or reads from the catalog rather than re-bundling.
  6. No regression in existing doctor extensions UAT tests (swamp-uat/tests/cli/extension/state/* on branch extension-uat-phase-1).
  7. The --apply repair path is not regressed (must still only prune Tombstoned rows; OrphanedBundleOnly deferred per W6 decision).

Files an implementing agent should read first

  • src/cli/mod.ts:270-330 — startup reconcile gate, see exactly what conditions trigger reconcile today
  • src/cli/commands/doctor_extensions.ts:140-200 — the misplaced invalidateAll and the registry ensure-load loop
  • src/libswamp/extensions/reconcile_from_disk_service.ts:1-150, 395-510 — reconcile entry point and the per-extension reconcile loop including the try/catch that writes BundleBuildFailed
  • src/domain/extensions/extension_loader.ts:237-330 — the legacy buildIndex path that currently runs after invalidate
  • src/domain/extensions/extension_loader.ts:389-460 — the bundleAndIndexOne regex gate (line 394-397) and the validation-throw path (line 422-426) — both matter for understanding what content reaches the catch block
  • src/domain/extensions/bundle_freshness.ts:262-400findStaleFiles (legacy freshness) and markCatalogValidationFailed (the dead-code validation-write path, see related context below)

The deeper investigation that produced this issue surfaced two adjacent architectural smells. They are NOT in scope for this fix:

  1. recordValidationFailed has zero production callers. The reconcile catch block collapses validation-throw and bundle-throw into recordBundleBuildFailed. So ValidationFailed as a RowState is currently dead in production. Fixing this requires distinguishing the throw source in the catch block — a small change but with semantic implications. Track separately if prioritized.

  2. Tombstoned rows are DELETEd in the same transaction that records the transition (extension_repository.ts:applyDiffForExtension around line 523). So Tombstoned is never observable in sourceDetails[]. The state exists only in ReconcileResult.transitions[], which doctor doesn't currently surface. Either expose transitions in the doctor JSON or document Tombstoned as transient. Track separately.

Both smells were originally captured in earlier issue filings that are now being closed in favor of this one. The smells remain real but they don't block the UAT failure-mode matrix; this sequencing fix does.

Test plan

Integration tests must live in integration/extensions/ and drive the CLI end-to-end (subprocess swamp doctor extensions, not unit-level service invocation). Suggested test cases — one per acceptance criterion 1-4. Use the existing fixture pattern from integration/extensions/doctor_extensions_aggregate_test.ts.

The fix should also enable the swamp-uat agent to remove the temporary ValidationFailed-unreachable / BundleBuildFailed-unreachable shims from tests/cli/extension/state/*_test.ts on branch extension-uat-phase-1. Coordinate the UAT branch rebase as part of landing this fix.

Why this matters for the rearchitecture story

After W1-W6, the public contract of doctor extensions was supposed to be: "this is the one place users go to see the truth about their extensions." Today it isn't, for the failure modes users hit most often. This is the last user-visible loose thread from the rearchitecture sprint.

02Bog Flow
OPENTRIAGEDIN PROGRESSSHIPPED+ 1 MOREASSIGNED+ 7 MOREREVIEW+ 5 MOREPR_MERGEDSHIPPED

Shipped

5/12/2026, 6:33:48 PM

Click a lifecycle step above to view its details.

03Sludge Pulse
stack72 assigned stack725/12/2026, 4:05:44 PM

Sign in to post a ripple.