01 Issue
Bug · Closed · Swamp CLI
Assignees: None

#332 Extension failure recording has dual write paths (legacy buildIndex vs W3 reconcile)

Opened by stack72 · 5/12/2026

Summary

Discovered during post-W6 UAT investigation: extension bundle/validation failures are recorded via two parallel write paths, and which one fires depends on whether anyKindNeedsInvalidation() triggered W3 reconcile in the current process. As a result, the same on-disk failure can surface in different places in doctor extensions --json --verbose output depending on prior process state.

Verdict from investigation: C (Hybrid with implementation gap) — not user-visible broken behavior today (the W3 rebundle-loop bug class is closed in practice), but hidden architectural debt with non-deterministic observability and a known invariant bypass.

Full investigation report: swamp-uat/ROWSTATE_INVESTIGATION.md (1,918 words, empirically verified).

The two paths

Path 1 — W3 reconcile (intended)

  • Triggered when anyKindNeedsInvalidation() fires (sources changed, guards tripped, etc.)
  • ReconcileFromDiskService → extension.recordBundleBuildFailed(...) → repository.saveAll([...])
  • Surfaces as aggregateState.sourceDetails[].stateTag === "BundleBuildFailed"
  • Goes through the Extension aggregate root (I-Repo-1 invariant fires)

Path 2 — legacy buildIndex (fallback)

  • Triggered when reconcile does NOT fire in this process
  • Failures captured into result.failed during buildIndex
  • Surfaces as registries.<kind>.failures[] with shape { file, error }
  • Bypasses the Extension aggregate entirely

The same failure therefore appears with a different output shape and different field names depending on the path taken, and consumers have no normalized surface to read.
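Until the two surfaces are consolidated, a consumer that wants every failure has to read both. The following is a minimal sketch of such a normalizer, assuming only the field names quoted in this issue (aggregateState.sourceDetails[].stateTag and registries.<kind>.failures[] with { file, error }). The DoctorOutput type, the sourcePath and message fields on a source detail, the single top-level aggregateState, and the collectFailures helper are all hypothetical and are not part of swamp.

```typescript
// Hypothetical shapes, inferred only from the field names quoted in this issue.
interface SourceDetail {
  stateTag: string; // e.g. "BundleBuildFailed" (named in the issue)
  sourcePath?: string; // ASSUMPTION: some field identifies the source file
  message?: string; // ASSUMPTION: some field carries the error text
}

interface RegistryFailure {
  file: string;
  error: string;
}

interface DoctorOutput {
  // ASSUMPTION: a single top-level aggregateState; the real nesting may differ.
  aggregateState?: { sourceDetails?: SourceDetail[] };
  registries?: Record<string, { failures?: RegistryFailure[] }>;
}

interface NormalizedFailure {
  surface: "aggregate" | "registry"; // which write path produced the record
  stateTag?: string; // set for Path 1 (W3 reconcile)
  registryKind?: string; // set for Path 2 (legacy buildIndex)
  file: string | undefined;
  error: string | undefined;
}

// Failure states named in this issue.
const FAILURE_TAGS = new Set([
  "BundleBuildFailed",
  "ValidationFailed",
  "EntryPointUnreadable",
]);

// Reads both surfaces and returns one flat list, Path 1 entries first.
function collectFailures(output: DoctorOutput): NormalizedFailure[] {
  const failures: NormalizedFailure[] = [];

  for (const detail of output.aggregateState?.sourceDetails ?? []) {
    if (FAILURE_TAGS.has(detail.stateTag)) {
      failures.push({
        surface: "aggregate",
        stateTag: detail.stateTag,
        file: detail.sourcePath,
        error: detail.message,
      });
    }
  }

  for (const [kind, registry] of Object.entries(output.registries ?? {})) {
    for (const failure of registry.failures ?? []) {
      failures.push({
        surface: "registry",
        registryKind: kind,
        file: failure.file,
        error: failure.error,
      });
    }
  }

  return failures;
}
```

Keeping the surface field on each normalized record lets a test assert on the failure itself while still recording which path produced it, which may be useful while both paths exist.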

Finding A: recordValidationFailed has zero production callers. Production code writes ValidationFailed rows via bundle_freshness.ts:398 (markCatalogValidationFailed), which calls catalog.upsert({state: 'ValidationFailed'}) directly, bypassing the Extension aggregate. The extension.recordValidationFailed method is exercised only by tests.

Finding B: Tombstoned rows are unreachable in sourceDetails[]. applyDiffForExtension (extension_repository.ts:523-526) DELETEs Tombstoned catalog rows in the same transaction that records the transition. The state is observable only via ReconcileResult.transitions[], which doctor extensions --json doesn't expose. Any test or consumer expecting Tombstoned in sourceDetails[] is checking an unreachable state.
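Finding B can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the code in extension_repository.ts: the TombstoneDemoCatalog class, the Transition shape, and the method names are hypothetical, and only the behaviour described above (record the transition, delete the row in the same step) is modelled.

```typescript
// Toy model only: every name and shape here is hypothetical.
type RowState = "Tombstoned" | "BundleBuildFailed" | "ValidationFailed";

interface Transition {
  source: string;
  to: RowState;
}

class TombstoneDemoCatalog {
  private rows = new Map<string, RowState>();

  upsert(source: string, state: RowState): void {
    this.rows.set(source, state);
  }

  // Models the described behaviour: a Tombstoned transition is reported in
  // the result, but its row is deleted in the same step, so the state is
  // never stored.
  applyDiff(transitions: Transition[]): { transitions: Transition[] } {
    for (const t of transitions) {
      if (t.to === "Tombstoned") {
        this.rows.delete(t.source);
      } else {
        this.rows.set(t.source, t.to);
      }
    }
    return { transitions };
  }

  // Stand-in for what a doctor-style report could read back from storage.
  sourceDetails(): { source: string; stateTag: RowState }[] {
    return Array.from(this.rows, ([source, stateTag]) => ({ source, stateTag }));
  }
}

const demoCatalog = new TombstoneDemoCatalog();
demoCatalog.upsert("ext/a.ts", "ValidationFailed");
const demoResult = demoCatalog.applyDiff([{ source: "ext/a.ts", to: "Tombstoned" }]);

// The transition list still carries the Tombstoned entry...
const seenInTransitions = demoResult.transitions.some((t) => t.to === "Tombstoned");
// ...but no stored row can ever report it.
const seenInSourceDetails = demoCatalog
  .sourceDetails()
  .some((d) => d.stateTag === "Tombstoned");
```

In this model seenInTransitions is true and seenInSourceDetails is false, which is the asymmetry a test would have to account for.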

Why this matters

  1. Non-deterministic observability. Tools and tests querying failure state via doctor extensions --json get different shapes depending on process history. Test-authoring agents have to pin reconcile state explicitly to get predictable assertions.
  2. Aggregate invariant bypassed. The markCatalogValidationFailed direct upsert means I-Repo-1 (cross-aggregate uniqueness) is not enforced for that write path.
  3. Architectural docs are wrong. W1b documented 7 RowStates as a uniform surface; reality is 5 reachable in sourceDetails[], 1 transient at construction (Bundled), 1 transient at the persistence layer (Tombstoned).

Proposed resolution

Three discrete pieces of work, in priority order:

  1. Consolidate failure write paths — pick one canonical mechanism (recommend W3 reconcile path since it goes through the aggregate). Migrate buildIndex to use the same path, or document/normalize the registries.<kind>.failures[] surface as a stable contract.
  2. Route validation-failed writes through the aggregate — replace markCatalogValidationFailed direct upsert with repository.saveAll([extension.recordValidationFailed(...)]) so I-Repo-1 fires.
  3. Tombstoned visibility decision — either expose ReconcileResult.transitions[] in doctor extensions --json (richer doctor surface), or document Tombstoned as transient-at-persistence (lighter option, matches current reality).
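Item 2 can be sketched with a toy model that contrasts a direct catalog upsert with a write routed through repository.saveAll([...]). Everything here is a hypothetical stand-in: the InMemoryCatalog, ToyExtension, and ToyRepository classes are not swamp's real types, and I-Repo-1 is modelled, purely as an assumption, as "a source belongs to at most one extension"; the real invariant may check something different.

```typescript
// Toy model only: all types are hypothetical stand-ins for swamp's classes.
interface CatalogRow {
  extensionId: string;
  source: string;
  state: string;
}

class InMemoryCatalog {
  readonly rows: CatalogRow[] = [];

  // Direct write: no invariant is checked here.
  upsert(row: CatalogRow): void {
    const i = this.rows.findIndex(
      (r) => r.extensionId === row.extensionId && r.source === row.source,
    );
    if (i >= 0) {
      this.rows[i] = row;
    } else {
      this.rows.push(row);
    }
  }
}

class ToyExtension {
  constructor(
    readonly id: string,
    readonly sources: ReadonlyMap<string, string>,
  ) {}

  // Returns a new aggregate with the given source marked ValidationFailed.
  recordValidationFailed(source: string): ToyExtension {
    const next = new Map(this.sources);
    next.set(source, "ValidationFailed");
    return new ToyExtension(this.id, next);
  }
}

class ToyRepository {
  constructor(private readonly catalog: InMemoryCatalog) {}

  // ASSUMPTION: I-Repo-1 is modelled as "no source is owned by two
  // extensions", checked against rows already in the catalog.
  saveAll(extensions: ToyExtension[]): void {
    for (const ext of extensions) {
      ext.sources.forEach((_state, source) => {
        const clash = this.catalog.rows.find(
          (r) => r.source === source && r.extensionId !== ext.id,
        );
        if (clash) {
          throw new Error(
            `I-Repo-1 violated: ${source} already owned by ${clash.extensionId}`,
          );
        }
      });
    }
    // Only write once every aggregate in the batch has passed the check.
    for (const ext of extensions) {
      ext.sources.forEach((state, source) => {
        this.catalog.upsert({ extensionId: ext.id, source, state });
      });
    }
  }
}
```

In this model, calling catalog.upsert(...) directly for a source that another extension already owns succeeds silently, while the same write expressed as repository.saveAll([extension.recordValidationFailed(source)]) throws. That difference is the gap item 2 proposes to close.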

Impact on UAT matrix

The swamp-uat extension test suite (EXTENSION_UAT_SUITE.md §9 RowState matrix) is being authored against current empirical reality, not the architectural ideal. When a fix for this issue ships, several test entries will become eligible for simplification:

  • ValidationFailed / BundleBuildFailed / EntryPointUnreadable dual-path test pairs collapse to single canonical-surface tests
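  • Tombstoned absence-from-sourceDetails tests can become positive presence tests (if the first option in item 3 of the proposed resolution, exposing ReconcileResult.transitions[] in doctor extensions --json, is taken)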

Not blocking on this — matrix work proceeds against current implementation.

Environment

  • Discovered: 2026-05-02 during W6 (doctor extensions aggregate state) UAT integration testing
  • Affected: all swamp versions post-W3 (which introduced ReconcileFromDiskService alongside the legacy buildIndex path)
  • Surfaces: swamp doctor extensions --json --verbose, integration tests asserting on RowState
  • W1-W6 rearchitecture: see design/extension-rearchitecture.md
  • W3 reconcile service: src/libswamp/extensions/reconcile_from_disk_service.ts
  • Catalog DELETE behavior: src/infrastructure/persistence/extension_repository.ts:523-526
  • Aggregate-bypass write: src/domain/extensions/bundle_freshness.ts:398
02 Bog Flow
OPEN · TRIAGED · IN PROGRESS · CLOSED

Closed

5/12/2026, 3:55:27 PM


03 Sludge Pulse

stack72 commented 5/12/2026, 3:55:25 PM

▎ Closing in favor of #334, which captures the actionable subset of the architectural debt described here (invalidate-then-reconcile sequencing). The broader unification work (collapse registries.failures[] into sourceDetails[], route validation-failed writes through the aggregate, surface Tombstoned transitions) is real but premature to track as one ticket; better filed when a workstream is actually prioritized. See #334's "Related context" section for the deferred items.
