Feature · Shipped · Swamp CLI
Assignees: stack72

#290 Implement W5: Per-fingerprint import URLs + subprocess test harness (extension catalog rearchitecture)

Opened by stack72 · 5/7/2026 · Shipped 5/11/2026

Problem

The audit identified the in-process Deno import cache as a structural cause of brittleness in the extension layer. When a long-running swamp process (workflow runner, eventually a daemon mode, the orchestrator) reloads an extension via await import(toFileUrl(path).href), Deno caches the module by URL. If the same URL is imported twice, the second import returns the cached module — even if the underlying bundle file has changed. The first import's module graph is sticky; subsequent reloads see stale code in memory while the bundle on disk is fresh.

The model loader has a workaround: an allExtensionMethodsAttached flag that re-attaches methods after each reload. The four sibling loaders don't have it. The workaround is silently load-bearing and undocumented.

Predecessors W1a-W4 (and the bug fixes around them — #265 / #273 / #274 / #284) close the catalog-side and bundle-cache-side correctness issues. W5 closes the runtime-import-side issue: bundles on disk are fresh, but V8's module graph keeps serving stale code.

Full architectural context: design/extension-rearchitecture.md ("W5 — Per-fingerprint import URLs + subprocess test harness" section) — referenced from #211.

Scope

Phase 1 — Per-fingerprint import URLs

Modify importBundleByPath (which, post-W4, lives in one place in the unified ExtensionLoader) to append a fingerprint query parameter to the bundle URL when importing:

import(toFileUrl(bundlePath).href + "?fp=" + sourceFingerprint)

The fingerprint is the catalog row's source_fingerprint (consistent with how freshness is determined elsewhere in the system). Different fingerprints produce different URLs, which Deno treats as distinct modules. If the bundle changes, the fingerprint changes, and V8 imports the new module instead of returning the cached old one.
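A minimal sketch of the URL construction described above. The helper name fingerprintedImportUrl is hypothetical, and the sketch builds the file: URL with the standard URL API instead of toFileUrl so that it needs no imports; it assumes an absolute POSIX bundle path. It is an illustration of the idea, not the actual importBundleByPath code.

```typescript
// Hypothetical helper, not the real importBundleByPath implementation.
// Assumes bundlePath is an absolute POSIX path such as /tmp/bundles/model.js.
function fingerprintedImportUrl(
  bundlePath: string,
  sourceFingerprint: string,
): string {
  const url = new URL("file:///");
  // The pathname setter percent-encodes characters such as spaces.
  url.pathname = bundlePath;
  // searchParams.set encodes the value, so an unusual fingerprint cannot
  // accidentally terminate or extend the query string.
  url.searchParams.set("fp", sourceFingerprint);
  return url.href;
}

// Same bundle path, two different fingerprints: two different import URLs.
const oldUrl = fingerprintedImportUrl("/tmp/bundles/model.js", "abc123");
const newUrl = fingerprintedImportUrl("/tmp/bundles/model.js", "def456");
console.log(oldUrl); // file:///tmp/bundles/model.js?fp=abc123
console.log(oldUrl === newUrl); // false
```

Plain string concatenation, as in the one-liner above, produces the same URL for hex-style fingerprints; going through searchParams only matters if a fingerprint could ever contain reserved characters.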

V8 retains both the old and new modules in heap memory until they are garbage collected. For one-shot CLI invocations this doesn't matter (the process exits and the memory is freed). For long-running processes, one additional module accumulates per fingerprint change. RAM growth is expected to be small in practice (each module is small) but is worth measuring.

Phase 2 — Delete the allExtensionMethodsAttached workaround

The model loader's workaround in user_model_loader.ts becomes unnecessary once per-fingerprint URLs work correctly. Post-W4 the model loader is gone, so the workaround lives in the unified ExtensionLoader if it was preserved during W4. Phase 2 deletes it and confirms the regression tests still pass.

Phase 3 — Subprocess test harness

In-process Deno can't exercise the cache-aliasing bug because the test process IS the process being tested. Module imports inside the test see the same V8 heap as production code. Verifying the fix therefore needs a subprocess test harness: spawn a Deno subprocess that imports a bundle, modify the bundle, ask the subprocess to import again, and verify it sees the new content.

New src/infrastructure/testing/subprocess_test_harness.ts (or similar) provides:

  • spawnExtensionProcess(args): starts a subprocess that imports bundles via the unified loader
  • triggerReload(processHandle, source): signals the subprocess to re-import after the source is modified
  • assertModuleVersion(processHandle, expectation): verifies which version the subprocess sees

The harness becomes shared infrastructure for any future test that needs cross-process module-reload semantics.
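One way triggerReload and assertModuleVersion could talk to the spawned process is line-delimited JSON over stdin and stdout. The sketch below covers only that message layer. Every type and function name in it (HarnessCommand, HarnessReply, encodeCommand, decodeReplies, assertVersion) is an assumption made for illustration; the issue does not specify a wire format.

```typescript
// Hypothetical wire format between the harness and the subprocess:
// one JSON object per line, commands on stdin and replies on stdout.
type HarnessCommand =
  | { kind: "import"; bundlePath: string; fingerprint: string }
  | { kind: "shutdown" };

type HarnessReply =
  | { kind: "imported"; fingerprint: string; version: string }
  | { kind: "error"; message: string };

function encodeCommand(command: HarnessCommand): string {
  // JSON.stringify without an indent argument emits no raw newlines,
  // so each command occupies exactly one line.
  return JSON.stringify(command) + "\n";
}

// Expects whole lines; buffering partial chunks is left to the caller.
function decodeReplies(text: string): HarnessReply[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as HarnessReply);
}

// The check assertModuleVersion could run once a reply has been decoded.
function assertVersion(reply: HarnessReply, expected: string): void {
  if (reply.kind === "error") {
    throw new Error(`subprocess failed: ${reply.message}`);
  }
  if (reply.version !== expected) {
    throw new Error(`expected ${expected}, subprocess saw ${reply.version}`);
  }
}

const line = encodeCommand({
  kind: "import",
  bundlePath: "/tmp/bundles/model.js",
  fingerprint: "def456",
});
const sample = '{"kind":"imported","fingerprint":"def456","version":"v2"}\n';
for (const reply of decodeReplies(sample)) {
  assertVersion(reply, "v2"); // passes silently when the versions match
}
```

In a Deno harness the subprocess itself would presumably be started with Deno.Command using piped stdin and stdout, with these helpers sitting on either side of the pipe.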

Phase 4 — RAM growth benchmark

A new benchmark in src/libswamp/extensions/ (or alongside W3's reconcile_from_disk_bench.ts) measures heap delta after N module reloads in a subprocess. Pin a threshold (e.g., ≤ 50 MB after 100 reloads) and assert it. If the threshold is blown, surface for redesign — possible mitigations include WeakRef-tracked module versions, periodic subprocess restart, or an explicit eviction API.
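The measurement itself can be sketched as follows. The helper names (measureReloadGrowthMb, heapDeltaMb, assertWithinThreshold) are hypothetical, and the heap sampler is injected so the sketch does not depend on a particular runtime API; under Deno, passing () => Deno.memoryUsage() would satisfy the sampler type. The 50 MB and 100-reload figures are the example values from the text, not a measured result.

```typescript
// Hypothetical benchmark helpers. The sampler is injected rather than
// hard-coded so the sketch stays runtime-agnostic.
type HeapSampler = () => { heapUsed: number };

const BYTES_PER_MB = 1024 * 1024;

function heapDeltaMb(beforeBytes: number, afterBytes: number): number {
  return (afterBytes - beforeBytes) / BYTES_PER_MB;
}

// Samples the heap, runs `reloads` reloads sequentially, samples again,
// and returns the growth in megabytes.
async function measureReloadGrowthMb(
  reloadOnce: (iteration: number) => Promise<void>,
  reloads: number,
  sampleHeap: HeapSampler,
): Promise<number> {
  const before = sampleHeap().heapUsed;
  for (let i = 0; i < reloads; i++) {
    await reloadOnce(i);
  }
  return heapDeltaMb(before, sampleHeap().heapUsed);
}

function assertWithinThreshold(deltaMb: number, thresholdMb: number): void {
  if (deltaMb > thresholdMb) {
    throw new Error(
      `heap grew ${deltaMb.toFixed(1)} MB, threshold is ${thresholdMb} MB`,
    );
  }
}

// Example with the numbers from the text: 35 MB of growth against a 50 MB cap.
assertWithinThreshold(heapDeltaMb(10 * BYTES_PER_MB, 45 * BYTES_PER_MB), 50);
```

Heap readings depend on when the garbage collector last ran, so a single before/after pair is noisy; a real benchmark would likely want repeated runs or a generous margin before treating a failure as a redesign signal.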

Pre-work decisions to pin in the PR description

  1. Fingerprint source. Recommend: catalog row's source_fingerprint column (consistent with the rest of the freshness model). Alternative: hash of the bundle file content. Source fingerprint is path-stable across deno bundle runs that produce identical output; bundle fingerprint changes per build invocation. Source fingerprint is the right choice — it matches what findStaleFiles already computes.

  2. URL format. Query parameter (?fp=<hash>) vs fragment (#fp=<hash>). Recommend: query parameter. Deno treats distinct query strings as distinct modules; fragments may not be respected by Deno's module loader.

  3. RAM growth strategy. Recommend: accept indefinite retention for v1; document as known limitation. Don't add eviction logic speculatively. If post-merge measurement shows real RAM pressure for daemon use cases, file as a follow-up workstream.

  4. Subprocess harness location. Recommend: src/infrastructure/testing/subprocess_test_harness.ts. Shared across W5 + future module-reload tests.

  5. Cross-platform shape. Subprocess harness must work on Linux + macOS. Windows isn't a merge gate per W-series precedent. Use Deno's standard process spawning APIs; avoid platform-specific shell tricks.

Out of scope (deferred to later workstreams)

  • swamp doctor extensions aggregate-state rendering → W6
  • Bundle cache file eviction (orphaned bundles) → swamp-club#267 (post-W4)
  • sourceToRow mtime carrying through Source entity → swamp-club#271 (no urgency)
  • Daemon-mode subprocess isolation as an alternative to per-fingerprint URLs (out of scope; #267 territory if RAM growth turns out to be a problem)

Success criteria

  • Per-fingerprint import URLs in use across the unified loader. Bundle imports happen via import(toFileUrl(bundlePath).href + "?fp=" + sourceFingerprint).
  • allExtensionMethodsAttached workaround deleted. No remaining trace in the unified loader. Code grep returns zero occurrences in src/.
  • Subprocess test harness exists and verifies module reload behavior. Test: spawn subprocess → import bundle → modify bundle → trigger reload → assert subprocess sees new content. Parameterized over all 5 extension kinds.
  • RAM growth benchmark passes. ≤ 50 MB heap delta after 100 module reloads in a subprocess (or the threshold the agent pre-commits to).
  • Long-running workflow scenario verified. Run a multi-step workflow that reloads extensions across steps; confirm the workflow runner sees fresh code on each reload, not the first-import-cached code.
  • All existing tests pass on Linux + macOS (Windows not a merge gate per W-series precedent).
  • Auto-ship-on-merge readiness verified via diversity-matrix soak.

Suggested test additions

  • Subprocess module-reload test (parameterized over 5 kinds): import → modify → re-import → assert new content. The load-bearing structural verification.
  • Fingerprint-collision test: two imports of the same bundle path with the same fingerprint produce the same URL and share V8's cached module (the fast path; this is correct behavior).
  • Fingerprint-divergence test: same bundle path, different fingerprints (sequential reloads) → V8 sees distinct modules → loader uses the latest.
  • allExtensionMethodsAttached regression: scenario that previously needed the workaround. Without the workaround AND with per-fingerprint URLs, the scenario passes. (This is the test that proves the workaround is genuinely no longer needed.)
  • RAM growth benchmark: 100 reloads in a subprocess; assert heap delta within threshold.
  • Workflow runner integration: a workflow with multiple steps each reloading the same extension after edits; assert each step sees its expected version.

Auto-ship-on-merge constraint

Same gates as W2/W3/W4:

  • CI green (all new + existing tests + type-check + lint + fmt)
  • Author smoke on real repo: workflow runs across multiple reloads see fresh code each time
  • Reviewer smoke on different real repo
  • Diversity-matrix soak (multiple machines × OS × extension shape × workflow patterns)
    • Specifically watch for: long-running process scenarios; workflow runner reloads; any module-cache-aliasing behavior
  • Subprocess test harness verified on Linux + macOS
  • RAM growth benchmark passes threshold
  • No allExtensionMethodsAttached references remaining in src/
  • Forward-only revert posture documented

Push-back encouraged

If the design doesn't fit the ground, surface before implementation. Specific watch list:

  • Deno's module loader behavior with query parameters. Verify empirically that import("...js?fp=abc") and import("...js?fp=def") produce distinct V8 modules. If Deno strips query parameters or normalizes them, the fingerprint approach breaks; surface and pivot to an alternative (e.g., temporary symlinks, content-addressed bundle paths).
  • allExtensionMethodsAttached may be doing more than the documented workaround suggests. If deletion breaks tests in non-obvious ways, the workaround is load-bearing for something else; surface and investigate before forcing the deletion.
  • Subprocess harness complexity. If spawning + signal handling + cross-platform behavior turns out to be substantially more code than expected, surface — it might warrant its own scoped PR before W5's behavior change.
  • RAM growth blown. If the benchmark exceeds threshold by a lot, surface for redesign discussion before merging. Mitigation options: WeakRef-tracked modules, periodic subprocess restart, fingerprint-bucketing (group nearby fingerprints into the same URL), or explicit eviction API.

The two most expensive misses to watch for

  1. Deno strips or normalizes query parameters in import URLs. If this is true, per-fingerprint URLs don't work and the entire approach is wrong. Catch by: empirical test EARLY in implementation. A 5-line script that imports two URLs with different query parameters and confirms they're distinct modules. If they're the same module, surface immediately.

  2. allExtensionMethodsAttached is load-bearing for something the audit didn't catch. If its deletion silently breaks production behavior in a way tests don't catch, soak might miss it. Mitigation: explicit regression test for the original failure mode the workaround was added for; if no such failure mode is documented, treat as unsafe and investigate before deletion.
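The early empirical check from item 1 could look roughly like this. It is a sketch, not the project's test: the function name probeQueryParamImports and the temp-file layout are made up for illustration, and it uses node: built-ins (which Deno also supports) instead of Deno-specific file APIs so the same script runs under either runtime. Under Deno it needs file and environment permissions, for example deno run -A.

```typescript
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { pathToFileURL } from "node:url";

interface ProbeResult {
  sameUrlCached: boolean; // same URL imported twice reuses one module
  differentQueryDistinct: boolean; // differing ?fp= values give two modules
}

// Hypothetical probe: writes a throwaway module and imports it three times.
async function probeQueryParamImports(): Promise<ProbeResult> {
  const dir = await mkdtemp(join(tmpdir(), "fp-probe-"));
  try {
    const bundlePath = join(dir, "probe.mjs");
    // Every evaluation of this module creates a fresh object, so object
    // identity shows whether the runtime re-evaluated the file.
    await writeFile(bundlePath, "export const instance = {};\n");

    const base = pathToFileURL(bundlePath).href;
    const first = await import(base + "?fp=abc");
    const repeat = await import(base + "?fp=abc");
    const second = await import(base + "?fp=def");

    return {
      sameUrlCached: first.instance === repeat.instance,
      differentQueryDistinct: first.instance !== second.instance,
    };
  } finally {
    // Remove the temporary directory whether or not the imports succeeded.
    await rm(dir, { recursive: true, force: true });
  }
}

probeQueryParamImports().then((result) => console.log(result));
```

If differentQueryDistinct comes back false, the runtime is ignoring or normalizing the query string, which is the condition item 1 says to surface immediately.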

References

  • Predecessors: #211 (W1 tracking), #223 (W1b), #231 (W2), #252 (W3), #269 (W4)
  • Closes: the allExtensionMethodsAttached workaround in the model loader (preserved through W4 if it survived loader unification)
  • Trackers it doesn't address: #267 (extension layer GC), #271 (sourceToRow mtime)
  • Design doc: design/extension-rearchitecture.md

Shipped: 5/11/2026, 3:56:20 PM

stack72 assigned stack72 · 5/11/2026, 2:22:34 PM
