01 Issue
Bug · Closed · Swamp CLI
Assignees: stack72

#429 Core repositories missing markDirty hook — scoped push only works on crash recovery

Opened by stack72 · 5/23/2026

Description

Three cache-writing repositories do not call the markDirty(relPath) hook before writing to the datastore cache. This means the sync service's per-path dirty tracking (introduced in the Phase 2 S3 extension overhaul, #379) has an incomplete view of what changed during a command. The extension must fall back to a full cache walk on every push instead of walking only dirty directories.

Only UnifiedDataRepository (data artifacts) has the hook wired. The following repositories write to the cache silently:

  • YamlOutputRepository — writes to outputs/
  • YamlWorkflowRunRepository — writes to workflow-runs/
  • YamlEvaluatedDefinitionRepository — writes to definitions-evaluated/

Impact

The Phase 2 S3 extension's scoped push optimization (47x faster walk phase — 142ms → 3ms on 1000-file repos) only activates during crash recovery (when dirty paths are loaded from a persisted sidecar). During normal operation, every push does a full walk because the in-memory dirty set is incomplete — the extension cannot trust it.
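The fallback described above can be sketched as follows. This is an illustration only, not the extension's actual code: `directoriesToWalk` and the `dirtySetIsComplete` flag are hypothetical names, and the directory list is just the three cache directories named in this issue.

```typescript
// Hypothetical helper: decide which top-level cache directories a push walks.
// When the dirty set cannot be trusted, every directory is walked.
function directoriesToWalk(
  allDirs: readonly string[],
  dirtyPaths: ReadonlySet<string>,
  dirtySetIsComplete: boolean,
): string[] {
  if (!dirtySetIsComplete) {
    // Full walk: some repositories may have written without marking paths dirty.
    return [...allDirs];
  }
  // Scoped walk: keep only directories that contain at least one dirty path.
  const dirtyDirs = new Set<string>();
  for (const relPath of dirtyPaths) {
    const slash = relPath.indexOf("/");
    dirtyDirs.add(slash === -1 ? relPath : relPath.slice(0, slash));
  }
  return allDirs.filter((dir) => dirtyDirs.has(dir));
}

// Directory names taken from this issue; the dirty path itself is made up.
const cacheDirs = ["outputs", "workflow-runs", "definitions-evaluated"];
const dirty = new Set<string>(["outputs/build.yaml"]);

const scopedWalk = directoriesToWalk(cacheDirs, dirty, true);
const fullWalk = directoriesToWalk(cacheDirs, dirty, false);
```

With a trusted dirty set only `outputs` is walked; with an untrusted one all three directories are walked, which is the behaviour this issue reports for normal operation.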

Scoped pull (3.9x faster via partitioned index) is unaffected — it uses context.models from the framework, not the dirty set.

Steps to Reproduce

  1. Configure an S3 datastore with the Phase 2 extension (dirty sidecar + scoped push)
  2. Run swamp model method run on a model that produces outputs
  3. Observe that pushChanged does a full walk instead of a scoped walk
  4. Enable SWAMP_S3_SYNC_TRACE=1 to see the walk phase timing — it walks all 15 subdirectories, not just the dirty paths

Proposed Fix

Wire the MarkDirtyHook through the three missing repositories, following the same pattern as UnifiedDataRepository:

  1. Each repository accepts a MarkDirtyHook parameter in its constructor
  2. Each repository calls the hook before every cache write (save, delete)
  3. repo_context.ts passes the same hook instance to all four repositories

The hook type (MarkDirtyHook) and the wiring pattern in repo_context.ts already exist — this is extending the existing pattern to cover all cache writers.
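A minimal sketch of the three steps above, under stated assumptions: the real `MarkDirtyHook` signature is assumed to be a function taking a relative path, `SketchOutputRepository` is a hypothetical stand-in for a repository such as YamlOutputRepository, the `outputs/<name>.yaml` layout is invented for illustration, and an in-memory `Map` replaces the datastore cache.

```typescript
// Assumed shape of the hook; the real type in the codebase may differ.
type MarkDirtyHook = (relPath: string) => void;

// Hypothetical stand-in for a cache-writing repository.
class SketchOutputRepository {
  private readonly cache = new Map<string, string>();
  private readonly markDirty: MarkDirtyHook | undefined;

  // Step 1: the constructor accepts the hook.
  constructor(markDirty?: MarkDirtyHook) {
    this.markDirty = markDirty;
  }

  // Step 2: the hook is called before every cache write.
  save(name: string, body: string): void {
    const relPath = `outputs/${name}.yaml`;
    this.notifyDirty(relPath);
    this.cache.set(relPath, body);
  }

  delete(name: string): void {
    const relPath = `outputs/${name}.yaml`;
    this.notifyDirty(relPath);
    this.cache.delete(relPath);
  }

  has(name: string): boolean {
    return this.cache.has(`outputs/${name}.yaml`);
  }

  private notifyDirty(relPath: string): void {
    // A missing hook is tolerated so the repository still works without dirty tracking.
    this.markDirty?.(relPath);
  }
}

// Step 3: one shared hook instance is handed to every repository.
const dirtyPaths = new Set<string>();
const sharedHook: MarkDirtyHook = (relPath) => {
  dirtyPaths.add(relPath);
};

const outputs = new SketchOutputRepository(sharedHook);
outputs.save("build", "status: ok\n");
outputs.delete("stale");
```

Calling the hook before the write rather than after means a crash between the two steps leaves a path marked dirty that may not have changed, which costs an extra walk, instead of a changed path that was never marked.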

Environment

  • swamp version: current main
  • Affects: @swamp/s3-datastore Phase 2, @swamp/gcs-datastore, any future extension using per-path dirty tracking
02 Bog Flow

Closed

5/23/2026, 11:47:45 PM

03 Sludge Pulse
stack72 assigned stack72 · 5/23/2026, 11:43:04 PM

stack72 commented 5/23/2026, 11:47:45 PM

Closing as already fixed. Commit da056083 ("feat(datastore): thread relPath through markDirty for per-path dirty tracking", swamp-club#232) implemented the exact fix described in this issue. All three repositories (YamlOutputRepository, YamlWorkflowRunRepository, YamlEvaluatedDefinitionRepository) already have MarkDirtyHook wired — constructors accept it, notifyDirty() helpers exist, and every write/delete call site invokes the hook. The factory in repository_factory.ts passes the same hook instance to all four repositories.
