01 Issue
Bug · Open · Swamp CLI

context.readModelData returns different results depending on invocation context (manual vs workflow)

Opened by stack72 · 4/7/2026 · GitHub #1113

Description

context.readModelData(modelName, specName) produces different results depending on whether the method is invoked manually (swamp model method run) or within a workflow. This makes manual runs unreliable for debugging workflow behavior.

  • Manual run: readModelData returns ALL historical data for the source model (no workflowRunId available → no scoping)
  • Workflow run: readModelData is scoped to data produced by the current workflow run (via workflowRunId tag filtering in raw_execution_driver.ts lines 138-140)

This means a method that works correctly in a workflow can produce wildly different (and incorrect) results when run manually for debugging.

Concrete Example

anime-source has 27 configured shows. search_configured produces 182 episodes per run.

# Manual run — returns 921 items (all historical data, including removed shows)
swamp model method run dedup filter --input sourceModel=anime-source
→ Read 921 episodes from "anime-source"
→ 304 "new" episodes (many are false positives from orphaned data)

# Workflow run — returns 182 items (current run only)
swamp workflow run discover-and-download
→ Read 182 episodes from "anime-source"
→ correct dedup results

The 921 items include data from shows that were removed from the config months ago (e.g., "Dark Gathering" removed from globalArgs, but its data persists with lifetime: infinite). This orphaned data is invisible in workflow runs but pollutes manual runs.

Why This Matters

  1. You can't debug workflows with manual runs. The primary way to test a model method is swamp model method run. If it returns different data than the workflow, you're debugging a different system.

  2. False confidence in fixes. A dedup fix that looks correct in manual testing may behave completely differently in the workflow (or vice versa). We spent significant time chasing dedup bugs that only manifested in one invocation context.

  3. No way to opt into scoping manually. There's no --scope-to-latest-run flag or equivalent. Manual runs always get the unscoped path.

Current Implementation

In raw_execution_driver.ts:

const workflowRunId = this.context.tagOverrides?.["workflowRunId"];
const readModelData = (modelName: string, specName?: string) =>
  dataAccessService.readModelData(modelName, specName, workflowRunId);

When workflowRunId is undefined (manual run), readModelData applies no run filter and returns every stored record for the model. When it is set (workflow run), results are filtered by the workflowRunId tag.
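
The effect of that optional third argument can be reproduced with a small in-memory model. This is a sketch, not swamp's implementation: the DataRecord shape, its tags field, and the standalone readModelData function below are hypothetical stand-ins. It assumes only that each stored record carries a workflowRunId tag and that an undefined run ID disables filtering.

```typescript
// Hypothetical record shape; swamp's real data model may differ.
interface DataRecord {
  modelName: string;
  specName: string;
  name: string;
  tags: Record<string, string>;
}

// Stand-in for dataAccessService.readModelData (assumed behavior):
// filters by model and spec, and by the workflowRunId tag only when
// a run ID is supplied.
function readModelData(
  store: DataRecord[],
  modelName: string,
  specName?: string,
  workflowRunId?: string,
): DataRecord[] {
  return store.filter((r) =>
    r.modelName === modelName &&
    (specName === undefined || r.specName === specName) &&
    (workflowRunId === undefined || r.tags["workflowRunId"] === workflowRunId)
  );
}

const store: DataRecord[] = [
  // Orphaned record from an earlier run (show since removed from the config).
  { modelName: "anime-source", specName: "episode", name: "dark-gathering-01", tags: { workflowRunId: "run-1" } },
  { modelName: "anime-source", specName: "episode", name: "show-a-01", tags: { workflowRunId: "run-2" } },
  { modelName: "anime-source", specName: "episode", name: "show-b-01", tags: { workflowRunId: "run-2" } },
];

// Manual run: no run ID, so every historical record comes back.
const manual = readModelData(store, "anime-source", "episode");
// Workflow run: only records tagged with the current run.
const scoped = readModelData(store, "anime-source", "episode", "run-2");

console.log(manual.length, scoped.length);
```

With the same store and the same method, the manual call sees the orphaned record and the workflow call does not, which is the discrepancy described above.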

Proposed Solution

readModelData should behave consistently regardless of invocation context. Options:

  1. Default to latest execution's output — when no workflowRunId is available, scope to the source model's most recent method output instead of returning all historical data
  2. Add a CLI flag: swamp model method run ... --scope-to-latest, to simulate workflow scoping during manual runs
  3. Always scope by default — return only the latest version of each unique data name, with an explicit opt-in for historical data

Any of these would make manual runs trustworthy for debugging.
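
Options 1 and 3 can be compared with a rough sketch. Everything here is illustrative: the executionId, version, and createdAt fields and both helper functions are assumptions, not confirmed parts of swamp's data model or API.

```typescript
// Hypothetical record shape: executionId, version and createdAt are
// assumptions for illustration, not fields confirmed by swamp.
interface DataRecord {
  name: string;
  executionId: string;
  version: number;
  createdAt: number; // epoch milliseconds
}

// Option 1: keep only records written by the most recent execution.
function scopeToLatestExecution(records: DataRecord[]): DataRecord[] {
  if (records.length === 0) return [];
  const newest = records.reduce((a, b) => (b.createdAt > a.createdAt ? b : a));
  return records.filter((r) => r.executionId === newest.executionId);
}

// Option 3: keep only the highest version of each unique data name.
function latestVersionPerName(records: DataRecord[]): DataRecord[] {
  const latest = new Map<string, DataRecord>();
  for (const r of records) {
    const seen = latest.get(r.name);
    if (seen === undefined || r.version > seen.version) latest.set(r.name, r);
  }
  return Array.from(latest.values());
}

const allRecords: DataRecord[] = [
  { name: "dark-gathering-01", executionId: "exec-1", version: 1, createdAt: 1000 },
  { name: "show-a-01", executionId: "exec-1", version: 1, createdAt: 1000 },
  { name: "show-a-01", executionId: "exec-2", version: 2, createdAt: 2000 },
  { name: "show-b-01", executionId: "exec-2", version: 1, createdAt: 2000 },
];

const option1 = scopeToLatestExecution(allRecords).map((r) => r.name);
const option3 = latestVersionPerName(allRecords).map((r) => r.name);

console.log(option1); // names written by exec-2 only
console.log(option3); // one entry per unique name
```

Under these assumptions the two options are not equivalent. Option 1 drops the orphaned "dark-gathering-01" record because the latest execution did not write it, while option 3 still returns it as the latest version of its name. If excluding orphaned data is the goal, that difference may matter when choosing between them.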

Environment

  • swamp version: 20260206.200442.0
  • Extension: @keeb/mms/dedup calling readModelData("anime-source", "episode")

Related Issues

  • #1020: closed as not-a-bug (findBySpec is run-scoped, but the same inconsistency exists)
  • #966: forEach data.findBySpec resolves empty when data written by prior job
  • #914: context.readModelData feature request

Automoved by swampadmin from GitHub issue #1113

02 Bog Flow
OPEN · TRIAGED · IN PROGRESS · SHIPPED

Open

4/7/2026, 11:27:29 PM
