Upgrade TUI graphics — better AI-generated ANSI or a Moebius hand-authored pipeline

Path	Scope it enforces	How
`context.readModelData(name, spec?)`	Current workflow run	`DataAccessService` filters on `ownerDefinition.workflowRunId`
`data.findBySpec(name, spec)`	Current workflow run	`ModelResolver` delegate reads `context.workflowRunId` and filters server-side
`context.queryData(pred)` / `data.query(pred)`	Current workflow run (in driver) / nothing (in CEL)	Driver string-concatenates `&& tags.workflowRunId == "${id}"` into the predicate; CEL delegate does nothing

This is the root of the 15+ scoping bugs filed against data access (#914, #966, #987, #1020, #1023, #1058, #1066, #1105, #1113, #497, etc.). Every fix has been a one-off because the underlying problem isn't any particular filter; it's that scoping is hidden inside the framework. A predicate that looks like it should return all matching data quietly returns a workflow-scoped subset, because some delegate or driver wrapper added a clause the author can't see. The same call returns different results depending on which path the caller goes through.
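A minimal sketch of that failure mode, using hypothetical stand-ins rather than the real services: a fake query service records the predicate it actually receives, and the same authored predicate is sent through a driver-style wrapper and a CEL-style pass-through. The names `makeRecordingService`, `driverQuery`, and `celQuery` are invented for illustration.

```typescript
// Toy reproduction of hidden scoping. Every name here is hypothetical and
// only mirrors the behavior described in the table above.

type QueryFn = (predicate: string) => string[];

// Fake query service that records the predicate it actually receives.
function makeRecordingService(): { query: QueryFn; received: string[] } {
  const received: string[] = [];
  const query: QueryFn = (predicate) => {
    received.push(predicate);
    return [];
  };
  return { query, received };
}

// Driver path: appends a workflow clause the author never wrote.
function driverQuery(service: QueryFn, workflowRunId: string): QueryFn {
  return (predicate) =>
    service(`(${predicate}) && tags.workflowRunId == "${workflowRunId}"`);
}

// CEL path: passes the predicate through untouched.
function celQuery(service: QueryFn): QueryFn {
  return (predicate) => service(predicate);
}

const service = makeRecordingService();
const authored = 'modelName == "dedup"';

driverQuery(service.query, "run-42")(authored);
celQuery(service.query)(authored);

// One authored predicate, two different effective predicates.
console.log(service.received);
```

Nothing at either call site reveals which of the two effective predicates the service will see.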

The fix is not to add more scoping options, a scope parameter, or a magic ctx variable bound into the predicate environment. The fix is to remove hidden scoping entirely and make the data shape rich enough that any filter the author wants is expressible — and visible — in the predicate string itself.

Design principles

Extensions see everything by default. No filtering happens unless the author wrote the clause.
The method signature is the contract. Reading data.query('modelName == "dedup"') should tell you the entire scope. There is no other filter being applied behind the scenes.
Provenance is data, not metadata. Workflow run, job, step, source — these are properties of a data record, not contextual scoping rules. They belong as first-class queryable fields.
Filtering is the author's job. The framework gives you queryable fields. You write the predicate. If you want "data from this workflow run", you write workflowRunId == "...". If you want "everything ever produced by this model", you write modelName == "...". If you want "data from the dedup step of this run", you write stepName == "dedup" && workflowRunId == "...".

Proposed solution

1. Promote provenance fields out of `tags` / `ownerDefinition` into first-class `DataRecord` fields

Today the data writer (data_writer.ts:509-527) merges these into data.tags:

specName (auto-injected)
modelName (auto-injected, originally for orphan recovery #370)
type (auto-injected: "resource" or "file")
Workflow tag overrides from execution_service.ts:440-446: source, workflow, workflowRunId, step (and presumably job)

And ownerDefinition separately carries ownerType, ownerRef, and (redundantly) workflowRunId.

tags is supposed to be the user-defined tag namespace. Today it's a junk drawer where the framework smuggles ownership/provenance metadata so that predicates can reach it (since tags is the only map-valued field available in QUERY_FIELDS). This is the wrong place for it.

Promote everything to first-class DataRecord fields:

interface DataRecord {
  // existing
  id: string;
  name: string;
  version: number;
  createdAt: string;
  attributes: Record<string, unknown>;
  modelName: string;       // already first-class
  modelType: string;
  specName: string;        // already first-class
  dataType: string;
  contentType: string;
  lifetime: string;
  ownerType: string;
  streaming: boolean;
  size: number;
  content: string;

  // promoted from ownerDefinition / framework-managed tags
  ownerRef: string;        // from ownerDefinition.ownerRef
  workflowRunId: string;   // "" when not produced inside a workflow
  workflowName: string;    // "" when not produced inside a workflow
  jobName: string;         // "" when not produced inside a workflow step
  stepName: string;        // "" when not produced inside a workflow step
  source: string;          // e.g. "step-output", "manual", etc.

  // user-defined only
  tags: Record<string, string>;
}

Add corresponding columns to the catalog schema with appropriate indexes (workflow_run_id, step_name, etc.). Add the new field names to QUERY_FIELDS in query_predicate.ts.

The data writer stops smuggling these into tags. tags becomes purely user-defined, restoring its intended meaning.
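The split between framework-managed and user-defined keys could be sketched roughly as follows. This is an illustration under assumptions, not the actual writer code: `LegacyRecord`, `Provenance`, `FRAMEWORK_TAG_KEYS`, and `promoteProvenance` are hypothetical names, the tag-to-field mapping (`workflow` to `workflowName`, `step` to `stepName`, `job` to `jobName`) is inferred from the field list above, and the sample values are made up.

```typescript
// Hypothetical shapes; the real writer's types may differ.
interface LegacyRecord {
  tags: Record<string, string>;
  ownerDefinition: { ownerType: string; ownerRef: string; workflowRunId?: string };
}

interface Provenance {
  ownerRef: string;
  workflowRunId: string;
  workflowName: string;
  jobName: string;
  stepName: string;
  source: string;
  tags: Record<string, string>; // user-defined only
}

// Tag keys the framework injects today, per the list above ("job" is presumed).
const FRAMEWORK_TAG_KEYS = new Set([
  "specName", "modelName", "type",
  "source", "workflow", "workflowRunId", "step", "job",
]);

function promoteProvenance(legacy: LegacyRecord): Provenance {
  const { tags, ownerDefinition } = legacy;

  // Keep only the keys the user actually wrote.
  const userTags: Record<string, string> = {};
  for (const [key, value] of Object.entries(tags)) {
    if (!FRAMEWORK_TAG_KEYS.has(key)) userTags[key] = value;
  }

  return {
    ownerRef: ownerDefinition.ownerRef,
    // "" means "not produced inside a workflow", matching the interface comments.
    workflowRunId: tags.workflowRunId ?? ownerDefinition.workflowRunId ?? "",
    workflowName: tags.workflow ?? "",
    jobName: tags.job ?? "",
    stepName: tags.step ?? "",
    source: tags.source ?? "",
    tags: userTags,
  };
}

// Sample values are invented for illustration.
const promoted = promoteProvenance({
  tags: {
    modelName: "dedup", specName: "episode", type: "resource",
    workflow: "discover-and-download", workflowRunId: "run-42",
    step: "dedup", source: "step-output",
    encode: "reencode", // the only user-defined tag
  },
  ownerDefinition: { ownerType: "model", ownerRef: "dedup", workflowRunId: "run-42" },
});
console.log(promoted);
```

One caveat of this sketch: a user-defined tag that happens to share a name with a framework key (for example `source`) cannot be told apart from the injected one in legacy records, so a real migration would need to decide how to handle that collision.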

2. Remove all hidden scoping logic

Once provenance fields are first-class, the code paths that hide scoping behind delegates and wrappers are unnecessary and should be removed:

raw_execution_driver.ts:142-149: delete the queryData wrapper. Pass dataQueryService.query through directly. If an extension wants workflow scoping it writes && workflowRunId == "..." itself.

// before
const queryData = this.context.queryData && workflowRunId
  ? (predicate, select?) => {
      const scopedPredicate = `(${predicate}) && tags.workflowRunId == "${workflowRunId}"`;
      return this.context.queryData!(scopedPredicate, select);
    }
  : this.context.queryData;

// after
const queryData = this.context.queryData;

model_resolver.ts:632-685 — findBySpec becomes a thin pass-through to data.query (or is deleted entirely once callers migrate). Its runId filter (lines 657-661) is removed. data.query is already a pass-through; nothing to change there.

DataAccessService.readModelData — the workflowRunId filter (lines 149-154) is removed. Since this whole method is going to be retired in favor of queryData, the cleanup happens during caller migration.

3. Predicate examples after the change

What you want	Predicate
Every episode the dedup model has ever produced	`modelName == "dedup" && specName == "episode"`
Episodes the dedup model produced in workflow run X	`modelName == "dedup" && specName == "episode" && workflowRunId == "X"`
Episodes from the dedup step in run X	`stepName == "dedup" && workflowRunId == "X"`
Everything produced by manual `swamp model method run` invocations (no workflow)	`workflowRunId == ""`
Everything ever produced by a model, regardless of source	`modelName == "X"`
Episodes where a user-defined tag says it's a re-encode	`modelName == "dedup" && tags.encode == "reencode"`

Each predicate is self-contained. Reading the line tells you exactly what data will come back. There is no driver, delegate, or context wrapper changing the answer behind the author's back.
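To make the "what you read is what you get" property concrete, here is a toy evaluator run against a handful of in-memory records. It is only a stand-in for the real CEL-based query engine: it understands nothing beyond `field == "value"` clauses joined by `&&`, and `Rec`, `fieldValue`, `query`, and the sample records are all hypothetical.

```typescript
// Toy predicate evaluator. Supports only `field == "value"` clauses joined
// by `&&`; it stands in for the real query engine, which it does not model.

interface Rec {
  id: string;
  modelName: string;
  specName: string;
  workflowRunId: string; // "" when not produced inside a workflow
  tags: Record<string, string>;
}

const CLAUSE = /^\s*([A-Za-z_][\w.]*)\s*==\s*"([^"]*)"\s*$/;

function fieldValue(rec: Rec, path: string): string | undefined {
  if (path.startsWith("tags.")) return rec.tags[path.slice("tags.".length)];
  switch (path) {
    case "modelName": return rec.modelName;
    case "specName": return rec.specName;
    case "workflowRunId": return rec.workflowRunId;
    default: throw new Error(`unknown field: ${path}`);
  }
}

function query(records: Rec[], predicate: string): Rec[] {
  const clauses = predicate.split("&&").map((text) => {
    const match = CLAUSE.exec(text);
    if (!match) throw new Error(`unsupported clause: ${text.trim()}`);
    return { field: match[1], value: match[2] };
  });
  // No clause is added here: the predicate string is the whole filter.
  return records.filter((rec) =>
    clauses.every((c) => fieldValue(rec, c.field) === c.value)
  );
}

const records: Rec[] = [
  { id: "a", modelName: "dedup", specName: "episode", workflowRunId: "X", tags: {} },
  { id: "b", modelName: "dedup", specName: "episode", workflowRunId: "", tags: { encode: "reencode" } },
  { id: "c", modelName: "dedup", specName: "episode", workflowRunId: "Y", tags: {} },
  { id: "d", modelName: "fetch", specName: "episode", workflowRunId: "X", tags: {} },
];

const ids = (predicate: string): string[] =>
  query(records, predicate).map((rec) => rec.id);

console.log(ids('modelName == "dedup" && specName == "episode"'));
console.log(ids('modelName == "dedup" && specName == "episode" && workflowRunId == "X"'));
console.log(ids('workflowRunId == ""'));
console.log(ids('modelName == "dedup" && tags.encode == "reencode"'));
```

With these sample records the four calls select records a, b, c; then a; then b; then b. Dropping or adding a clause is the only way to change the result.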

What this enables

The two callers that drove this refactor become trivial:

mms_dedup.ts:194 (extension method):

// before
const items = await context.readModelData(args.sourceModel, "episode");

// after
const items = await context.queryData(
  `modelName == "${args.sourceModel}" && specName == "episode"`
);

The author decides whether to also filter by workflowRunId, stepName, etc. Nothing happens implicitly.
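Because the author now interpolates runtime values into the predicate string, a value containing a double quote could end its literal early and change the filter. One possible guard is sketched below. The helpers `lit` and `episodesFor` are hypothetical, and the sketch assumes the predicate language accepts JSON-style backslash escapes inside double-quoted strings, which should be verified against the actual engine.

```typescript
// Hypothetical helper: render a runtime value as a quoted string literal
// before interpolating it into a predicate. Assumes JSON-style escapes are
// valid in the predicate language's double-quoted strings.
function lit(value: string): string {
  return JSON.stringify(value);
}

// Builds the same predicate as the "after" example, with the value quoted.
function episodesFor(sourceModel: string): string {
  return `modelName == ${lit(sourceModel)} && specName == "episode"`;
}

const plain = episodesFor("dedup");
// A value containing quotes stays inside its literal instead of closing it
// and smuggling in extra clauses.
const hostile = episodesFor('x" || modelName != "');

console.log(plain);
console.log(hostile);
```

The predicate stays fully visible at the call site; the helper only controls how the interpolated value is quoted.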

workflows/discover-and-download.yaml and eztv-check.yaml (workflow forEach):

# before
in: ${{ data.findBySpec("dedup", "episode") }}

# after: author writes the scoping they actually want
in: ${{ data.query('modelName == "dedup" && specName == "episode" && workflowRunId == "' + workflow.runId + '"') }}

The workflow YAML author has to know about workflow.runId and concatenate it themselves. That's fine — it's explicit. (If workflow.runId isn't already exposed in the workflow expression context, that's a small separate change to make it available as a regular variable, not as a magic predicate binding.)

Why not a `scope` option, or a `ctx` predicate variable, or auto-injection

All of these were considered and rejected:

scope: { workflowRunId } option on DataQueryOptions: same hidden behavior, same magic, just relocated from the driver wrapper into the service. Users still can't see the filter from the call site. Every future requirement (parent run scoping, time windows, etc.) would add another entry to the options matrix.
Magic ctx variable bound into the predicate environment: better than auto-injection but still hidden — the predicate becomes context-dependent in a way that's not obvious from reading the call. Same predicate string returns different results in different contexts. Same problem.
Auto-injecting clauses at any layer: this is what findBySpec and the driver wrapper do today. It is the source of the bug class.

The point of this refactor is that the predicate string IS the contract. If it doesn't say it, it isn't happening.

Out of scope for this issue

Migration of existing extensions / workflow YAML to the new API (separate work, dependent on this)
Deprecation/removal of readModelData, findBySpec, DataAccessService (separate work, dependent on this)
Vault reference resolution in query results (not needed for current callers)
Orphan data recovery (not needed for current callers; #370 was the original justification but modelName being a first-class field would handle the same use case explicitly)
Exposing additional workflow expression variables (workflow.runId, etc.) — small follow-up if not already present

Automoved by swampadmin from GitHub issue #1123

02 Bog Flow

Shipped

4/9/2026, 1:32:51 PM

No activity in this phase yet.

03 Sludge Pulse

stack72 assigned adam, 4/9/2026, 12:00:01 PM
