A workflow-scope report's context.dataRepository.getContent(modelType, modelId, dataName, version) returns null for every data handle that an earlier step in the same workflow run wrote via context.writeResource(spec, key, body). The data is fully persisted: the catalog row exists for the exact (type_normalized, model_id, data_name, version) the report calls with, and the raw file is on disk. swamp data get <modelName> <dataName> returns the bytes immediately. Only the in-report dataRepository read fails.

findByName and findAllForModel exhibit the same behaviour, so all three documented read methods on the workflow-scope dataRepository are affected. The report's read pipeline cannot consume data its own workflow just produced.

Environment

swamp 20260521.221703.0-sha.3f2b75f8
macOS arm64
Local file datastore (default)

Reproduction

Build a model extension whose method writes multiple resources across two specs per call:

```
await context.writeResource("cluster", `cluster-${id}`, clusterBody);
await context.writeResource("instance", `instance-${id}--${memberId}`, instanceBody);
```

Build a workflow-scope report that iterates step.dataHandles and reads each handle:

```
for (const step of context.stepExecutions) {
  if (step.modelType !== "@me/my-model") continue;
  for (const handle of step.dataHandles ?? []) {
    const bytes = await context.dataRepository.getContent(
      step.modelType, step.modelId, handle.name, handle.version,
    );
    // bytes === null for every handle
  }
}
```

Define a workflow that runs the model and requires the report:
```
reports:
  require:
    - "@me/my-report"
```
Run: swamp workflow run my-workflow --input '{...}'.

Observe the report logs a "no bytes for X" warning for every handle, and inspect the catalog + disk to confirm the writes succeeded:

```
sqlite3 .swamp/data/_catalog.db \
  "SELECT type_normalized, model_id, data_name, version, is_latest
   FROM catalog WHERE model_id='<step.modelId>'"
# 44 rows, is_latest=1 on the latest version of each name

ls .swamp/data/@me/my-model/<modelId>/<dataName>/<version>/raw
# File exists, non-empty

swamp data get my-model-name <dataName>
# Returns the bytes successfully
```

Re-run the workflow without clearing anything — getContent still returns null for every handle on the second, third, ..., nth run. 5/5 consecutive runs from a clean state return 0 rows in the report.

Diagnosis

Side-channel logging inside the report confirms:

- step.modelType matches the value in the catalog's type_normalized column exactly.
- step.modelId matches the catalog's model_id exactly.
- handle.name matches the catalog's data_name exactly (cluster-<id> / instance-<id>--<member> keys, including the spec-name prefix).
- handle.version matches the catalog's version column exactly (number, not string).
- handle.specName is correctly populated; the report's iteration sees both cluster and instance handles.

Despite all four arguments matching the catalog row, getContent returns null. findByName(modelType, modelId, dataName, version) returns null. findAllForModel(modelType, modelId) returns an empty array — even when the catalog has 44 rows for that exact (type_normalized, model_id).

The dataRepository instance passed to the report exposes repoDir, catalogStore, markDirty, baseDir as own properties. The read methods are on the prototype and don't throw — they just return null/empty.

Offline replay confirms the read pipeline is correct

A standalone replay that:

- reads the persisted workflow-run-*.yaml,
- extracts step.output.dataHandles as step.dataHandles, and
- uses a custom dataRepository.getContent that reads ${baseDir}/${modelType}/${modelId}/${dataName}/${version}/raw via Deno.readFile,

renders the report's CSV with 100% correct row count and content. The report code is consuming the contract correctly; the runtime's dataRepository implementation is what returns null.
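
The replay repository described above can be sketched roughly as follows. This is a minimal sketch, not the actual replay script: `rawPath`, `makeReplayRepository`, and the `ReadFile` type are hypothetical names, and `modelType` is assumed to be a plain string. The only detail taken from the report is the on-disk layout `${baseDir}/${modelType}/${modelId}/${dataName}/${version}/raw`.

```typescript
// Hypothetical reader signature; under Deno, `Deno.readFile` fits this shape.
type ReadFile = (path: string) => Promise<Uint8Array>;

// Build the raw-file path using the layout quoted in the issue.
function rawPath(
  baseDir: string,
  modelType: string, // assumption: a plain string such as "@me/my-model"
  modelId: string,
  dataName: string,
  version: number,
): string {
  return [baseDir, modelType, modelId, dataName, String(version), "raw"].join("/");
}

// Stand-in for context.dataRepository that reads files directly
// and never consults the catalog.
function makeReplayRepository(baseDir: string, readFile: ReadFile) {
  return {
    async getContent(
      modelType: string,
      modelId: string,
      dataName: string,
      version: number,
    ): Promise<Uint8Array | null> {
      try {
        return await readFile(rawPath(baseDir, modelType, modelId, dataName, version));
      } catch {
        // Any read error (for example a missing file) is reported as null,
        // matching the non-throwing reads described in the issue.
        return null;
      }
    },
  };
}
```

Under Deno, something like `makeReplayRepository(".swamp/data", Deno.readFile)` should be enough to drive the report offline. Injecting the reader keeps the path logic testable without touching the filesystem.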

Expected behaviour

context.dataRepository.getContent(modelType, modelId, dataName, version) returns the bytes for any handle in step.dataHandles (for that step) whose catalog row exists. Same for findByName and findAllForModel.

Actual behaviour

getContent, findByName, findAllForModel all return null/empty regardless of whether the data was just written or has been persisted across previous runs. The catalog query and the runtime read of dataRepository are disconnected.

Impact

Workflow-scope reports that consume upstream models writing multiple resources per method call cannot read any of those resources. The superficial workaround is to shell out to jq over .swamp/data/<type>/<modelId>/<dataName>/latest/raw. That sidesteps the report system entirely and loses column ordering, dedup, and other in-report features. The user-facing report is effectively non-functional on the happy path. Because the workflow still shows succeeded, the failure is silent unless the report itself logs a no-bytes warning.

Affected extensions (concrete repro)

@jentz/aws-rds-inventory — model writes one cluster resource + N instance resources per list_clusters call.
@jentz/aws-rds-inventory-csv — workflow-scope report that consumes the instance handles.

Both unpublished at the time of writing; loadable from local source paths via .swamp-sources.yaml. The report's smoke-test suite (which uses a synthetic dataRepository.getContent backed by an in-memory map) passes 37/37, so the bug is purely in the runtime's contract.
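
For context, a synthetic repository of the kind the smoke tests rely on might look like the sketch below. The class name, `keyFor`, `put`, and `size` are hypothetical and are not taken from the extension's test suite; only the `getContent` argument order and the bytes-or-null result follow the issue. A fake like this exercises the report's own logic and cannot detect a runtime that hands the report the wrong identifiers.

```typescript
// Hypothetical in-memory stand-in for the read side of dataRepository.
class InMemoryDataRepository {
  private readonly store = new Map<string, Uint8Array>();

  // Compose a lookup key from the same four arguments getContent takes.
  // JSON encoding avoids separator clashes with names such as "instance-1--a".
  static keyFor(
    modelType: string,
    modelId: string,
    dataName: string,
    version: number,
  ): string {
    return JSON.stringify([modelType, modelId, dataName, version]);
  }

  // Seed the fake with the bytes a step would have written.
  put(
    modelType: string,
    modelId: string,
    dataName: string,
    version: number,
    body: Uint8Array,
  ): void {
    this.store.set(
      InMemoryDataRepository.keyFor(modelType, modelId, dataName, version),
      body,
    );
  }

  // Number of distinct (type, id, name, version) entries seeded so far.
  size(): number {
    return this.store.size;
  }

  // Return the seeded bytes, or null when nothing matches.
  async getContent(
    modelType: string,
    modelId: string,
    dataName: string,
    version: number,
  ): Promise<Uint8Array | null> {
    const key = InMemoryDataRepository.keyFor(modelType, modelId, dataName, version);
    return this.store.get(key) ?? null;
  }
}
```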

Extension code is in PR #13 (jentz/swamp-extensions).

Workaround for users

jq over the on-disk JSON at .swamp/data/<modelType>/<modelId>/<handle.name>/latest/raw reproduces the report's row data without touching the report system.

02 Bog Flow

Shipped

5/22/2026, 9:07:50 PM

03 Sludge Pulse

stack72 assigned stack72 5/22/2026, 2:14:38 PM

stack72 commented 5/22/2026, 3:23:00 PM

Investigation Summary

I've done extensive code analysis and attempted reproduction of this bug. Here's what I found:

Reproduction attempt: I set up a scratch repo with a multi-spec model (writing cluster + instance resources) and a workflow-scope report reading via getContent(step.modelType, step.modelId, handle.name, handle.version). Tested: single-step, multi-job, direct type execution, source-path extensions via .swamp-sources.yaml, dynamic data names with double-dashes, repeated runs. All scenarios returned correct data — 0 null results.

Code analysis: Traced every path from step write through report read. The report's dataRepository is the exact same FileSystemUnifiedDataRepository instance (this.dataRepo) used by the execution service. Both it and the step executor's instance use the same baseDir (from this.dataBaseDir). Path construction is join(baseDir, type.toDirectoryPath(), modelId, dataName, version, "raw") — identical for both.

Diagnostic Questions

Since I cannot reproduce this, I need more information from the runtime state. Could you add the following diagnostic logging inside your report's execute function and share the output?

```
// Helper: catch variables are `unknown` under strict TypeScript.
const errMsg = (e: unknown) => (e instanceof Error ? e.message : String(e));

// 1. What is the dataRepository's baseDir?
const repo = context.dataRepository as any;
context.logger.info(`baseDir: ${repo.baseDir}`);
context.logger.info(`repoDir: ${repo.repoDir}`);

// 2. For the first failing handle, what path does it construct?
const step = context.stepExecutions[0];
const handle = step.dataHandles?.[0];
if (handle) {
  // Try calling getContentPath directly (it's on the prototype)
  try {
    const path = repo.getContentPath(step.modelType, step.modelId, handle.name, handle.version);
    context.logger.info(`contentPath: ${path}`);
    // Check if the file exists at that path
    try {
      const stat = await Deno.stat(path);
      context.logger.info(`file exists: ${stat.isFile}, size: ${stat.size}`);
    } catch (e) {
      context.logger.info(`stat failed: ${errMsg(e)}`);
    }
  } catch (e) {
    context.logger.info(`getContentPath failed: ${errMsg(e)}`);
  }
}

// 3. Does the model directory exist?
try {
  const modelDir = `${repo.baseDir}/${step.modelType}/${step.modelId}`;
  context.logger.info(`modelDir: ${modelDir}`);
  for await (const entry of Deno.readDir(modelDir)) {
    context.logger.info(`  entry: ${entry.name} (isDir: ${entry.isDirectory})`);
  }
} catch (e) {
  context.logger.info(`readDir failed: ${errMsg(e)}`);
}
```

Specific questions:

1. What does repo.baseDir show? Does it match the location where swamp data get resolves files?
2. Does getContentPath throw (since it takes ModelType not string), or does it return a path? If it returns a path, does Deno.stat find the file?
3. Does Deno.readDir on the model directory find entries, or does it throw NotFound?
4. Is there anything unusual about your .swamp.yaml (e.g. a datastore: section)?
5. Are there any symlinks in the path to your repo directory?

This will help narrow down whether the issue is a path mismatch, a baseDir mismatch, or something else entirely.

stack72 commented 5/22/2026, 3:23:54 PM

@jentz please let us know your thoughts on this

jentz commented 5/22/2026, 3:49:17 PM

Swamp moves fast! I will upgrade swamp and do a repo upgrade and follow the guidance. Thanks for the turnaround velocity!

jentz commented 5/22/2026, 4:17:29 PM

Thanks for the deep code analysis and the diagnostic snippet, @stack72 — the questions are clear, and I'll wire them in if the sharper repro shape below doesn't get you there first.

A more specific repro condition

I documented this in the report's source + README on PR #13 (jentz/swamp-extensions) but didn't surface it on the original issue. The failure mode appears tied specifically to a workflow model_method task that auto-creates its target model on the first run that produces data — modelType + modelName where the named model does not yet exist when the workflow starts.

When the model is pre-created via swamp model create @vendor/type name --global-arg ... before the workflow runs, the report's getContent returns the bytes correctly. When the workflow auto-creates the model in the same run that produces the data, every getContent for that model's instance handles returns null — even though the bytes are confirmed on disk under .swamp/data/<type>/<modelId>/<handle>/<version>/raw and reachable via swamp data get.

Your scenario list doesn't explicitly call out auto-create-in-workflow. Could you try the following minimal shape against your scratch repo?

```
name: repro
jobs:
  - name: produce
    steps:
      - name: list
        task:
          type: model_method
          modelType: "@vendor/multi-spec-model"
          modelName: not-yet-existing    # ← key: must not exist before run
          methodName: list
reports:
  require:
    - "@vendor/multi-spec-report"        # workflow-scope, reads via getContent
```

Run once on a fresh .swamp/. If that reproduces, I'll wire your diagnostic snippet into our RDS report and capture output. If it still doesn't repro on 20260522.152735.0 (today's build, which I just upgraded to from 20260521.221703.0), I'll do the same against our real AWS workflow next time we exercise it.

What I can confirm now

- .swamp.yaml has no datastore: section (default filesystem datastore).
- No symlinks in the repo path (/Users/mark/code/jentz/swamp-extensions resolves to itself).
- Bug documented at aws-rds-inventory-csv/aws_rds_inventory_csv.ts:627-649 and aws-rds-inventory-csv/README.md:121-153.

I'll hold on instrumenting until you've tried the sharper shape.

stack72 commented 5/22/2026, 6:17:01 PM

@jentz that gave me what I needed! On the fix now :)

jentz commented 5/22/2026, 7:21:47 PM

Sounds great :D

stack72 commented 5/22/2026, 9:02:34 PM

Thanks for the sharp reproduction condition, @jentz — it narrowed the search significantly.

We traced the bug to runWorkflowReports in the workflow execution service. After a step completes, the report context was getting its modelId from a post-execution name-based definition lookup (findByNameGlobal) instead of from the step execution itself. When that lookup returned a different definition than the one used to write data, every getContent call in the report got null — exactly what you described.

We couldn't reproduce the mismatch naturally (built-in types, local extensions, and source-mounted extensions all returned the correct definition in our tests), but fault-injecting a wrong modelId into the lookup produced your exact symptoms: 6 handles visible, 6 null reads, findAllForModel empty, workflow shows succeeded.

The fix in PR #1433 carries the modelId from step execution time through the model_resolved event, so reports always use the authoritative ID that was used to write data. The definition lookup is kept only for methodArgs/globalArgs.
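
To illustrate the mismatch for readers following along, here is a toy model of the failure and the fix. It is a sketch under stated assumptions, not swamp's actual code: `written`, `write`, `read`, `readsViaNameLookup`, `readsViaStepId`, and the `id-A` / `id-B` values are all hypothetical, and the store is a plain in-memory map keyed by modelId. The wrong ID is injected deliberately, in the same spirit as the fault injection described above.

```typescript
// Toy model: data is keyed by the modelId that was used at write time.
type StepExecution = {
  modelName: string;
  modelId: string; // the ID the step actually wrote under
  dataNames: string[];
};

const written = new Map<string, string>(); // key: `${modelId}/${dataName}`

function write(modelId: string, dataName: string, body: string): void {
  written.set(`${modelId}/${dataName}`, body);
}

function read(modelId: string, dataName: string): string | null {
  return written.get(`${modelId}/${dataName}`) ?? null;
}

// Pre-fix shape: resolve the ID again by name after the step has finished.
function readsViaNameLookup(
  step: StepExecution,
  lookupIdByName: (name: string) => string,
): (string | null)[] {
  const modelId = lookupIdByName(step.modelName);
  return step.dataNames.map((name) => read(modelId, name));
}

// Post-fix shape: reuse the ID carried over from step execution.
function readsViaStepId(step: StepExecution): (string | null)[] {
  return step.dataNames.map((name) => read(step.modelId, name));
}

// Simulate a step that wrote under "id-A" while the name lookup yields "id-B".
const step: StepExecution = {
  modelName: "not-yet-existing",
  modelId: "id-A",
  dataNames: ["cluster-1", "instance-1--a"],
};
for (const name of step.dataNames) {
  write(step.modelId, name, `body of ${name}`);
}

const before = readsViaNameLookup(step, () => "id-B");
const after = readsViaStepId(step);

console.log(before); // every entry is null
console.log(after); // every entry is the body written by the step
```

The handles stay visible in both cases because they come from the step execution; only the ID used for the read differs, which is why the workflow can report success while every read comes back empty.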

Could you verify this against your @jentz/aws-rds-inventory + @jentz/aws-rds-inventory-csv workflow once a build with this change is available? That would confirm the fix resolves the issue in your specific environment.

stack72 commented 5/22/2026, 9:07:56 PM

Thanks @jentz for reporting this! The fix has been merged and a release is on its way. We appreciate your contribution to swamp.
