
DATA OUTPUTS

Data outputs are the versioned artifacts produced when model methods execute. Each method execution can write structured data (resources) and unstructured content (files). Data outputs are stored in .swamp/data/ within the repository, organized by model type, model ID, and data name.

Output Types

Model types declare their output specifications using two categories: resources and files. Each declared spec has a name (the spec name) that identifies it within the model type.

Resource Outputs

Structured JSON data validated against a schema.

Field Type Required Default Description
description string No None Human-readable description
schema Zod schema Yes Validates data on write
lifetime Lifetime Yes Retention policy
garbageCollection GarbageCollectionPolicy Yes Version retention policy
tags Record<string, string> No {} Default tags (auto-includes type: "resource")
sensitiveOutput boolean No false Treat all fields as sensitive
vaultName string No First available vault Vault for storing sensitive field values

Resource content is always stored as application/json.

File Outputs

Binary or text content identified by MIME type.

Field Type Required Default Description
description string No None Human-readable description
contentType string Yes MIME type (e.g., text/plain)
lifetime Lifetime Yes Retention policy
garbageCollection GarbageCollectionPolicy Yes Version retention policy
streaming boolean No false Line-oriented append mode
tags Record<string, string> No {} Default tags (auto-includes type: "file")
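The two spec shapes above can be sketched as TypeScript interfaces (illustrative names only, not swamp's actual internal types):

```typescript
// Hypothetical type sketches of the two output spec shapes described above.
// Field names and optionality follow the tables; the names are illustrative.
type Lifetime = string; // "ephemeral" | "infinite" | "job" | "workflow" | a duration like "7d"
type GarbageCollectionPolicy = number | string; // keep-N integer or a duration string

interface ResourceOutputSpec {
  description?: string;
  schema: unknown;                    // a Zod schema in practice
  lifetime: Lifetime;
  garbageCollection: GarbageCollectionPolicy;
  tags?: Record<string, string>;      // auto-includes type: "resource"
  sensitiveOutput?: boolean;          // default false
  vaultName?: string;                 // default: first available vault
}

interface FileOutputSpec {
  description?: string;
  contentType: string;                // MIME type, e.g. "text/plain"
  lifetime: Lifetime;
  garbageCollection: GarbageCollectionPolicy;
  streaming?: boolean;                // line-oriented append mode, default false
  tags?: Record<string, string>;      // auto-includes type: "file"
}

// A spec resembling the command/shell log output shown below:
const logSpec: FileOutputSpec = {
  description: "Shell command output",
  contentType: "text/plain",
  lifetime: "infinite",
  garbageCollection: 10,
  streaming: true,
};
```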

Example: command/shell Model Type

The built-in command/shell model type declares one resource and one file output:

resources:
  result        [resource]  — Shell command execution result (infinite)
  
files:
  log           [file]      — Shell command output, text/plain (infinite, streaming)

After execution, both are visible via swamp data list, along with auto-generated method summary reports:

Data for hello-world (command/shell)

file (1 item):
  log  v2  text/plain  19B  2026-04-07

resource (1 item):
  result  v1  application/json  135B  2026-04-07

report (2 items):
  report-swamp-method-summary  v2  text/markdown  482B  2026-04-07
  report-swamp-method-summary-json  v2  application/json  2.6KB  2026-04-07

Lifetime

Lifetime determines how long data is retained before it becomes eligible for garbage collection.

Value Description
Duration string Time-based retention (e.g., 1h, 5m, 10d, 2w, 1mo, 10y)
ephemeral Deleted when the process ends
infinite Never automatically deleted
job Lives until the job completes
workflow Lives until the workflow completes

Duration format: {number}{unit} where unit is h (hours), m (minutes), d (days), w (weeks), mo (months), or y (years).

Zero-duration strings (e.g., 0h, 0d) are normalized to workflow.

Duration Conversion

Unit Conversion
m value × 60,000 ms
h value × 3,600,000 ms
d value × 86,400,000 ms
w value × 604,800,000 ms
mo value × 2,592,000,000 ms (30 days)
y value × 31,536,000,000 ms (365 days)
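The duration grammar, conversion table, and zero-duration normalization can be sketched as a small parser (a hedged illustration, not swamp's implementation):

```typescript
// Unit-to-milliseconds table from the conversion rules above.
const UNIT_MS: Record<string, number> = {
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
  w: 604_800_000,
  mo: 2_592_000_000,     // 30 days
  y: 31_536_000_000,     // 365 days
};

// Parses "{number}{unit}" (e.g. "5m", "2w", "1mo").
// Zero-duration strings normalize to the "workflow" lifetime.
function parseLifetimeDuration(input: string): number | "workflow" {
  const match = /^(\d+)(mo|[mhdwy])$/.exec(input);
  if (!match) throw new Error(`invalid duration: ${input}`);
  const value = Number(match[1]);
  if (value === 0) return "workflow";
  return value * UNIT_MS[match[2]];
}
```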

Expiration Rules

  • infinite: Never expires.
  • ephemeral: Not yet implemented — treated as non-expiring.
  • job / workflow: Expires when the associated workflow run no longer exists. Requires workflowId and workflowRunId in the owner definition. If either is missing, the data is not expired.
  • Duration strings: Expires when createdAt + duration is in the past.
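Taken together, the expiration rules above can be sketched as follows. Here runExists stands in for the real check that the owning workflow run still exists; all names are illustrative, not swamp's internals:

```typescript
const LIFETIME_UNIT_MS: Record<string, number> = {
  m: 60_000, h: 3_600_000, d: 86_400_000,
  w: 604_800_000, mo: 2_592_000_000, y: 31_536_000_000,
};

function lifetimeDurationMs(s: string): number {
  const m = /^(\d+)(mo|[mhdwy])$/.exec(s);
  if (!m) throw new Error(`invalid duration: ${s}`);
  return Number(m[1]) * LIFETIME_UNIT_MS[m[2]];
}

function isExpired(
  lifetime: string,
  createdAt: string,
  nowMs: number,
  owner: { workflowId?: string; workflowRunId?: string },
  runExists: (workflowId: string, runId: string) => boolean,
): boolean {
  // infinite never expires; ephemeral is not yet implemented, so treated the same.
  if (lifetime === "infinite" || lifetime === "ephemeral") return false;
  if (lifetime === "job" || lifetime === "workflow") {
    // Both IDs are required; if either is missing, the data is not expired.
    if (!owner.workflowId || !owner.workflowRunId) return false;
    return !runExists(owner.workflowId, owner.workflowRunId);
  }
  // Duration string: expired once createdAt + duration is in the past.
  return Date.parse(createdAt) + lifetimeDurationMs(lifetime) < nowMs;
}
```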

Garbage Collection Policy

Garbage collection controls how many versions of a data item are retained.

Value Description
integer Keep the N most recent versions
Duration string Keep versions created within the duration

The integer must be a positive integer. The duration string uses the same format as lifetime durations and must be greater than zero.

# Keep the 10 most recent versions
garbageCollection: 10

# Keep versions from the last 7 days
garbageCollection: 7d

Garbage collection runs via swamp data gc and as part of the lifecycle service. It operates in two phases:

  1. Expired data deletion — removes all versions of data items whose lifetime has elapsed.
  2. Version pruning — for non-expired data, removes old versions that exceed the garbage collection policy.
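The version-pruning phase can be sketched like this (illustrative only, not swamp's code):

```typescript
// A keep-N integer policy keeps the N most recent versions; a duration policy
// keeps versions created within the window. Everything else is prunable.
interface VersionInfo { version: number; createdAt: string; }

const GC_UNIT_MS: Record<string, number> = {
  m: 60_000, h: 3_600_000, d: 86_400_000,
  w: 604_800_000, mo: 2_592_000_000, y: 31_536_000_000,
};

function gcDurationMs(s: string): number {
  const m = /^(\d+)(mo|[mhdwy])$/.exec(s);
  if (!m) throw new Error(`invalid duration: ${s}`);
  return Number(m[1]) * GC_UNIT_MS[m[2]];
}

function versionsToPrune(
  versions: VersionInfo[],
  policy: number | string,
  nowMs: number,
): VersionInfo[] {
  const newestFirst = [...versions].sort((a, b) => b.version - a.version);
  if (typeof policy === "number") {
    return newestFirst.slice(policy); // everything beyond the N most recent
  }
  const windowMs = gcDurationMs(policy);
  return newestFirst.filter(v => Date.parse(v.createdAt) + windowMs < nowMs);
}
```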

Versioning

Each data item is versioned with sequential positive integers starting at 1. Every method execution that writes to the same data name produces a new version.

$ swamp data versions hello-world result --json
{
  "dataName": "result",
  "modelName": "hello-world",
  "modelType": "command/shell",
  "versions": [
    {
      "version": 2,
      "createdAt": "2026-04-07T18:03:08.737Z",
      "size": 135,
      "checksum": "d58d16...",
      "isLatest": true
    },
    {
      "version": 1,
      "createdAt": "2026-04-07T18:02:58.146Z",
      "size": 157,
      "checksum": "c631b6...",
      "isLatest": false
    }
  ],
  "total": 2
}

The "latest" Pointer

Each data item has a latest file in its directory containing the current version number as plain text. When data is retrieved without an explicit --version flag, the latest version is returned.

.swamp/data/command/shell/{model-id}/result/
  1/
  2/
  latest          # Contains: "2"

The name latest is reserved — it cannot be used as a data name.

Checksums

Each version includes a SHA-256 checksum computed from the content file at finalization time. The checksum is stored in metadata.yaml and returned by data access commands.
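A SHA-256 checksum of this kind can be computed with Node's built-in crypto module (a minimal sketch, not necessarily how swamp computes it):

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the raw content, as stored in metadata.yaml.
function contentChecksum(content: Buffer | string): string {
  return createHash("sha256").update(content).digest("hex");
}
```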


Storage Layout

Data is stored on disk under .swamp/data/ in a hierarchical directory structure:

.swamp/data/
  {model-type-path}/
    {model-id}/
      {data-name}/
        1/
          metadata.yaml
          raw
        2/
          metadata.yaml
          raw
        latest

  • {model-type-path} — the model type as a directory path (e.g., command/shell).
  • {model-id} — the UUID of the model definition.
  • {data-name} — the instance name given when data is written.
  • {version}/ — a numbered directory for each version.
  • metadata.yaml — full metadata for the version.
  • raw — the content (JSON for resources, binary or text for files).
  • latest — text file containing the current version number.

Metadata File

Each version's metadata.yaml contains the complete data record:

id: e204ea55-3d64-48a0-aa78-32fea656fdac
name: result
version: 1
contentType: application/json
lifetime: infinite
garbageCollection: 10
streaming: false
tags:
  type: resource
  specName: result
  modelName: hello-world
ownerDefinition:
  ownerType: model-method
  ownerRef: 7347cf2c-cc9e-4203-8897-e10845af9732
createdAt: "2026-04-07T18:02:58.146Z"
size: 157
checksum: c631b676cd069af1decf4f20c27568f44bcccf062846bb32bbeae187573c2fe6

Tags

Tags are key-value string pairs attached to data. They are used for filtering, discovery, and categorization.

Tag Resolution Chain

Tags are resolved in order, with later steps overriding earlier ones:

  1. Type auto-tag — type: "resource" or type: "file" (always present).
  2. Definition tags — tags from the model definition.
  3. Spec defaults — tags declared on the output specification.
  4. Method write overrides — tags passed by the method when writing.
  5. specName auto-tag — the output spec name (always injected).
  6. modelName auto-tag — the definition name (always injected).
  7. Workflow tag overrides — tags from workflow step context.
  8. Runtime tags — tags provided via --tag CLI flags.
  9. Data output overrides — tags from workflow dataOutputOverrides.

The type tag is required on all data. Data without a type tag fails validation.
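The chain amounts to a left-to-right object merge where later layers win. A minimal sketch:

```typescript
// Merge tag layers in resolution order; later layers override earlier ones.
// The final result must carry a type tag, per the validation rule above.
function resolveTags(layers: Array<Record<string, string> | undefined>): Record<string, string> {
  const merged = layers.reduce<Record<string, string>>(
    (acc, layer) => ({ ...acc, ...(layer ?? {}) }),
    {},
  );
  if (!merged.type) throw new Error("data must carry a type tag");
  return merged;
}

// Illustrative layers (a subset of the nine steps above):
const tags = resolveTags([
  { type: "resource" },   // 1. type auto-tag
  { team: "infra" },      // 2. definition tags
  { env: "dev" },         // 3. spec defaults
  { env: "prod" },        // 4. method write overrides (wins over the spec default)
]);
```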

Common Tag Values

Tag Key Auto-Injected Description
type Yes resource, file, or report
specName Yes Output spec name from the model type
modelName Yes Definition name for orphan data recovery

Streaming

When streaming: true is set on a file output spec, the data writer operates in line-oriented append mode. Lines are written incrementally to disk as they are produced, rather than buffered in memory.

The command/shell model type declares streaming: true on its log file output to capture stdout and stderr as the command executes.

Streaming file writers support three write patterns:

  • writeLine(line) — appends a single line with a newline character.
  • writeStream(stream) — pipes a ReadableStream to disk, invoking optional line callbacks as newlines are encountered.
  • getFilePath() — returns the allocated content path for direct file writes.

Non-streaming outputs use writeAll(content) or writeText(text) to write the complete content at once.


Sensitive Output

Resource output specs can mark fields as sensitive. Sensitive values are stored in a vault and replaced with vault.get() reference expressions before the data is persisted to disk.

Field-Level Sensitivity

Individual fields are marked sensitive through Zod schema metadata:

schema: z.object({
  apiKey: z.string().meta({ sensitive: true }),
  publicId: z.string(),
});

Only apiKey is stored in the vault. publicId is persisted as-is.

Whole-Output Sensitivity

When sensitiveOutput: true is set on the resource spec, all top-level fields are treated as sensitive.

Vault Resolution Order

The vault used for storing sensitive fields is resolved in this order:

  1. Field-level vaultName from schema metadata
  2. Spec-level vaultName from the resource output specification
  3. First available vault from the vault service

If sensitive fields exist but no vault is configured, an error is thrown.

Vault Key Format

Auto-generated vault keys follow this pattern:

{sanitized-model-type}-{model-id}-{method-name}-{field-path}

Sanitization: @ and null bytes are removed, / and \ are replaced with -, .. is replaced with ..
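A minimal sketch of the key construction, implementing the @/null-byte removal and path-separator replacement described above (the helper names are illustrative):

```typescript
// Remove @ and null bytes; replace / and \ with -.
function sanitizeKeyPart(part: string): string {
  return part.replace(/[@\0]/g, "").replace(/[/\\]/g, "-");
}

// Compose the documented pattern:
// {sanitized-model-type}-{model-id}-{method-name}-{field-path}
function vaultKey(modelType: string, modelId: string, method: string, fieldPath: string): string {
  return [sanitizeKeyPart(modelType), modelId, method, fieldPath].join("-");
}
```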

Persisted Format

After processing, persisted resource data contains vault references:

apiKey: "${{ vault.get('my-secrets', 'command-shell-abc-execute-apiKey') }}"
publicId: "pk_12345"

On read, vault references are automatically resolved back to their original values. Resolved secrets are registered with the secret redactor to prevent log leakage.

Non-string sensitive values are JSON-stringified before vault storage.
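The field replacement can be sketched as follows (illustrative names; the actual vault write is omitted):

```typescript
// Replace sensitive top-level fields with vault.get() reference expressions;
// non-sensitive fields pass through unchanged. keyFor stands in for the
// auto-generated vault key construction.
function toPersistedForm(
  data: Record<string, unknown>,
  sensitiveFields: Set<string>,
  vaultName: string,
  keyFor: (field: string) => string,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [field, value] of Object.entries(data)) {
    if (sensitiveFields.has(field)) {
      // The real value goes to the vault (JSON-stringified if non-string);
      // only the reference expression is persisted to disk.
      out[field] = `\${{ vault.get('${vaultName}', '${keyFor(field)}') }}`;
    } else {
      out[field] = value;
    }
  }
  return out;
}
```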


Lifecycle States

Each data entry has a lifecycle state.

State Description
active Normal, live data (default)
deleted Tombstone marker — the data has been deleted or renamed

Deletion Markers

A deletion marker is a version with lifecycle: "deleted", application/json content type, and streaming: false. It signals that the data was intentionally removed.

Rename Markers

A rename marker is a deletion marker with an additional renamedTo field pointing to the new data name. When the latest version of a data item is a rename marker, lookups without an explicit version follow the forward reference to the new name (up to 5 levels deep).

$ swamp data rename hello-world result execution-result --json
{
  "oldName": "result",
  "newName": "execution-result",
  "modelId": "7347cf2c-cc9e-4203-8897-e10845af9732",
  "modelName": "hello-world",
  "modelType": "command/shell",
  "copiedVersion": 2,
  "newVersion": 1,
  "warning": "Any workflows or models that produce data under \"result\" will overwrite the forward reference. Update them to use \"execution-result\" instead."
}

The rename process:

  1. Copies the latest version of the old data name to version 1 under the new name.
  2. Writes a tombstone with a forward reference on the old name.
  3. Updates the latest marker on the old name to point to the tombstone.
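Forward-reference resolution can be sketched as a bounded loop over rename markers (illustrative types, not swamp's internals):

```typescript
// The latest version of a data item, reduced to the fields that matter here.
interface LatestVersion { lifecycle: "active" | "deleted"; renamedTo?: string; }

// Follow renamedTo pointers up to 5 levels deep, as described above.
function resolveDataName(
  name: string,
  latestOf: (name: string) => LatestVersion | undefined,
  maxDepth = 5,
): string {
  let current = name;
  for (let depth = 0; depth < maxDepth; depth++) {
    const latest = latestOf(current);
    if (!latest?.renamedTo) return current; // active data, or no rename marker
    current = latest.renamedTo;             // follow the forward reference
  }
  return current;
}
```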

Owner Definition

Every data entry tracks its owner — who created it.

Field Type Required Description
ownerType string Yes model-method, workflow-step, or manual
ownerRef string Yes Identifier of the owner (model ID, step ref)
workflowId string No Workflow UUID (for job/workflow lifetimes)
workflowRunId string No Workflow run UUID

Ownership is validated on write — new versions of an existing data name must have the same ownerType and ownerRef as the original.


Data Output Overrides

Workflow steps can override the default output spec settings for data produced by their tasks. See dataOutputOverrides in the workflows reference.

Field Type Description
specName string Output spec name to override
lifetime Lifetime Override retention policy
garbageCollection GarbageCollectionPolicy Override version retention
tags Record<string, string> Additional tags merged with output tags
vary string[] Input key names to vary by (composite data names)

When vary is set, the resolved values of the named input keys are appended as a suffix to the data instance name. This produces distinct data items per iteration in forEach steps.
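One plausible sketch of the composite-name construction, assuming the resolved vary values are joined onto the base name with hyphens (the exact suffix format is an assumption here, not documented above):

```typescript
// Hypothetical: append resolved vary values as a suffix so each forEach
// iteration writes a distinct data item.
function composeDataName(baseName: string, varyValues: string[]): string {
  if (varyValues.length === 0) return baseName;
  return `${baseName}-${varyValues.join("-")}`;
}
```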


CEL Access

Data outputs are accessible in CEL expressions through the data namespace:

Function Description
data.latest(modelName, dataName) Latest version of a data item
data.latest(modelName, dataName, varyValues[]) Latest version with vary suffix
data.version(modelName, dataName, version) Specific version
data.version(modelName, dataName, varyValues[], version) Specific version with vary suffix
data.listVersions(modelName, dataName) All version numbers
data.listVersions(modelName, dataName, varyValues[]) All version numbers with vary suffix
data.findByTag(tagKey, tagValue) Search by tag
data.findBySpec(modelName, specName) Find by output spec name
data.query(predicate, select?) CEL predicate query

These functions return DataRecord objects with the following fields:

Field Type Description
id string Data UUID
name string Data instance name
version number Version number
createdAt string ISO 8601 timestamp
attributes Record<string, unknown> Parsed JSON content (resources only)
tags Record<string, string> All tags
modelName string Model definition name
modelType string Model type path
specName string Output spec name
dataType string resource or file
contentType string MIME type
lifetime string Lifetime policy
ownerType string Owner type
streaming boolean Whether streaming is enabled
size number Content size in bytes
content string Raw content string

CLI Commands

All data commands accept --json to output structured JSON instead of human-readable text.

swamp data get <model> <data_name>

Retrieve data by model name and data name. Returns the latest version by default.

Option Description
--version Retrieve a specific version number
--workflow Get data produced by a workflow
--run Specific workflow run ID
--no-content Show metadata only, without content
--repo-dir Repository directory (default .)

$ swamp data get hello-world result --json
{
  "id": "e204ea55-3d64-48a0-aa78-32fea656fdac",
  "name": "result",
  "modelName": "hello-world",
  "modelType": "command/shell",
  "version": 1,
  "contentType": "application/json",
  "lifetime": "infinite",
  "garbageCollection": 10,
  "streaming": false,
  "tags": {
    "type": "resource",
    "specName": "result",
    "modelName": "hello-world"
  },
  "ownerDefinition": {
    "ownerType": "model-method",
    "ownerRef": "7347cf2c-cc9e-4203-8897-e10845af9732"
  },
  "createdAt": "2026-04-07T18:02:58.146Z",
  "size": 157,
  "checksum": "c631b676cd...",
  "contentPath": ".swamp/data/command/shell/.../result/1/raw",
  "content": {
    "exitCode": 0,
    "executedAt": "2026-04-07T18:02:58.143Z",
    "command": "echo \"Hello from the swamp!\"",
    "durationMs": 4,
    "stdout": "Hello from the swamp!",
    "stderr": ""
  }
}

swamp data list [model]

List all data for a model, grouped by type.

Option Description
--type Filter by data type (resource, file, report)
--workflow List data produced by a workflow
--run Specific workflow run ID
--repo-dir Repository directory (default .)

$ swamp data list hello-world
Data for hello-world (command/shell)

file (1 item):
  log  v2  text/plain  19B  2026-04-07

resource (1 item):
  result  v1  application/json  135B  2026-04-07

report (2 items):
  report-swamp-method-summary  v2  text/markdown  482B  2026-04-07
  report-swamp-method-summary-json  v2  application/json  2.6KB  2026-04-07

swamp data versions <model> <data_name>

Show all versions of a data item.

Option Description
--repo-dir Repository directory (default .)

$ swamp data versions hello-world result --json
{
  "dataName": "result",
  "modelName": "hello-world",
  "modelType": "command/shell",
  "versions": [
    {
      "version": 2,
      "createdAt": "2026-04-07T18:03:08.737Z",
      "size": 135,
      "checksum": "d58d1607...",
      "isLatest": true
    },
    {
      "version": 1,
      "createdAt": "2026-04-07T18:02:58.146Z",
      "size": 157,
      "checksum": "c631b676...",
      "isLatest": false
    }
  ],
  "total": 2
}

swamp data search [query]

Search across all data in the repository. Opens an interactive picker in a terminal, or returns JSON with --json.

Option Description
--type Filter by data type tag (resource, file, report)
--lifetime Filter by lifetime (ephemeral, infinite, job, workflow, or duration)
--owner-type Filter by owner type (model-method, workflow-step, manual)
--workflow Filter to data tagged with this workflow name
--model Filter to data owned by this model name
--content-type Filter by MIME content type
--since Only data created within duration (1h, 1d, 7d, 1w, 1mo)
--output Filter by output ID
--run Filter by workflow run ID
--tag Filter by tag (KEY=VALUE, repeatable)
--streaming Only show streaming data
--limit Maximum results (default 50)
--repo-dir Repository directory (default .)

$ swamp data search --type resource --json
{
  "query": "",
  "filters": {
    "type": "resource"
  },
  "results": [
    {
      "id": "5e7d72ab-7e0d-492e-ab3d-61463d9d4a85",
      "name": "execution-result",
      "version": 1,
      "contentType": "application/json",
      "type": "resource",
      "lifetime": "infinite",
      "ownerType": "model-method",
      "modelName": "hello-world",
      "modelType": "command/shell",
      "streaming": false,
      "size": 135,
      "createdAt": "2026-04-07T18:03:27.361Z",
      "tags": {
        "type": "resource",
        "specName": "result",
        "modelName": "hello-world"
      }
    }
  ],
  "total": 1,
  "limited": false
}

swamp data query [predicate]

Query data using a CEL predicate. The predicate evaluates against DataRecord fields directly (not prefixed with data.).

Option Description
--select CEL expression to project fields (e.g., name)
--limit Maximum results (default 100)
--repo-dir Repository directory (default .)

Available fields in the predicate: attributes, content, contentType, createdAt, dataType, id, lifetime, modelName, modelType, name, ownerType, size, specName, streaming, tags, version.

$ swamp data query 'tags.type == "resource"' --json
{
  "predicate": "tags.type == \"resource\"",
  "results": [
    {
      "id": "85c471af-a4c8-4f03-a5df-768351388d09",
      "name": "result",
      "version": 2,
      "tags": {
        "type": "resource",
        "specName": "result",
        "modelName": "hello-world"
      },
      "modelName": "hello-world",
      "modelType": "command/shell",
      "dataType": "resource",
      "contentType": "application/json",
      "lifetime": "infinite",
      "streaming": false,
      "size": 135
    }
  ],
  "total": 1,
  "limited": false
}

With --select to project a single field:

$ swamp data query 'tags.type == "resource"' --select 'name' --json
{
  "results": ["result", "execution-result"],
  "total": 2,
  "limited": false
}

swamp data rename <model> <old_name> <new_name>

Rename a data item. Creates a copy under the new name and writes a tombstone with a forward reference on the old name.

Option Description
--repo-dir Repository directory (default .)

$ swamp data rename hello-world result execution-result --json
{
  "oldName": "result",
  "newName": "execution-result",
  "modelId": "7347cf2c-cc9e-4203-8897-e10845af9732",
  "modelName": "hello-world",
  "modelType": "command/shell",
  "copiedVersion": 2,
  "newVersion": 1,
  "warning": "Any workflows or models that produce data under \"result\" will overwrite the forward reference. Update them to use \"execution-result\" instead."
}

Lookups for the old name without an explicit version follow the forward reference to the new name (up to 5 levels deep).

swamp data gc

Run garbage collection — delete expired data and prune old versions.

Option Description
--dry-run Show what would be deleted
-f, --force Skip confirmation prompt
--repo-dir Repository directory (default .)

Two phases execute in sequence:

  1. Expired data deletion — removes all versions of data whose lifetime has elapsed.
  2. Version pruning — removes old versions exceeding the garbage collection policy for non-expired data.

$ swamp data gc --dry-run --json
{
  "dataEntriesExpired": 0,
  "versionsDeleted": 0,
  "bytesReclaimed": 0,
  "dryRun": true,
  "expiredEntries": []
}

Validation Rules

  • Data names must be non-empty strings.
  • Data names must not contain .., /, \, or null bytes (path traversal protection).
  • The name latest is reserved (case-insensitive) and cannot be used as a data name.
  • Resource data is validated against the spec's Zod schema on write. Schema mismatches produce a warning, not an error.
  • New writes to an existing data name must have the same owner (ownerType + ownerRef) as the original.
  • Tags must include a type key.
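The name rules above can be sketched as a validation function (illustrative):

```typescript
// Enforce the data-name rules: non-empty, no path-traversal characters,
// and the (case-insensitive) reserved name "latest".
function validateDataName(name: string): void {
  if (name.length === 0) throw new Error("data name must be non-empty");
  if (/\0/.test(name) || name.includes("..") || name.includes("/") || name.includes("\\")) {
    throw new Error("data name must not contain '..', '/', '\\', or null bytes");
  }
  if (name.toLowerCase() === "latest") {
    throw new Error("'latest' is reserved and cannot be used as a data name");
  }
}
```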