Data Querying

swamp data query searches all data artifacts in a repository using CEL (Common Expression Language) predicates. The predicate is evaluated against every data record, and matching records are returned. The same query engine powers the data.query() function in CEL expressions.

Command

swamp data query [predicate]

Option	Type	Default	Description
`--select`	string	None	CEL expression to project fields from matching records
`--limit`	number	`100`	Maximum number of results returned
`--repo-dir`	string	`.`	Repository directory
`--json`	flag	—	Output in JSON format

When no predicate is provided and stdout is a TTY, the command opens an interactive TUI for browsing and filtering data. When no predicate is provided in non-interactive mode (piped output or --json), the command returns an error.
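For scripted use, the non-interactive rule above means a predicate must always be supplied. The Python sketch below assembles an argument list from the options in the table. The helper name build_query_argv is hypothetical, and the code only builds arguments; it does not run swamp.

```python
import shlex
from typing import List, Optional


def build_query_argv(
    predicate: str,
    select: Optional[str] = None,
    limit: Optional[int] = None,
    repo_dir: Optional[str] = None,
) -> List[str]:
    """Assemble argv for a non-interactive `swamp data query --json` call.

    Hypothetical helper: it uses only the options documented above.
    """
    if not predicate.strip():
        # Without a predicate, non-interactive mode returns an error.
        raise ValueError("a CEL predicate is required in non-interactive mode")
    argv = ["swamp", "data", "query", predicate, "--json"]
    if select is not None:
        argv += ["--select", select]
    if limit is not None:
        argv += ["--limit", str(limit)]
    if repo_dir is not None:
        argv += ["--repo-dir", repo_dir]
    return argv


argv = build_query_argv('modelName == "scanner"', limit=10)
# shlex.join renders the list as a shell command with safe quoting.
print(shlex.join(argv))
```

Passing the list to subprocess.run avoids shell quoting issues with the double quotes inside CEL predicates; shlex.join is used here only to display the equivalent command line.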

DataRecord Fields

Predicates evaluate against DataRecord fields as top-level variables. No namespace prefix is needed: use modelName, not data.modelName.

Field	Type	Description
`id`	`string`	Record UUID
`name`	`string`	Data item name
`version`	`int`	Version number
`createdAt`	`string`	ISO 8601 timestamp
`attributes`	`map<string, dyn>`	Parsed JSON content (resource data)
`tags`	`map<string, string>`	Metadata tags
`modelName`	`string`	Owning model definition name
`modelType`	`string`	Model type path (e.g., `command/shell`)
`specName`	`string`	Output spec name
`dataType`	`string`	`resource`, `file`, or `report`
`contentType`	`string`	MIME type (e.g., `application/json`)
`lifetime`	`string`	Retention policy (e.g., `infinite`, `30d`)
`ownerType`	`string`	`model-method`, `workflow-step`, or `manual`
`streaming`	`bool`	Whether the data uses streaming writes
`size`	`int`	Content size in bytes
`content`	`string`	Raw text content

Referencing an unknown field produces an error listing the available fields:

$ swamp data query 'badField == "test"' --json

{
  "error": "Unknown field \"badField\" in query predicate.\nAvailable: attributes, content, contentType, createdAt, dataType, id, lifetime, modelName, modelType, name, ownerType, size, specName, streaming, tags, version"
}

Lazy-Loaded Fields

The attributes and content fields are loaded from disk only when referenced in the predicate or the --select expression. All other fields are read from the metadata catalog without touching data files.

attributes: Populated for application/json data. The raw content is parsed as JSON. Invalid JSON is treated as an empty map.
content: Populated for text content types (text/plain, text/markdown, application/json, application/yaml, etc.). Binary content types produce an empty string.

When neither field is referenced, queries run entirely against the catalog index.
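One way to picture this decision is a check for whether either expression mentions attributes or content as a top-level identifier. The Python sketch below is a naive illustration, not swamp's implementation: the function name fields_to_load and the regex approach are assumptions, and unlike a real CEL parser it does not ignore matches inside string literals.

```python
import re
from typing import Optional, Set

# Fields described above as loaded from disk only on demand.
LAZY_FIELDS = ("attributes", "content")


def fields_to_load(predicate: str, select: Optional[str] = None) -> Set[str]:
    """Return the lazy fields referenced by the predicate or select expression."""
    text = predicate if select is None else f"{predicate} {select}"
    needed: Set[str] = set()
    for field in LAZY_FIELDS:
        # The lookbehind skips member access such as `tags.content`;
        # the trailing \b keeps `contentType` from counting as `content`.
        if re.search(rf"(?<![\w.]){field}\b", text):
            needed.add(field)
    return needed


print(fields_to_load('contentType == "text/plain"'))                 # set()
print(fields_to_load('dataType == "resource"', "attributes.stdout"))  # {'attributes'}
```

An empty result corresponds to the catalog-only case: no data files need to be opened.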

Predicates

A predicate is a CEL expression that evaluates to a boolean. Records where the predicate returns true are included in the results.

Comparison

$ swamp data query 'modelName == "scanner"' --json

{
  "predicate": "modelName == \"scanner\"",
  "results": [
    {
      "id": "7a3708d4-4767-4e45-912e-6e4ab42f5ea5",
      "name": "log",
      "version": 1,
      "tags": {
        "type": "file",
        "specName": "log",
        "modelName": "scanner",
        "env": "prod"
      },
      "modelName": "scanner",
      "modelType": "command/shell",
      "specName": "log",
      "dataType": "file",
      "contentType": "text/plain",
      "lifetime": "infinite",
      "streaming": true,
      "size": 22
    },
    {
      "id": "696937a4-6517-4002-93f3-029435fec355",
      "name": "result",
      "version": 1,
      "tags": {
        "type": "resource",
        "specName": "result",
        "modelName": "scanner",
        "env": "prod"
      },
      "modelName": "scanner",
      "modelType": "command/shell",
      "specName": "result",
      "dataType": "resource",
      "contentType": "application/json",
      "lifetime": "infinite",
      "streaming": false,
      "size": 141
    }
  ],
  "total": 2,
  "limited": false
}

Note

Examples on this page show a subset of DataRecord fields for brevity. Actual JSON output includes all fields listed in the DataRecord Fields table (e.g., createdAt, ownerType, attributes, content).

Numeric Comparison

$ swamp data query 'size > 100' --limit 2 --json

{
  "predicate": "size > 100",
  "results": [
    {
      "id": "1ea7486a-4619-41dd-9fb6-cde6a70819df",
      "name": "report-swamp-method-summary",
      "version": 1,
      "dataType": "report",
      "contentType": "text/markdown",
      "size": 493
    },
    {
      "id": "75046034-50b6-47e2-bac8-613e119e17b8",
      "name": "report-swamp-method-summary-json",
      "version": 1,
      "dataType": "report",
      "contentType": "application/json",
      "size": 2653
    }
  ],
  "total": 2,
  "limited": true
}

When results are truncated by --limit, the response sets "limited" to true, and "total" counts only the results returned.

Boolean Fields

$ swamp data query 'streaming == true' --json

Returns all data records with streaming enabled.

Compound Predicates

Combine conditions with && (and) and || (or):

$ swamp data query 'modelName == "scanner" && specName == "result"' --json

{
  "predicate": "modelName == \"scanner\" && specName == \"result\"",
  "results": [
    {
      "id": "696937a4-6517-4002-93f3-029435fec355",
      "name": "result",
      "version": 1,
      "modelName": "scanner",
      "specName": "result",
      "dataType": "resource",
      "contentType": "application/json",
      "size": 141
    }
  ],
  "total": 1,
  "limited": false
}

String Methods

CEL string methods work on string fields:

swamp data query 'name.contains("result")'
swamp data query 'modelName.startsWith("scan")'
swamp data query 'contentType.matches("application/.*")'

See String Methods in the CEL reference for the complete list.

Attribute Filtering

Access nested fields within attributes to filter on resource content. This triggers lazy loading of the content from disk.

$ swamp data query 'dataType == "resource" && attributes.exitCode == 0' --json

{
  "predicate": "dataType == \"resource\" && attributes.exitCode == 0",
  "results": [
    {
      "id": "f6dffa8a-1fe9-4f7b-bb1d-b7a64f6edf22",
      "name": "result",
      "version": 1,
      "attributes": {
        "exitCode": 0,
        "executedAt": "2026-04-07T23:45:02.601Z",
        "command": "echo \"Hello from the swamp!\"",
        "durationMs": 3,
        "stdout": "Hello from the swamp!",
        "stderr": ""
      },
      "modelName": "hello-world",
      "specName": "result",
      "dataType": "resource",
      "size": 157
    },
    {
      "id": "696937a4-6517-4002-93f3-029435fec355",
      "name": "result",
      "version": 1,
      "attributes": {
        "exitCode": 0,
        "executedAt": "2026-04-07T23:45:08.739Z",
        "command": "echo \"scan complete\"",
        "durationMs": 3,
        "stdout": "scan complete",
        "stderr": ""
      },
      "modelName": "scanner",
      "specName": "result",
      "dataType": "resource",
      "size": 141
    }
  ],
  "total": 2,
  "limited": false
}

When a record's attributes map does not contain the referenced key, the record is excluded from results rather than producing an error.
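The exclusion rule can be modelled in a few lines of Python. This is an illustrative sketch of the behaviour described above, not swamp's evaluator; the records list and the matches_exit_code helper are made up for the example.

```python
from typing import Any, Dict, List

# Made-up records: the last one has no exitCode key in its attributes.
records: List[Dict[str, Any]] = [
    {"name": "result", "attributes": {"exitCode": 0}},
    {"name": "failed", "attributes": {"exitCode": 1}},
    {"name": "log", "attributes": {}},
]


def matches_exit_code(record: Dict[str, Any], expected: int) -> bool:
    """Model of `attributes.exitCode == expected` with missing keys excluded."""
    try:
        return record["attributes"]["exitCode"] == expected
    except KeyError:
        # A missing key excludes the record instead of failing the query.
        return False


matched = [r["name"] for r in records if matches_exit_code(r, 0)]
print(matched)  # ['result']
```

The practical consequence is that a predicate such as attributes.exitCode == 0 can run across records of mixed shape without a guard for each key.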

Tag Filtering

Tags are accessible as a nested map via the tags field. Use dot notation or bracket notation to access tag values:

swamp data query 'tags.env == "prod"'
swamp data query 'tags["env"] == "prod"'
swamp data query 'tags.type == "resource"'

$ swamp data query 'tags.env == "prod"' --json

{
  "predicate": "tags.env == \"prod\"",
  "results": [
    {
      "id": "7a3708d4-4767-4e45-912e-6e4ab42f5ea5",
      "name": "log",
      "version": 1,
      "tags": {
        "type": "file",
        "specName": "log",
        "modelName": "scanner",
        "env": "prod"
      },
      "modelName": "scanner",
      "dataType": "file",
      "size": 22
    },
    {
      "id": "696937a4-6517-4002-93f3-029435fec355",
      "name": "result",
      "version": 1,
      "tags": {
        "type": "resource",
        "specName": "result",
        "modelName": "scanner",
        "env": "prod"
      },
      "modelName": "scanner",
      "dataType": "resource",
      "size": 141
    }
  ],
  "total": 2,
  "limited": false
}

Records without the referenced tag key are excluded from results rather than producing an error.

Tag Sources

Tags on data records come from multiple sources, resolved in order. See the Tag Resolution Chain in the data outputs reference for the full precedence.

Three tags are always present on every data record:

Tag Key	Description
`type`	`resource`, `file`, or `report`
`specName`	Output spec name from the model type
`modelName`	Model definition name

Custom tags are added via --tag flags on method runs, workflow dataOutputOverrides, or the output spec's tags field in the model definition.

Projections

The --select flag transforms each matching record into a specified shape. The select expression is a CEL expression evaluated against each matching record's fields.

Scalar Projection

Extract a single field value. Returns an array of values.

$ swamp data query 'dataType == "resource"' --select name --json

{
  "results": [
    "result",
    "result"
  ],
  "total": 2,
  "limited": false
}

Map Projection

Build an object from selected fields. Returns an array of objects.

$ swamp data query 'dataType == "resource"' --select '{"name": name, "model": modelName, "size": size}' --json

{
  "results": [
    {
      "name": "result",
      "model": "hello-world",
      "size": 157
    },
    {
      "name": "result",
      "model": "scanner",
      "size": 141
    }
  ],
  "total": 2,
  "limited": false
}

List Projection

Build an array from selected fields. Returns an array of arrays.

$ swamp data query 'dataType == "resource"' --select '[name, modelName, size]' --json

{
  "results": [
    [
      "result",
      "hello-world",
      157
    ],
    [
      "result",
      "scanner",
      141
    ]
  ],
  "total": 2,
  "limited": false
}

Accessing Nested Data in Projections

Select expressions can reference attributes and content even if the predicate does not. The query engine detects field references in both the predicate and select expression to determine which fields to load from disk.

swamp data query 'dataType == "resource"' --select 'attributes.stdout'

If a record's attributes do not contain the referenced key, the projection produces null for that record.
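Unlike a predicate, where a missing key drops the record, a projection keeps the record and yields null. The Python sketch below models that difference; the sample records and the project_stdout helper are hypothetical and do not reflect swamp's internals.

```python
import json
from typing import Any, Dict, List, Optional

# Made-up records that all matched the predicate; one lacks `stdout`.
records: List[Dict[str, Any]] = [
    {"name": "result", "attributes": {"stdout": "scan complete"}},
    {"name": "empty", "attributes": {}},
]


def project_stdout(record: Dict[str, Any]) -> Optional[str]:
    """Model of `--select 'attributes.stdout'`: a missing key becomes None."""
    return record["attributes"].get("stdout")


projected = [project_stdout(r) for r in records]
print(json.dumps(projected))  # ["scan complete", null]
```

Because the result list keeps one entry per matching record, consumers should be prepared to handle null entries.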

`data.query()` in CEL Expressions

The data.query() function provides the same query capability inside CEL expressions used in model definitions, workflow steps, and data output overrides.

data.query("modelName == \"scanner\" && size > 1000")
data.query("modelName == \"scanner\"", "attributes.status")

Signature	Returns	Description
`data.query(predicate)`	`list<DataRecord>`	Records matching the predicate
`data.query(predicate, select)`	`list<dyn>`	Projected values from matches

The predicate and select arguments are strings containing CEL expressions. The same DataRecord fields and operators are available as in the CLI command.
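Because both arguments are CEL expressions wrapped in string literals, inner double quotes must be escaped, as in the examples above. The Python sketch below generates such a call. The helper names cel_string and data_query_call are hypothetical, and only backslashes and double quotes are escaped, so input containing newlines or other control characters would need more handling.

```python
from typing import Optional


def cel_string(value: str) -> str:
    """Quote text as a double-quoted string literal (backslash and quote only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def data_query_call(predicate: str, select: Optional[str] = None) -> str:
    """Build the text of a data.query(...) call from raw CEL expressions."""
    args = [cel_string(predicate)]
    if select is not None:
        args.append(cel_string(select))
    return f"data.query({', '.join(args)})"


print(data_query_call('modelName == "scanner"', "attributes.status"))
# data.query("modelName == \"scanner\"", "attributes.status")
```

Generating the call this way keeps the predicate readable in the calling code and confines the escaping to one place.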

Interactive Mode

When invoked without a predicate in a terminal, swamp data query opens an interactive TUI for browsing data. The TUI supports:

Filtering by tag keys and values
Text search across record fields
Selecting and inspecting individual records

Non-interactive invocations (piped output, --json flag, or no TTY) require a CEL predicate argument.

Result Structure

JSON output includes these top-level fields:

Field	Type	Description
`predicate`	`string`	The CEL predicate used (omitted with `--select`)
`results`	`list`	Matching DataRecords or projected values
`total`	`int`	Number of results returned
`limited`	`bool`	`true` when results were truncated by `--limit`

Without --select, each result is a full DataRecord. With --select, each result is the projected value (scalar, map, or list depending on the select expression shape).
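A script consuming this output might read the top-level fields as follows. This is a minimal Python sketch assuming the JSON shape in the table above; the sample payload is abridged from the numeric comparison example, and the summarize helper is hypothetical.

```python
import json
from typing import Any, Dict

# Abridged sample in the documented shape (most DataRecord fields omitted).
sample = """
{
  "predicate": "size > 100",
  "results": [
    {"id": "1ea7486a-4619-41dd-9fb6-cde6a70819df", "name": "report-swamp-method-summary", "size": 493},
    {"id": "75046034-50b6-47e2-bac8-613e119e17b8", "name": "report-swamp-method-summary-json", "size": 2653}
  ],
  "total": 2,
  "limited": true
}
"""


def summarize(raw: str) -> Dict[str, Any]:
    """Extract the top-level result fields from query JSON output."""
    payload = json.loads(raw)
    return {
        # `predicate` is omitted with --select, so read it defensively.
        "predicate": payload.get("predicate"),
        "count": payload["total"],
        "truncated": payload["limited"],
        "results": payload["results"],
    }


summary = summarize(sample)
if summary["truncated"]:
    # More records may match than were returned; a larger --limit would show them.
    print(f"showing {summary['count']} results (truncated by --limit)")
```

Checking limited before treating results as complete avoids silently missing records when the default limit of 100 applies.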