
DATA QUERYING

swamp data query searches across all data artifacts in a repository using CEL predicates. Each predicate is evaluated against every data record, and matching records are returned. The same query engine powers the data.query() function in CEL expressions.

Command

swamp data query [predicate]

Option      Type    Default  Description
--select    string  None     CEL expression to project fields from matching records
--limit     number  100      Maximum number of results returned
--repo-dir  string  .        Repository directory
--json      flag             Output in JSON format

When no predicate is provided and stdout is a TTY, the command opens an interactive TUI for browsing and filtering data. When no predicate is provided in non-interactive mode (piped output or --json), the command returns an error.
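
For scripted use, the options above can be assembled into a non-interactive call. The following is a minimal sketch, not part of swamp itself: the helper names build_query_argv and run_query are hypothetical, and it assumes the swamp binary is on PATH and that --json output is written to stdout.

```python
import json
import subprocess
from typing import List, Optional


def build_query_argv(
    predicate: str,
    select: Optional[str] = None,
    limit: Optional[int] = None,
    repo_dir: Optional[str] = None,
) -> List[str]:
    """Build an argv list for a non-interactive `swamp data query --json` call."""
    if not predicate:
        # Non-interactive mode requires a predicate, so fail early here
        # instead of letting the command return an error.
        raise ValueError("a predicate is required in non-interactive mode")
    argv = ["swamp", "data", "query", predicate, "--json"]
    if select is not None:
        argv += ["--select", select]
    if limit is not None:
        argv += ["--limit", str(limit)]
    if repo_dir is not None:
        argv += ["--repo-dir", repo_dir]
    return argv


def run_query(predicate: str, **options) -> dict:
    """Run the query and parse stdout as JSON (assumption: JSON goes to stdout)."""
    completed = subprocess.run(
        build_query_argv(predicate, **options),
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(completed.stdout)
```

Passing the predicate as a single argv element avoids shell quoting problems with the double quotes that CEL string literals contain.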


DataRecord Fields

Predicates evaluate against DataRecord fields as top-level variables. No namespace prefix is needed: use modelName, not data.modelName.

Field        Type                 Description
id           string               Record UUID
name         string               Data item name
version      int                  Version number
createdAt    string               ISO 8601 timestamp
attributes   map<string, dyn>     Parsed JSON content (resource data)
tags         map<string, string>  Metadata tags
modelName    string               Owning model definition name
modelType    string               Model type path (e.g., command/shell)
specName     string               Output spec name
dataType     string               resource, file, or report
contentType  string               MIME type (e.g., application/json)
lifetime     string               Retention policy (e.g., infinite, 30d)
ownerType    string               model-method, workflow-step, or manual
streaming    bool                 Whether the data uses streaming writes
size         int                  Content size in bytes
content      string               Raw text content

Referencing an unknown field produces an error listing the available fields:

$ swamp data query 'badField == "test"' --json
{
  "error": "Unknown field \"badField\" in query predicate.\nAvailable: attributes, content, contentType, createdAt, dataType, id, lifetime, modelName, modelType, name, ownerType, size, specName, streaming, tags, version"
}

Lazy-Loaded Fields

The attributes and content fields are loaded from disk only when referenced in the predicate or the --select expression. All other fields are read from the metadata catalog without touching data files.

  • attributes: Populated for application/json data. The raw content is parsed as JSON. Invalid JSON is treated as an empty map.
  • content: Populated for text content types (text/plain, text/markdown, application/json, application/yaml, etc.). Binary content types produce an empty string.

When neither field is referenced, queries run entirely against the catalog index.
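
To estimate ahead of time whether a query will stay on the catalog index, a script can check whether either lazy field is referenced. The sketch below is a naive text-based approximation; it is not how the swamp query engine detects field references, and the function names are hypothetical. It strips string literals and then looks for attributes or content used as a top-level identifier.

```python
import re
from typing import Optional

# Matches double- or single-quoted string literals, including escaped quotes.
_STRING_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'')

# Matches a lazy field used as a top-level identifier: not part of a longer
# name (contentType) and not a member access (tags.content).
_LAZY_REFERENCE = re.compile(r"(?<![\w.])(?:attributes|content)(?!\w)")


def references_lazy_field(expression: str) -> bool:
    """Return True if the expression appears to use attributes or content."""
    # Blank out string literals so a value like "content" is not counted.
    code_only = _STRING_LITERAL.sub('""', expression)
    return _LAZY_REFERENCE.search(code_only) is not None


def needs_data_files(predicate: str, select: Optional[str] = None) -> bool:
    """Approximate whether a query would load data files from disk."""
    if references_lazy_field(predicate):
        return True
    return select is not None and references_lazy_field(select)
```

Under this approximation, 'modelName == "scanner"' stays on the catalog, while adding --select 'attributes.stdout' to the same predicate does not.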


Predicates

A predicate is a CEL expression that evaluates to a boolean. Records where the predicate returns true are included in the results.

Comparison

$ swamp data query 'modelName == "scanner"' --json
{
  "predicate": "modelName == \"scanner\"",
  "results": [
    {
      "id": "7a3708d4-4767-4e45-912e-6e4ab42f5ea5",
      "name": "log",
      "version": 1,
      "tags": {
        "type": "file",
        "specName": "log",
        "modelName": "scanner",
        "env": "prod"
      },
      "modelName": "scanner",
      "modelType": "command/shell",
      "specName": "log",
      "dataType": "file",
      "contentType": "text/plain",
      "lifetime": "infinite",
      "streaming": true,
      "size": 22
    },
    {
      "id": "696937a4-6517-4002-93f3-029435fec355",
      "name": "result",
      "version": 1,
      "tags": {
        "type": "resource",
        "specName": "result",
        "modelName": "scanner",
        "env": "prod"
      },
      "modelName": "scanner",
      "modelType": "command/shell",
      "specName": "result",
      "dataType": "resource",
      "contentType": "application/json",
      "lifetime": "infinite",
      "streaming": false,
      "size": 141
    }
  ],
  "total": 2,
  "limited": false
}

Note

Examples on this page show a subset of DataRecord fields for brevity. Actual JSON output includes all fields listed in the table above (e.g., createdAt, ownerType, attributes, content).

Numeric Comparison

$ swamp data query 'size > 100' --limit 2 --json
{
  "predicate": "size > 100",
  "results": [
    {
      "id": "1ea7486a-4619-41dd-9fb6-cde6a70819df",
      "name": "report-swamp-method-summary",
      "version": 1,
      "dataType": "report",
      "contentType": "text/markdown",
      "size": 493
    },
    {
      "id": "75046034-50b6-47e2-bac8-613e119e17b8",
      "name": "report-swamp-method-summary-json",
      "version": 1,
      "dataType": "report",
      "contentType": "application/json",
      "size": 2653
    }
  ],
  "total": 2,
  "limited": true
}

When results are truncated by --limit, the response includes "limited": true, and total counts only the records returned (2 here), not every record that matches.

Boolean Fields

$ swamp data query 'streaming == true' --json

Returns all data records with streaming enabled.

Compound Predicates

Combine conditions with && (and) and || (or):

$ swamp data query 'modelName == "scanner" && specName == "result"' --json
{
  "predicate": "modelName == \"scanner\" && specName == \"result\"",
  "results": [
    {
      "id": "696937a4-6517-4002-93f3-029435fec355",
      "name": "result",
      "version": 1,
      "modelName": "scanner",
      "specName": "result",
      "dataType": "resource",
      "contentType": "application/json",
      "size": 141
    }
  ],
  "total": 1,
  "limited": false
}

String Methods

CEL string methods work on string fields:

swamp data query 'name.contains("result")'
swamp data query 'modelName.startsWith("scan")'
swamp data query 'contentType.matches("application/.*")'

See String Methods in the CEL reference for the complete list.

Attribute Filtering

Access nested fields within attributes to filter on resource content. Referencing attributes triggers lazy loading of that content from disk.

$ swamp data query 'dataType == "resource" && attributes.exitCode == 0' --json
{
  "predicate": "dataType == \"resource\" && attributes.exitCode == 0",
  "results": [
    {
      "id": "f6dffa8a-1fe9-4f7b-bb1d-b7a64f6edf22",
      "name": "result",
      "version": 1,
      "attributes": {
        "exitCode": 0,
        "executedAt": "2026-04-07T23:45:02.601Z",
        "command": "echo \"Hello from the swamp!\"",
        "durationMs": 3,
        "stdout": "Hello from the swamp!",
        "stderr": ""
      },
      "modelName": "hello-world",
      "specName": "result",
      "dataType": "resource",
      "size": 157
    },
    {
      "id": "696937a4-6517-4002-93f3-029435fec355",
      "name": "result",
      "version": 1,
      "attributes": {
        "exitCode": 0,
        "executedAt": "2026-04-07T23:45:08.739Z",
        "command": "echo \"scan complete\"",
        "durationMs": 3,
        "stdout": "scan complete",
        "stderr": ""
      },
      "modelName": "scanner",
      "specName": "result",
      "dataType": "resource",
      "size": 141
    }
  ],
  "total": 2,
  "limited": false
}

When a record's attributes map does not contain the referenced key, the record is excluded from results rather than producing an error.
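
The missing-key behavior can be mirrored when post-processing records in a script. This sketch models the documented semantics for a predicate such as attributes.exitCode == 0 over already-loaded records; filter_by_attribute is a hypothetical helper, not a swamp API.

```python
from typing import Any, Dict, Iterable, List

# Sentinel that distinguishes "key absent" from a stored null value.
_MISSING = object()


def filter_by_attribute(
    records: Iterable[Dict[str, Any]], key: str, expected: Any
) -> List[Dict[str, Any]]:
    """Keep records whose attributes[key] equals expected."""
    matches = []
    for record in records:
        attributes = record.get("attributes") or {}
        value = attributes.get(key, _MISSING)
        if value is _MISSING:
            # Missing key: the record is excluded, no error is raised.
            continue
        if value == expected:
            matches.append(record)
    return matches
```

A record with an empty attributes map, or with no attributes at all, falls through the missing-key branch and is left out of the result.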


Tag Filtering

Tags are accessible as a nested map via the tags field. Use dot notation or bracket notation to access tag values:

swamp data query 'tags.env == "prod"'
swamp data query 'tags["env"] == "prod"'
swamp data query 'tags.type == "resource"'

$ swamp data query 'tags.env == "prod"' --json
{
  "predicate": "tags.env == \"prod\"",
  "results": [
    {
      "id": "7a3708d4-4767-4e45-912e-6e4ab42f5ea5",
      "name": "log",
      "version": 1,
      "tags": {
        "type": "file",
        "specName": "log",
        "modelName": "scanner",
        "env": "prod"
      },
      "modelName": "scanner",
      "dataType": "file",
      "size": 22
    },
    {
      "id": "696937a4-6517-4002-93f3-029435fec355",
      "name": "result",
      "version": 1,
      "tags": {
        "type": "resource",
        "specName": "result",
        "modelName": "scanner",
        "env": "prod"
      },
      "modelName": "scanner",
      "dataType": "resource",
      "size": 141
    }
  ],
  "total": 2,
  "limited": false
}

Records that do not have the referenced tag key are silently excluded from results (no error).

Tag Sources

Tags on data records come from multiple sources, resolved in order. See the Tag Resolution Chain in the data outputs reference for the full precedence.

Three tags are always present on every data record:

Tag Key    Description
type       resource, file, or report
specName   Output spec name from the model type
modelName  Model definition name

Custom tags are added via --tag flags on method runs, workflow dataOutputOverrides, or the output spec's tags field in the model definition.
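
When consuming query output, it can help to separate the three always-present tags from custom ones. A minimal sketch, assuming records shaped like the JSON examples on this page; both helper names are hypothetical.

```python
from typing import Any, Dict, List

# Tag keys this page lists as present on every data record.
REQUIRED_TAG_KEYS = ("type", "specName", "modelName")


def missing_required_tags(record: Dict[str, Any]) -> List[str]:
    """Return the always-present tag keys that this record lacks."""
    tags = record.get("tags") or {}
    return [key for key in REQUIRED_TAG_KEYS if key not in tags]


def custom_tags(record: Dict[str, Any]) -> Dict[str, str]:
    """Return only the tags that were added on top of the built-in three."""
    tags = record.get("tags") or {}
    return {k: v for k, v in tags.items() if k not in REQUIRED_TAG_KEYS}
```

For the scanner records shown earlier, custom_tags returns only the env tag.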


Projections

The --select flag transforms each matching record into a specified shape. The select expression is a CEL expression evaluated against each matching record's fields.

Scalar Projection

Extract a single field value. Returns an array of values.

$ swamp data query 'dataType == "resource"' --select name --json
{
  "results": [
    "result",
    "result"
  ],
  "total": 2,
  "limited": false
}

Map Projection

Build an object from selected fields. Returns an array of objects.

$ swamp data query 'dataType == "resource"' --select '{"name": name, "model": modelName, "size": size}' --json
{
  "results": [
    {
      "name": "result",
      "model": "hello-world",
      "size": 157
    },
    {
      "name": "result",
      "model": "scanner",
      "size": 141
    }
  ],
  "total": 2,
  "limited": false
}

List Projection

Build an array from selected fields. Returns an array of arrays.

$ swamp data query 'dataType == "resource"' --select '[name, modelName, size]' --json
{
  "results": [
    [
      "result",
      "hello-world",
      157
    ],
    [
      "result",
      "scanner",
      141
    ]
  ],
  "total": 2,
  "limited": false
}

Accessing Nested Data in Projections

Select expressions can reference attributes and content even if the predicate does not. The query engine detects field references in both the predicate and select expression to determine which fields to load from disk.

swamp data query 'dataType == "resource"' --select 'attributes.stdout'

If a record's attributes do not contain the referenced key, the projection produces null for that record.
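
Scripts that consume projected output should expect those nulls. The sketch below parses a hypothetical response, shaped like the projection examples above but not captured from a real run, and drops the null entries.

```python
import json
from typing import Any, List

# Hypothetical --select 'attributes.stdout' response: the second record is
# assumed to lack the stdout key, so its projected value is null.
RESPONSE = '{"results": ["scan complete", null], "total": 2, "limited": false}'


def present_values(response_text: str) -> List[Any]:
    """Return projected values, skipping nulls from records without the key."""
    payload = json.loads(response_text)
    # JSON null becomes Python None after parsing.
    return [value for value in payload["results"] if value is not None]
```

Note that this filter cannot tell a missing key apart from a key whose stored value is itself null.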


data.query() in CEL Expressions

The data.query() function provides the same query capability inside CEL expressions used in model definitions, workflow steps, and data output overrides.

data.query("modelName == \"scanner\" && size > 1000")
data.query("modelName == \"scanner\"", "attributes.status")

Signature                      Returns           Description
data.query(predicate)          list<DataRecord>  Records matching the predicate
data.query(predicate, select)  list<dyn>         Projected values from matches

The predicate and select arguments are strings containing CEL expressions. The same DataRecord fields and operators are available as in the CLI command.
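
Because the predicate is itself a string inside a CEL expression, any quotes in it must be escaped, as in the examples above. When such expressions are generated by a script, the escaping can be automated. This is a sketch with hypothetical helper names; it assumes JSON string escaping is acceptable as a CEL string literal, which holds for quotes and backslashes but has not been checked against every character.

```python
import json
from typing import Optional


def cel_string(value: str) -> str:
    """Quote a Python string as a double-quoted literal with escapes."""
    # ensure_ascii=False keeps non-ASCII characters as-is instead of \u escapes.
    return json.dumps(value, ensure_ascii=False)


def data_query_call(predicate: str, select: Optional[str] = None) -> str:
    """Render a data.query() call with the predicate and select as strings."""
    args = [cel_string(predicate)]
    if select is not None:
        args.append(cel_string(select))
    return "data.query(" + ", ".join(args) + ")"
```

For example, data_query_call('modelName == "scanner"', 'attributes.status') renders the second call shown above.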


Interactive Mode

When invoked without a predicate in a terminal, swamp data query opens an interactive TUI for browsing data. The TUI supports:

  • Filtering by tag keys and values
  • Text search across record fields
  • Selecting and inspecting individual records

Non-interactive invocations (piped output, --json flag, or no TTY) require a CEL predicate argument.


Result Structure

JSON output includes these top-level fields:

Field      Type    Description
predicate  string  The CEL predicate used (omitted with --select)
results    list    Matching DataRecords or projected values
total      int     Number of results returned
limited    bool    true when results were truncated by --limit

Without --select, each result is a full DataRecord. With --select, each result is the projected value (scalar, map, or list depending on the select expression shape).
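
A consumer of the JSON output might model this structure as follows. The sketch is illustrative only: QueryResult and parse_query_result are hypothetical names. It treats predicate as optional because the field is omitted with --select, and it surfaces the error payload shown earlier as an exception.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class QueryResult:
    predicate: Optional[str]  # None when --select was used
    results: List[Any]        # DataRecords or projected values
    total: int                # number of results returned
    limited: bool             # True when truncated by --limit


def parse_query_result(text: str) -> QueryResult:
    """Parse `swamp data query --json` output into a QueryResult."""
    payload = json.loads(text)
    if "error" in payload:
        # Error responses carry a single "error" message field.
        raise RuntimeError(payload["error"])
    return QueryResult(
        predicate=payload.get("predicate"),
        results=payload["results"],
        total=payload["total"],
        limited=payload["limited"],
    )
```

When limited is True, the result list is incomplete; rerunning with a higher --limit returns more records.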