DATASTORE CONFIGURATION

The datastore controls where swamp stores runtime data — evaluated definitions, workflow runs, data outputs, secrets, audit logs, and telemetry. By default, everything lives in .swamp/ within the repository. An external filesystem path or an extension-provided backend (such as S3) can replace the default location.

datastore:
  type: filesystem
  path: /data/swamp-store

Backend Types

filesystem

Stores data at a directory on the local filesystem.

Property             Value
Type identifier      filesystem
Built-in             Yes
Sync support         No
Lock implementation  File-based (atomic create)

When no datastore field is present in .swamp.yaml, swamp uses the filesystem backend with path set to {repoDir}/.swamp/.

Extension backends

Extension-provided datastores use a scoped type identifier in @collective/name format (e.g., @swamp/s3-datastore). Extensions implement the DatastoreProvider interface and are loaded from extensions/datastores/ within the repository.

Property             Value
Type identifier      @collective/name
Built-in             No
Sync support         Optional (extension-defined)
Lock implementation  Extension-defined

The legacy type s3 is automatically remapped to @swamp/s3-datastore.

@swamp/s3-datastore

First-party extension that stores data in an Amazon S3 bucket with local cache synchronization. Distributed locking uses S3 conditional writes (If-None-Match: *). Bidirectional sync transfers files between a local cache directory and S3.

datastore:
  type: "@swamp/s3-datastore"
  config:
    bucket: my-swamp-bucket
    prefix: project-name
    region: us-east-1

Config fields

Field           Type     Required  Default  Description
bucket          string   Yes                S3 bucket name
prefix          string   No        None     Key prefix within the bucket
region          string   No        None     AWS region (e.g., us-east-1)
endpoint        string   No        None     Custom S3-compatible endpoint URL (MinIO, DigitalOcean Spaces)
forcePathStyle  boolean  No        false    Force path-style S3 URLs instead of virtual-hosted-style
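
The table above can be read as a small schema. The following is a minimal sketch of a check against it, not the extension's actual configSchema; the function name `validate_s3_config` and the error types are hypothetical, while the field names, types, and the `false` default come from the table.

```python
# Hypothetical validator; only the field names, types, and the
# forcePathStyle default are taken from the config fields table.
S3_CONFIG_FIELDS = {
    # name: (expected type, required)
    "bucket": (str, True),
    "prefix": (str, False),
    "region": (str, False),
    "endpoint": (str, False),
    "forcePathStyle": (bool, False),
}


def validate_s3_config(config: dict) -> dict:
    """Check a config object and fill in the one documented default."""
    for name, (expected_type, required) in S3_CONFIG_FIELDS.items():
        if name not in config:
            if required:
                raise ValueError(f"missing required field: {name}")
            continue
        if not isinstance(config[name], expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")
    # forcePathStyle is the only field with a non-empty default (false).
    return {"forcePathStyle": False, **config}
```

Calling `validate_s3_config({"bucket": "my-bucket"})` returns the config with `forcePathStyle` set to `False`; omitting `bucket` raises `ValueError`.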

Authentication

The extension uses the default AWS credential chain; the config object carries no credentials. Provide credentials via one of:

  • Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
  • AWS profile: ~/.aws/credentials
  • IAM role attached to the instance or task

Required IAM permissions

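  • s3:GetObject
  • s3:PutObject
  • s3:DeleteObject
  • s3:ListBucket

HeadBucket and HeadObject are S3 API operations rather than IAM actions: s3:ListBucket authorizes HeadBucket, and s3:GetObject authorizes HeadObject.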

Setup

$ swamp datastore setup extension @swamp/s3-datastore \
    --config '{"bucket":"my-bucket","region":"us-east-1"}'

With a key prefix and custom endpoint:

$ swamp datastore setup extension @swamp/s3-datastore \
    --config '{"bucket":"my-bucket","prefix":"swamp","endpoint":"https://minio.internal:9000","forcePathStyle":true}'

Environment variable

SWAMP_DATASTORE=@swamp/s3-datastore:{"bucket":"my-bucket","region":"us-east-1"}

The legacy format SWAMP_DATASTORE=s3:my-bucket/prefix is also accepted and auto-remapped. The part before the first / is the bucket; the rest is the prefix.

Sync

The S3 backend supports bidirectional sync via swamp datastore sync. Write commands automatically pull before executing and push after. Change detection compares file size and modification time against a remote index. Transfers run with a concurrency of 10.

The local cache lives at ~/.swamp/repos/{repoId}/ by default.

S3-compatible services

The endpoint and forcePathStyle fields enable use with S3-compatible services such as MinIO, DigitalOcean Spaces, Backblaze B2, and Cloudflare R2. Set forcePathStyle: true when the service does not support virtual-hosted-style bucket addressing.

@swamp/gcs-datastore

First-party extension that stores data in a Google Cloud Storage bucket. Distributed locking uses GCS generation-based preconditions. Bidirectional sync works the same way as in the S3 extension.

datastore:
  type: "@swamp/gcs-datastore"
  config:
    bucket: my-gcs-bucket
    prefix: swamp

Field        Type    Required  Description
bucket       string  Yes       GCS bucket name
prefix       string  No        Key prefix within the bucket
projectId    string  No        GCP project ID (defaults to the project from Application Default Credentials)
apiEndpoint  string  No        Custom API endpoint URL (for emulators like fake-gcs-server; skips auth)

Authentication uses Google Cloud Application Default Credentials (ADC):

  • Environment variable: GOOGLE_APPLICATION_CREDENTIALS pointing to a service account key JSON file
  • User credentials: gcloud auth application-default login
  • Attached service account on GCE, Cloud Run, or GKE

Required IAM permissions (covered by the roles/storage.objectAdmin predefined role):

  • storage.buckets.get
  • storage.objects.create
  • storage.objects.get
  • storage.objects.delete
  • storage.objects.list

Configuration Fields

The datastore field in .swamp.yaml accepts the following properties.

type

Backend type identifier.

Property  Value
Type      string
Required  Yes (within datastore)
Values    filesystem, or an extension type (@collective/name)

path

Absolute path for the filesystem datastore directory.

Property  Value
Type      string (absolute path)
Required  No
Default   {repoDir}/.swamp/
Applies   filesystem type only

config

Arbitrary key-value configuration passed to extension-provided datastores. The extension's configSchema (if defined) validates this object at setup time.

Property  Value
Type      Record<string, unknown>
Required  No
Applies   Extension types only

datastore:
  type: "@swamp/s3-datastore"
  config:
    bucket: my-swamp-bucket
    prefix: project-name
    region: us-east-1

directories

Lists the .swamp/ subdirectories that are stored in the datastore. When omitted, all default subdirectories are included.

Property  Value
Type      string[]
Required  No
Default   All default subdirectories (see below)

Default subdirectories:

definitions-evaluated    workflows-evaluated    data
outputs                  workflow-runs          secrets
bundles                  vault-bundles          driver-bundles
report-bundles           audit                  telemetry
logs                     files

datastore:
  type: filesystem
  path: /data/swamp-store
  directories:
    - data
    - outputs
    - workflow-runs

Subdirectories not in this list remain in the local .swamp/ directory.

exclude

Gitignore-style patterns for files to exclude from datastore operations.

Property  Value
Type      string[]
Required  No

datastore:
  type: filesystem
  path: /data/swamp-store
  exclude:
    - "telemetry/**"

Legacy S3 shorthand fields

These fields are accepted in .swamp.yaml for backwards compatibility with the legacy s3 type. They are equivalent to placing the values inside config for the @swamp/s3-datastore extension.

Field           Type          Description
bucket          string        S3 bucket name
prefix          string        Key prefix within the bucket
region          string        AWS region
endpoint        string (URL)  Custom S3-compatible endpoint URL
forcePathStyle  boolean       Force path-style S3 URLs (default false)
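
To illustrate the equivalence described above, here is a rough sketch of how a legacy block could be rewritten into the extension form. It assumes the legacy fields sit directly under `datastore` next to `type: s3`, which the text implies but does not show; `normalize_legacy_s3` is a hypothetical helper, not part of swamp.

```python
# Field names from the legacy shorthand table.
LEGACY_S3_FIELDS = ("bucket", "prefix", "region", "endpoint", "forcePathStyle")


def normalize_legacy_s3(datastore: dict) -> dict:
    """Move legacy top-level S3 fields into config (assumed layout)."""
    if datastore.get("type") != "s3":
        return datastore
    config = {k: datastore[k] for k in LEGACY_S3_FIELDS if k in datastore}
    # Keep unrelated keys such as directories or exclude untouched.
    rest = {
        k: v
        for k, v in datastore.items()
        if k != "type" and k not in LEGACY_S3_FIELDS
    }
    return {"type": "@swamp/s3-datastore", "config": config, **rest}
```

A block with `type: s3`, `bucket: my-swamp-bucket`, and `prefix: project-name` would come out as the `@swamp/s3-datastore` form with both values under `config`.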

Resolution Priority

When multiple configuration sources exist, they are resolved in this order (highest priority first):

Priority  Source
1         SWAMP_DATASTORE environment variable
2         CLI --datastore argument
3         .swamp.yaml datastore field
4         Default: filesystem at {repoDir}/.swamp/
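
The priority order above amounts to "first source that is set wins". A minimal sketch, assuming each source has already been parsed into a dict or is `None` when absent; the function name and the source labels are illustrative only.

```python
from typing import Optional, Tuple


def resolve_datastore(
    env_value: Optional[dict],
    cli_value: Optional[dict],
    yaml_value: Optional[dict],
    repo_dir: str,
) -> Tuple[str, dict]:
    """Return (source, config) for the highest-priority source that is set."""
    candidates = [
        ("env", env_value),    # 1. SWAMP_DATASTORE
        ("cli", cli_value),    # 2. --datastore
        ("yaml", yaml_value),  # 3. .swamp.yaml datastore field
    ]
    for source, value in candidates:
        if value is not None:
            return source, value
    # 4. Default: filesystem at {repoDir}/.swamp/
    return "default", {"type": "filesystem", "path": f"{repo_dir}/.swamp/"}
```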

SWAMP_DATASTORE Environment Variable

Overrides the datastore configuration for a single invocation. Format:

SWAMP_DATASTORE=<type>:<value>

Format            Example
Filesystem        SWAMP_DATASTORE=filesystem:/path/to/dir
Extension (JSON)  SWAMP_DATASTORE=@swamp/s3-datastore:{"bucket":"my-bucket","region":"us-east-1"}
Legacy S3         SWAMP_DATASTORE=s3:my-bucket/prefix

The legacy s3:bucket/prefix format is auto-remapped to @swamp/s3-datastore with the bucket and optional prefix extracted.

Environment variables within the value are expanded (e.g., filesystem:$HOME/swamp-data).

$ SWAMP_DATASTORE="filesystem:/tmp/override" swamp datastore status --json
{
  "type": "filesystem",
  "path": "/tmp/override",
  "healthy": true,
  "message": "Filesystem datastore at /tmp/override is healthy",
  "latencyMs": 0.36,
  "directories": [
    "definitions-evaluated",
    "workflows-evaluated",
    "data",
    "outputs",
    "workflow-runs",
    "secrets",
    "bundles",
    "vault-bundles",
    "driver-bundles",
    "report-bundles",
    "audit",
    "telemetry",
    "logs",
    "files"
  ]
}
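
The three SWAMP_DATASTORE formats described in this section could be parsed roughly as follows. This is a sketch under stated assumptions, not swamp's parser: the function name is hypothetical, the returned dict shape mirrors the .swamp.yaml layout, and variable expansion is approximated with Python's `os.path.expandvars`.

```python
import json
import os

S3_EXTENSION_TYPE = "@swamp/s3-datastore"


def parse_swamp_datastore(raw: str) -> dict:
    """Parse a SWAMP_DATASTORE value of the form <type>:<value>."""
    # Split on the first colon only; JSON values contain further colons.
    kind, sep, value = raw.partition(":")
    if not sep or not kind:
        raise ValueError("expected <type>:<value>")
    # Expand $VAR references inside the value.
    value = os.path.expandvars(value)
    if kind == "filesystem":
        return {"type": "filesystem", "path": value}
    if kind == "s3":
        # Legacy form: bucket before the first "/", prefix after it.
        bucket, _, prefix = value.partition("/")
        config = {"bucket": bucket}
        if prefix:
            config["prefix"] = prefix
        return {"type": S3_EXTENSION_TYPE, "config": config}
    # Any other type is treated as an extension with a JSON config.
    return {"type": kind, "config": json.loads(value)}
```

For instance, `parse_swamp_datastore("s3:my-bucket/prefix")` yields the `@swamp/s3-datastore` type with `bucket` set to `my-bucket` and `prefix` set to `prefix`.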

Sync Behavior

Sync applies only to extension-provided datastores that implement a sync service. Filesystem datastores do not sync.

Write commands follow this lifecycle:

  1. Pull — download changed files from the remote backend to the local cache.
  2. Execute — run the command against the local cache.
  3. Push — upload changed files from the local cache to the remote backend.

Read-only commands skip sync entirely.
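
The lifecycle above can be sketched as a small wrapper. The callables `pull`, `execute`, and `push` are placeholders for whatever the sync service and command actually do, and the sketch assumes that a failure during execution propagates and skips the push, which the text does not specify. Lock acquisition is left out here.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_command(
    is_write: bool,
    execute: Callable[[], T],
    pull: Callable[[], None],
    push: Callable[[], None],
) -> T:
    """Wrap a command in the pull / execute / push lifecycle."""
    if not is_write:
        # Read-only commands skip sync entirely.
        return execute()
    pull()              # 1. refresh the local cache from the remote backend
    result = execute()  # 2. run the command against the local cache
    push()              # 3. upload what changed
    return result
```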

Change detection compares file size and modification time against a remote index. Transfer concurrency is capped at 10 concurrent file operations.
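
As a rough illustration of the change detection and concurrency cap just described, the sketch below compares (size, mtime) pairs against an index and runs transfers on a pool of 10 workers. The index is modeled as a plain dict, the real index format is not documented here, and deletions are not handled; all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

FileStat = Tuple[int, float]  # (size in bytes, modification time)
MAX_CONCURRENCY = 10  # cap stated in the text


def changed_paths(local: Dict[str, FileStat], index: Dict[str, FileStat]) -> List[str]:
    """Paths whose size or mtime differs from the index, or that it lacks."""
    return sorted(p for p, stat in local.items() if index.get(p) != stat)


def transfer_all(paths: List[str], transfer: Callable[[str], str]) -> List[str]:
    """Run transfer() for each path with at most MAX_CONCURRENCY in flight."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENCY) as pool:
        # map() preserves input order in its results.
        return list(pool.map(transfer, paths))
```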

Cache path

Extension backends use a local cache directory for reads and writes. The sync service transfers data between this cache and the remote backend. Filesystem datastores access the configured path directly and have no separate cache.

Manual sync

$ swamp datastore sync

Triggers a full pull-then-push cycle. Only available for sync-capable extension datastores. Filesystem datastores return an error:

Datastore sync is only available for sync-capable custom datastores.
Current datastore type: filesystem

Distributed Locking

Write commands acquire a distributed lock before syncing and executing. The lock prevents concurrent writers from corrupting the datastore.

Lock metadata

Field       Type               Description
holder      string             user@hostname
hostname    string             Machine name
pid         number             Process ID of the lock holder
acquiredAt  string (ISO 8601)  When the lock was acquired or last renewed
ttlMs       number             Lock duration in milliseconds before considered stale
nonce       string (optional)  UUID fencing token for this acquisition

Lock parameters

Parameter        Default       Description
ttlMs            30,000 (30s)  Lock lifetime before considered stale
retryIntervalMs  1,000 (1s)    Retry interval when lock is held
maxWaitMs        60,000 (60s)  Maximum wait before giving up
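
To make the parameters concrete, here is a minimal sketch of a file-based lock that uses atomic create plus a retry loop. It is an illustration only: the JSON file layout, the function names, and the use of the USER environment variable for the holder are assumptions, and stale-lock reclamation is omitted.

```python
import json
import os
import socket
import time
import uuid
from datetime import datetime, timezone
from typing import Optional


def try_acquire(lock_path: str, ttl_ms: int) -> Optional[dict]:
    """Create the lock file atomically; return metadata, or None if held."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one writer wins.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None
    hostname = socket.gethostname()
    metadata = {
        "holder": f"{os.environ.get('USER', 'unknown')}@{hostname}",
        "hostname": hostname,
        "pid": os.getpid(),
        "acquiredAt": datetime.now(timezone.utc).isoformat(),
        "ttlMs": ttl_ms,
        "nonce": str(uuid.uuid4()),
    }
    with os.fdopen(fd, "w") as handle:
        json.dump(metadata, handle)
    return metadata


def acquire(
    lock_path: str,
    ttl_ms: int = 30_000,
    retry_interval_ms: int = 1_000,
    max_wait_ms: int = 60_000,
) -> dict:
    """Retry every retry_interval_ms until acquired or max_wait_ms elapses."""
    deadline = time.monotonic() + max_wait_ms / 1000
    while True:
        metadata = try_acquire(lock_path, ttl_ms)
        if metadata is not None:
            return metadata
        if time.monotonic() >= deadline:
            raise TimeoutError(f"lock still held after {max_wait_ms} ms")
        time.sleep(retry_interval_ms / 1000)
```

With the defaults, a second writer would retry once per second and give up after about 60 seconds.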

Heartbeat

The lock holder renews the lock every ttlMs / 3 milliseconds (10 seconds at the default TTL). Each renewal writes a fresh acquiredAt timestamp.

Stale lock detection

A lock is considered stale when either:

  • The holder process is dead (checked via OS signal).
  • The TTL has expired (acquiredAt + ttlMs < now).

Stale locks are automatically reclaimed by the next writer.
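
The two staleness conditions can be sketched as below. This is a POSIX-only illustration (signal 0 probes a process without delivering a signal); restricting the liveness check to locks taken on the same host is an assumption, since a PID from another machine cannot be probed locally. Function names are hypothetical.

```python
import os
from datetime import datetime, timedelta


def pid_alive(pid: int) -> bool:
    """Probe a process with signal 0 (POSIX only; do not use on Windows)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def is_stale(lock: dict, now: datetime, local_hostname: str) -> bool:
    """Apply the TTL check, then the dead-holder check for same-host locks."""
    acquired = datetime.fromisoformat(lock["acquiredAt"])
    if acquired + timedelta(milliseconds=lock["ttlMs"]) < now:
        return True
    # Assumption: PIDs are only meaningful on the machine that took the lock.
    if lock["hostname"] == local_hostname and not pid_alive(lock["pid"]):
        return True
    return False
```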

Nonce fencing

Each lock acquisition generates a unique UUID nonce. Heartbeat renewals verify the on-disk nonce matches the held nonce. If a mismatch is detected (another process reclaimed the lock), the holder self-revokes.
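
A minimal sketch of a fenced renewal, with the stored lock modeled as a dict so the logic can be read in isolation. The function names are hypothetical, and the interval helper simply restates the ttlMs / 3 rule from the Heartbeat section.

```python
from typing import Optional


def heartbeat_interval_ms(ttl_ms: int) -> float:
    """Renewal period: one third of the TTL."""
    return ttl_ms / 3


def renew(stored: Optional[dict], held_nonce: str, now_iso: str) -> Optional[dict]:
    """Return the renewed record, or None if the holder must self-revoke."""
    if stored is None or stored.get("nonce") != held_nonce:
        # Another process reclaimed (or removed) the lock: stop renewing.
        return None
    # Same nonce: refresh the timestamp and keep everything else.
    return {**stored, "acquiredAt": now_iso}
```

A `None` result is the signal for the holder to stop treating the lock as its own.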

SIGINT handling

When a process receives SIGINT (Ctrl-C) during a locked operation, a best-effort lock release runs before exit.

Per-model locks

In addition to the global datastore lock, individual model operations acquire per-model locks scoped to the model type and ID. Both lock types use the same mechanism.


CLI Commands

All datastore commands accept the standard global options (--json, --log, --log-level, -q, -v, --no-telemetry, --no-color, --show-properties).

swamp datastore status

Show datastore configuration and health.

Option      Description
--repo-dir  Repository directory (default .)

$ swamp datastore status
Datastore Status
  Type:    filesystem
  Path:    /home/user/my-repo/.swamp
  Health:  ● healthy (0ms)
  Dirs:    definitions-evaluated, workflows-evaluated, data, outputs, ...

$ swamp datastore status --json
{
  "type": "filesystem",
  "path": "/home/user/my-repo/.swamp",
  "healthy": true,
  "message": "Filesystem datastore at /home/user/my-repo/.swamp is healthy",
  "latencyMs": 0.40,
  "directories": [
    "definitions-evaluated",
    "workflows-evaluated",
    "data",
    "outputs",
    "workflow-runs",
    "secrets",
    "bundles",
    "vault-bundles",
    "driver-bundles",
    "report-bundles",
    "audit",
    "telemetry",
    "logs",
    "files"
  ]
}

swamp datastore setup filesystem

Configure a filesystem datastore backend.

Option            Description
--path            Absolute path for the datastore directory (required)
--directories     Subdirectories to store in the datastore (comma-separated)
--skip-migration  Skip migrating existing data from .swamp/
--repo-dir        Repository directory (default .)

$ swamp datastore setup filesystem --path /data/swamp-store --json
{
  "type": "filesystem",
  "path": "/data/swamp-store",
  "filesCopied": 3,
  "bytesCopied": 94576,
  "directoriesMigrated": [
    "definitions-evaluated",
    "workflows-evaluated",
    "data",
    "outputs",
    "workflow-runs",
    "secrets",
    "vault-bundles",
    "audit",
    "telemetry"
  ],
  "errors": []
}

When --directories is specified, only those subdirectories are moved to the external path. The rest remain in .swamp/.

$ swamp datastore setup filesystem \
    --path /data/swamp-store \
    --directories data,outputs,workflow-runs \
    --json
{
  "type": "filesystem",
  "path": "/data/swamp-store",
  "filesCopied": 3,
  "bytesCopied": 94576,
  "directoriesMigrated": [
    "data",
    "outputs",
    "workflow-runs"
  ],
  "errors": []
}

The resulting .swamp.yaml:

datastore:
  type: filesystem
  path: /data/swamp-store
  directories:
    - data
    - outputs
    - workflow-runs

swamp datastore setup extension

Configure an extension-provided datastore backend.

Option            Description
<type>            Extension type identifier (e.g., @swamp/s3-datastore)
--config          JSON config object for the extension (required)
--skip-migration  Skip migrating existing data from .swamp/
--repo-dir        Repository directory (default .)

$ swamp datastore setup extension @swamp/s3-datastore \
    --config '{"bucket":"my-bucket","region":"us-east-1"}'

swamp datastore sync

Manually sync the local cache with a remote datastore.

Option      Description
--pull      Pull only: fetch remote data to local cache
--push      Push only: upload local cache to remote
--repo-dir  Repository directory (default .)

Without --pull or --push, runs a full sync (pull then push).

Only available for sync-capable extension datastores. Filesystem datastores return an error.

$ swamp datastore sync --pull
$ swamp datastore sync --push
$ swamp datastore sync

swamp datastore lock status

Show who holds the datastore lock.

Option      Description
--repo-dir  Repository directory (default .)

$ swamp datastore lock status
Lock Status: no lock held

When a lock is held, the output includes the holder, hostname, PID, acquisition time, TTL, and nonce.

$ swamp datastore lock status --json

Returns the lock metadata object, or null if no lock is held.

swamp datastore lock release

Force-release a stuck datastore lock. This is a breakglass operation for cases where a process died without releasing its lock and automatic stale-lock detection has not yet reclaimed it.

Option      Description
--force     Required to confirm the force release
--model     Release a specific model's lock (type/id format, e.g., aws-ec2/my-server)
--repo-dir  Repository directory (default .)

Without --model, releases the global datastore lock. With --model, releases the per-model lock for the specified model.

$ swamp datastore lock release --force --json
{
  "released": false,
  "reason": "no lock held"
}

$ swamp datastore lock release --force --model aws-ec2/my-server

Setup Pipeline

When swamp datastore setup runs, it follows this sequence:

  1. Validate — verify the target is accessible (writable directory or reachable remote).
  2. Migrate — copy existing data from .swamp/ to the new location (unless --skip-migration).
  3. Verify — compare file counts between source and destination.
  4. Update — write the datastore field to .swamp.yaml.
  5. Clean up — remove migrated subdirectories from .swamp/.
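
The five steps above might look like this for a filesystem target. It is a sketch under several assumptions: validation creates the target directory if missing, verification compares per-directory file counts, and the .swamp.yaml update is represented by a returned dict rather than a file write. `setup_filesystem` and `count_files` are hypothetical names.

```python
import os
import shutil
from typing import List, Tuple


def count_files(root: str) -> int:
    """Count regular files under root, recursively."""
    return sum(len(files) for _, _, files in os.walk(root))


def setup_filesystem(
    local_root: str, target: str, directories: List[str], skip_migration: bool = False
) -> Tuple[dict, List[str]]:
    # 1. Validate: the target must be a writable directory.
    os.makedirs(target, exist_ok=True)
    if not os.access(target, os.W_OK):
        raise PermissionError(f"{target} is not writable")
    migrated: List[str] = []
    if not skip_migration:
        for name in directories:
            src = os.path.join(local_root, name)
            if not os.path.isdir(src):
                continue
            dst = os.path.join(target, name)
            # 2. Migrate: copy existing data to the new location.
            shutil.copytree(src, dst, dirs_exist_ok=True)
            # 3. Verify: file counts must match between source and destination.
            if count_files(src) != count_files(dst):
                raise RuntimeError(f"file count mismatch for {name}")
            migrated.append(name)
    # 4. Update: the datastore field that would be written to .swamp.yaml.
    config = {
        "datastore": {
            "type": "filesystem",
            "path": target,
            "directories": list(directories),
        }
    }
    # 5. Clean up: remove migrated subdirectories from the local root.
    for name in migrated:
        shutil.rmtree(os.path.join(local_root, name))
    return config, migrated
```

Cleaning up only after every directory has been copied and verified means a failed migration leaves the original data in place.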

Path Resolution

Each file operation resolves to either the local .swamp/ directory or the configured datastore path:

  • If the file's parent subdirectory is in the directories list and the file does not match an exclude pattern, it goes to the datastore path.
  • Otherwise, it stays in local .swamp/.

For extension backends with sync support, the "datastore path" is the local cache directory. The sync service handles transfer between the cache and the remote backend.
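
The two resolution rules above can be sketched as a single function. Two simplifications are assumed: the "parent subdirectory" is taken to be the first path component under .swamp/, and gitignore-style matching is approximated with `fnmatchcase`, which does not implement the full gitignore rules. `resolve_location` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List


def resolve_location(
    rel_path: str,
    directories: List[str],
    exclude: List[str],
    local_root: str,
    datastore_root: str,
) -> str:
    """Pick the datastore path or the local .swamp/ path for one file."""
    top_level = PurePosixPath(rel_path).parts[0]
    # Approximation: fnmatch patterns stand in for gitignore patterns.
    excluded = any(fnmatchcase(rel_path, pattern) for pattern in exclude)
    in_datastore = top_level in directories and not excluded
    root = datastore_root if in_datastore else local_root
    return str(PurePosixPath(root) / rel_path)
```

With `directories` set to `data` and `outputs`, a file under `data/` resolves to the datastore path while a file under `logs/` stays in the local .swamp/ directory.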