DATASTORE CONFIGURATION
The datastore controls where swamp stores runtime data — evaluated definitions,
workflow runs, data outputs, secrets, audit logs, and telemetry. By default,
everything lives in .swamp/ within the repository. An external filesystem path
or an extension-provided backend (such as S3) can replace the default location.
datastore:
  type: filesystem
  path: /data/swamp-store

Backend Types
filesystem
Stores data at a directory on the local filesystem.
| Property | Value |
|---|---|
| Type identifier | filesystem |
| Built-in | Yes |
| Sync support | No |
| Lock implementation | File-based (atomic create) |
When no datastore field is present in .swamp.yaml, swamp uses the filesystem
backend with path set to {repoDir}/.swamp/.
Extension backends
Extension-provided datastores use a scoped type identifier in @collective/name
format (e.g., @swamp/s3-datastore). Extensions implement the
DatastoreProvider interface and are loaded from extensions/datastores/
within the repository.
| Property | Value |
|---|---|
| Type identifier | @collective/name |
| Built-in | No |
| Sync support | Optional (extension-defined) |
| Lock implementation | Extension-defined |
The legacy type s3 is automatically remapped to @swamp/s3-datastore.
@swamp/s3-datastore
First-party extension that stores data in an Amazon S3 bucket with local cache
synchronization. Distributed locking uses S3 conditional writes
(If-None-Match: *). Bidirectional sync transfers files between a local cache
directory and S3.
datastore:
  type: "@swamp/s3-datastore"
  config:
    bucket: my-swamp-bucket
    prefix: project-name
    region: us-east-1

Config fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| bucket | string | Yes | — | S3 bucket name |
| prefix | string | No | None | Key prefix within the bucket |
| region | string | No | None | AWS region (e.g., us-east-1) |
| endpoint | string | No | None | Custom S3-compatible endpoint URL (MinIO, DigitalOcean Spaces) |
| forcePathStyle | boolean | No | false | Force path-style S3 URLs instead of virtual-hosted-style |
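The table maps directly onto a typed config object. The TypeScript sketch below is illustrative only: S3DatastoreConfig and parseS3Config are hypothetical names, not the extension's configSchema, and the checks enforce nothing beyond what the table states (bucket is required, the other fields are optional, and forcePathStyle defaults to false).

```typescript
// Hypothetical typed view of the @swamp/s3-datastore config, mirroring
// the "Config fields" table. Not the extension's actual configSchema.
interface S3DatastoreConfig {
  bucket: string;
  prefix?: string;
  region?: string;
  endpoint?: string;
  forcePathStyle: boolean; // documented default: false
}

// Validates an untyped config object (e.g. parsed from --config JSON)
// and fills in the documented default for forcePathStyle.
function parseS3Config(raw: Record<string, unknown>): S3DatastoreConfig {
  const optionalString = (key: string): string | undefined => {
    const value = raw[key];
    if (value === undefined) return undefined;
    if (typeof value !== "string") throw new Error(`${key} must be a string`);
    return value;
  };

  const bucket = raw["bucket"];
  if (typeof bucket !== "string" || bucket.length === 0) {
    throw new Error("bucket is required and must be a non-empty string");
  }
  const forcePathStyle = raw["forcePathStyle"] ?? false;
  if (typeof forcePathStyle !== "boolean") {
    throw new Error("forcePathStyle must be a boolean");
  }

  return {
    bucket,
    prefix: optionalString("prefix"),
    region: optionalString("region"),
    endpoint: optionalString("endpoint"),
    forcePathStyle,
  };
}

// Usage with the same JSON as the Setup example.
const cfg = parseS3Config(JSON.parse('{"bucket":"my-bucket","region":"us-east-1"}'));
console.log(cfg.bucket, cfg.region, cfg.forcePathStyle); // my-bucket us-east-1 false
```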
Authentication
Uses the default AWS credential chain — no credentials in the config object. Provide credentials via one of:
- Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
- AWS profile: ~/.aws/credentials
- IAM role attached to the instance or task
Required IAM permissions
- s3:HeadBucket
- s3:GetObject
- s3:PutObject
- s3:DeleteObject
- s3:ListBucket
- s3:HeadObject
Setup
$ swamp datastore setup extension @swamp/s3-datastore \
  --config '{"bucket":"my-bucket","region":"us-east-1"}'

With a key prefix and custom endpoint:
$ swamp datastore setup extension @swamp/s3-datastore \
  --config '{"bucket":"my-bucket","prefix":"swamp","endpoint":"https://minio.internal:9000","forcePathStyle":true}'

Environment variable
SWAMP_DATASTORE=@swamp/s3-datastore:{"bucket":"my-bucket","region":"us-east-1"}

The legacy format SWAMP_DATASTORE=s3:my-bucket/prefix is also accepted and
auto-remapped. The part before the first / is the bucket; the rest is the
prefix.
Sync
The S3 backend supports bidirectional sync via swamp datastore sync. Write
commands automatically pull before executing and push after. Change detection
compares file size and modification time against a remote index. Transfers run
with a concurrency of 10.
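Comparing size and modification time never reads file contents, which keeps change detection cheap. A minimal sketch of that comparison, assuming both the local cache and the remote index can be represented as maps from relative path to size and mtime (FileStat, Index, and changedPaths are hypothetical names; the real index format is not documented here):

```typescript
// Metadata the comparison relies on: size in bytes and mtime in ms.
interface FileStat {
  size: number;
  mtimeMs: number;
}

// Relative path (under the datastore root) -> metadata.
type Index = Map<string, FileStat>;

// Returns paths in `source` that are missing from `target` or differ in
// size or modification time. File contents are never read.
function changedPaths(source: Index, target: Index): string[] {
  const changed: string[] = [];
  for (const [path, stat] of source) {
    const other = target.get(path);
    if (!other || other.size !== stat.size || other.mtimeMs !== stat.mtimeMs) {
      changed.push(path);
    }
  }
  return changed.sort();
}

const local: Index = new Map([
  ["data/a.json", { size: 120, mtimeMs: 1000 }],
  ["data/b.json", { size: 64, mtimeMs: 2000 }],
  ["outputs/c.txt", { size: 10, mtimeMs: 3000 }],
]);
const remoteIndex: Index = new Map([
  ["data/a.json", { size: 120, mtimeMs: 1000 }], // identical on both sides
  ["data/b.json", { size: 80, mtimeMs: 2500 }], // differs
  ["logs/d.log", { size: 5, mtimeMs: 4000 }], // remote only
]);

console.log(changedPaths(local, remoteIndex).join(", ")); // data/b.json, outputs/c.txt
console.log(changedPaths(remoteIndex, local).join(", ")); // data/b.json, logs/d.log
```

The first call yields push candidates and the second pull candidates. A path that differs on both sides (data/b.json here) appears in both lists; this sketch does not decide which side wins.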
The local cache lives at ~/.swamp/repos/{repoId}/ by default.
S3-compatible services
The endpoint and forcePathStyle fields enable use with S3-compatible
services such as MinIO, DigitalOcean Spaces, Backblaze B2, and Cloudflare R2.
Set forcePathStyle: true when the service does not support
virtual-hosted-style bucket addressing.
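The two addressing styles differ only in where the bucket name appears in the request URL. The sketch below shows the standard S3 URL forms; it is not the extension's code, and it assumes object keys are built by joining prefix and the file's path relative to the datastore with "/" (objectKey and objectUrl are hypothetical helpers).

```typescript
// Assumption: keys are "<prefix>/<relative path>", or just the path
// when no prefix is configured.
function objectKey(prefix: string | undefined, relPath: string): string {
  return prefix ? `${prefix.replace(/\/+$/, "")}/${relPath}` : relPath;
}

// Builds an object URL under either addressing style.
function objectUrl(endpoint: string, bucket: string, key: string, forcePathStyle: boolean): string {
  const base = new URL(endpoint); // e.g. https://minio.internal:9000
  const encodedKey = key.split("/").map(encodeURIComponent).join("/");
  if (forcePathStyle) {
    // Path-style: the bucket is the first path segment.
    return `${base.protocol}//${base.host}/${bucket}/${encodedKey}`;
  }
  // Virtual-hosted-style: the bucket becomes a subdomain of the host,
  // which some S3-compatible services cannot route.
  return `${base.protocol}//${bucket}.${base.host}/${encodedKey}`;
}

console.log(objectUrl("https://minio.internal:9000", "my-bucket", objectKey("swamp", "data/a.json"), true));
// https://minio.internal:9000/my-bucket/swamp/data/a.json
console.log(objectUrl("https://s3.us-east-1.amazonaws.com", "my-bucket", objectKey("project-name", "data/a.json"), false));
// https://my-bucket.s3.us-east-1.amazonaws.com/project-name/data/a.json
```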
@swamp/gcs-datastore
First-party extension that stores data in a Google Cloud Storage bucket. Distributed locking uses GCS generation-based preconditions. Bidirectional sync works the same way as the S3 extension.
datastore:
  type: "@swamp/gcs-datastore"
  config:
    bucket: my-gcs-bucket
    prefix: swamp

Config fields
| Field | Type | Required | Description |
|---|---|---|---|
| bucket | string | Yes | GCS bucket name |
| prefix | string | No | Key prefix within the bucket |
| projectId | string | No | GCP project ID (defaults to the project from Application Default Credentials) |
| apiEndpoint | string | No | Custom API endpoint URL (for emulators like fake-gcs-server; skips auth) |
Authentication uses Google Cloud Application Default Credentials (ADC):
- Environment variable: GOOGLE_APPLICATION_CREDENTIALS pointing to a service account key JSON file
- User credentials: gcloud auth application-default login
- Attached service account on GCE, Cloud Run, or GKE
Required IAM permissions (covered by the roles/storage.objectAdmin predefined
role):
- storage.buckets.get
- storage.objects.create
- storage.objects.get
- storage.objects.delete
- storage.objects.list
Configuration Fields
The datastore field in .swamp.yaml accepts the following properties.
type
Backend type identifier.
| Property | Value |
|---|---|
| Type | string |
| Required | Yes (within datastore) |
| Values | filesystem, or an extension type (@collective/name) |
path
Absolute path for the filesystem datastore directory.
| Property | Value |
|---|---|
| Type | string (absolute path) |
| Required | No |
| Default | {repoDir}/.swamp/ |
| Applies | filesystem type only |
config
Arbitrary key-value configuration passed to extension-provided datastores. The
extension's configSchema (if defined) validates this object at setup time.
| Property | Value |
|---|---|
| Type | Record<string, unknown> |
| Required | No |
| Applies | Extension types only |
datastore:
  type: "@swamp/s3-datastore"
  config:
    bucket: my-swamp-bucket
    prefix: project-name
    region: us-east-1

directories
Which .swamp/ subdirectories are stored in the datastore. When omitted, all
default subdirectories are included.
| Property | Value |
|---|---|
| Type | string[] |
| Required | No |
| Default | All default subdirectories (see below) |
Default subdirectories:
definitions-evaluated, workflows-evaluated, data, outputs, workflow-runs,
secrets, bundles, vault-bundles, driver-bundles, report-bundles, audit,
telemetry, logs, files

datastore:
  type: filesystem
  path: /data/swamp-store
  directories:
    - data
    - outputs
    - workflow-runs

Subdirectories not in this list remain in the local .swamp/ directory.
exclude
Gitignore-style patterns for files to exclude from datastore operations.
| Property | Value |
|---|---|
| Type | string[] |
| Required | No |
datastore:
  type: filesystem
  path: /data/swamp-store
  exclude:
    - "telemetry/**"

Legacy S3 shorthand fields
These fields are accepted in .swamp.yaml for backwards compatibility with the
legacy s3 type. They are equivalent to placing the values inside config for
the @swamp/s3-datastore extension.
| Field | Type | Description |
|---|---|---|
| bucket | string | S3 bucket name |
| prefix | string | Key prefix within the bucket |
| region | string | AWS region |
| endpoint | string (URL) | Custom S3-compatible endpoint URL |
| forcePathStyle | boolean | Force path-style S3 URLs (default false) |
Resolution Priority
When multiple configuration sources exist, they are resolved in this order (highest priority first):
| Priority | Source |
|---|---|
| 1 | SWAMP_DATASTORE environment variable |
| 2 | CLI --datastore argument |
| 3 | .swamp.yaml datastore field |
| 4 | Default: filesystem at {repoDir}/.swamp/ |
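Because each source overrides everything below it, resolution reduces to taking the first source that is present. A minimal sketch, assuming each source has already been parsed into the same shape and that sources are not merged field by field (DatastoreConfig, DatastoreSources, and resolveDatastore are hypothetical names):

```typescript
// Hypothetical, simplified shape shared by all sources.
interface DatastoreConfig {
  type: string;
  path?: string;
  config?: Record<string, unknown>;
}

interface DatastoreSources {
  envVar?: DatastoreConfig; // parsed SWAMP_DATASTORE
  cliArg?: DatastoreConfig; // parsed --datastore argument
  repoFile?: DatastoreConfig; // datastore field from .swamp.yaml
}

// Returns the highest-priority source that is present; lower-priority
// sources are ignored entirely (assumption: no field-level merging).
function resolveDatastore(sources: DatastoreSources, repoDir: string): DatastoreConfig {
  return (
    sources.envVar ??
    sources.cliArg ??
    sources.repoFile ?? { type: "filesystem", path: `${repoDir}/.swamp/` }
  );
}

const resolved = resolveDatastore(
  {
    envVar: { type: "filesystem", path: "/tmp/override" },
    repoFile: { type: "@swamp/s3-datastore", config: { bucket: "my-bucket" } },
  },
  "/home/user/my-repo",
);
console.log(resolved.type, resolved.path); // filesystem /tmp/override
```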
SWAMP_DATASTORE Environment Variable
Overrides the datastore configuration for a single invocation. Format:
SWAMP_DATASTORE=<type>:<value>

| Format | Example |
|---|---|
| Filesystem | SWAMP_DATASTORE=filesystem:/path/to/dir |
| Extension (JSON) | SWAMP_DATASTORE=@swamp/s3-datastore:{"bucket":"my-bucket","region":"us-east-1"} |
| Legacy S3 | SWAMP_DATASTORE=s3:my-bucket/prefix |
The legacy s3:bucket/prefix format is auto-remapped to @swamp/s3-datastore
with the bucket and optional prefix extracted.
Environment variables within the value are expanded (e.g.,
filesystem:$HOME/swamp-data).
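A minimal parser for the three formats in the table. Assumptions, also noted in the code: the type ends at the first ":", both $VAR and ${VAR} are expanded, and unset variables expand to an empty string; the CLI's actual behavior may differ on edge cases. The environment is passed in (for example process.env under Node) so the function stays testable, and parseSwampDatastore is a hypothetical name.

```typescript
interface ParsedDatastore {
  type: string;
  path?: string;
  config?: Record<string, unknown>;
}

// Expands $VAR and ${VAR}. Assumption: unset variables become "".
function expandEnv(value: string, env: Record<string, string | undefined>): string {
  return value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g,
    (_match: string, braced: string | undefined, bare: string | undefined) =>
      env[(braced ?? bare) as string] ?? "",
  );
}

function parseSwampDatastore(raw: string, env: Record<string, string | undefined>): ParsedDatastore {
  // Assumption: the type is everything before the first ":".
  const sep = raw.indexOf(":");
  if (sep <= 0) throw new Error(`expected <type>:<value>, got "${raw}"`);
  const type = raw.slice(0, sep);
  const value = expandEnv(raw.slice(sep + 1), env);

  if (type === "filesystem") return { type, path: value };

  if (type === "s3") {
    // Legacy form: bucket before the first "/", the rest is the prefix.
    const slash = value.indexOf("/");
    const config: Record<string, unknown> = { bucket: slash === -1 ? value : value.slice(0, slash) };
    const prefix = slash === -1 ? "" : value.slice(slash + 1);
    if (prefix) config["prefix"] = prefix;
    return { type: "@swamp/s3-datastore", config };
  }

  // Extension types carry a JSON config object.
  return { type, config: JSON.parse(value) as Record<string, unknown> };
}

console.log(JSON.stringify(parseSwampDatastore("s3:my-bucket/prefix", {})));
// {"type":"@swamp/s3-datastore","config":{"bucket":"my-bucket","prefix":"prefix"}}
console.log(JSON.stringify(parseSwampDatastore("filesystem:$HOME/swamp-data", { HOME: "/home/user" })));
// {"type":"filesystem","path":"/home/user/swamp-data"}
```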
$ SWAMP_DATASTORE="filesystem:/tmp/override" swamp datastore status --json
{
  "type": "filesystem",
  "path": "/tmp/override",
  "healthy": true,
  "message": "Filesystem datastore at /tmp/override is healthy",
  "latencyMs": 0.36,
  "directories": [
    "definitions-evaluated",
    "workflows-evaluated",
    "data",
    "outputs",
    "workflow-runs",
    "secrets",
    "bundles",
    "vault-bundles",
    "driver-bundles",
    "report-bundles",
    "audit",
    "telemetry",
    "logs",
    "files"
  ]
}

Sync Behavior
Sync applies only to extension-provided datastores that implement a sync service. Filesystem datastores do not sync.
Write commands follow this lifecycle:
- Pull — download changed files from the remote backend to the local cache.
- Execute — run the command against the local cache.
- Push — upload changed files from the local cache to the remote backend.
Read-only commands skip sync entirely.
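The lifecycle can be sketched as a wrapper around the command. SyncService and runCommand are hypothetical names, the distributed lock that write commands also take is left out, and the sketch assumes a failed command does not push (an exception from execute propagates before push runs), which the text above does not specify.

```typescript
// Hypothetical interface for a sync-capable backend (not the real
// DatastoreProvider interface).
interface SyncService {
  pull(): Promise<void>;
  push(): Promise<void>;
}

interface CommandOptions {
  readOnly: boolean;
  sync?: SyncService; // undefined for filesystem datastores
}

// Runs a command inside the pull / execute / push lifecycle. Read-only
// commands and datastores without a sync service skip sync entirely.
async function runCommand<T>(execute: () => Promise<T>, opts: CommandOptions): Promise<T> {
  const sync = opts.sync;
  if (opts.readOnly || !sync) return execute();
  await sync.pull(); // 1. remote -> local cache
  const result = await execute(); // 2. run against the local cache
  await sync.push(); // 3. local cache -> remote (not reached if execute throws)
  return result;
}
```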
Change detection compares file size and modification time against a remote index. Transfer concurrency is capped at 10 concurrent file operations.
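Capping transfers at 10 amounts to a bounded worker pool. One common way to write it in TypeScript, shown as a sketch rather than the sync service's actual code (mapWithConcurrency is a hypothetical helper): start up to limit workers that each claim the next index from a shared counter until the list is exhausted.

```typescript
// Applies `task` to every item with at most `limit` tasks in flight,
// returning results in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array<R>(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index. JavaScript
  // runs these on one thread, so an index is never claimed twice.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A sync pass would then call something like mapWithConcurrency(changedFiles, 10, uploadFile), where uploadFile is a hypothetical per-file transfer function.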
Cache path
Extension backends use a local cache directory for reads and writes. The sync
service transfers data between this cache and the remote backend. Filesystem
datastores access the configured path directly and have no separate cache.
Manual sync
$ swamp datastore sync

Triggers a full pull-then-push cycle. Only available for sync-capable extension datastores. Filesystem datastores return an error:

Datastore sync is only available for sync-capable custom datastores.
Current datastore type: filesystem

Distributed Locking
Write commands acquire a distributed lock before syncing and executing. The lock prevents concurrent writers from corrupting the datastore.
Lock metadata
| Field | Type | Description |
|---|---|---|
| holder | string | user@hostname |
| hostname | string | Machine name |
| pid | number | Process ID of the lock holder |
| acquiredAt | string (ISO 8601) | When the lock was acquired or last renewed |
| ttlMs | number | Lock duration in milliseconds before the lock is considered stale |
| nonce | string (optional) | UUID fencing token for this acquisition |
Lock parameters
| Parameter | Default | Description |
|---|---|---|
| ttlMs | 30,000 (30s) | Lock lifetime before it is considered stale |
| retryIntervalMs | 1,000 (1s) | Retry interval when the lock is held |
| maxWaitMs | 60,000 (60s) | Maximum wait before giving up |
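With these defaults, a blocked writer polls once per second and gives up after about a minute. A sketch of that loop, assuming tryAcquire stands in for the backend's atomic create (file create, S3 conditional write, or GCS precondition) and resolves true on success; sleep and now are injectable so the loop can be driven by a fake clock. Stale-lock reclaim is omitted, and all names are hypothetical.

```typescript
interface LockParams {
  ttlMs: number;
  retryIntervalMs: number;
  maxWaitMs: number;
}

const DEFAULT_LOCK_PARAMS: LockParams = {
  ttlMs: 30000,
  retryIntervalMs: 1000,
  maxWaitMs: 60000,
};

// Polls tryAcquire every retryIntervalMs until it succeeds or the next
// attempt would fall past maxWaitMs, then gives up.
async function acquireLock(
  tryAcquire: () => Promise<boolean>,
  params: LockParams = DEFAULT_LOCK_PARAMS,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  now: () => number = Date.now,
): Promise<void> {
  const deadline = now() + params.maxWaitMs;
  while (true) {
    if (await tryAcquire()) return;
    if (now() + params.retryIntervalMs > deadline) {
      throw new Error(`lock still held after waiting ${params.maxWaitMs} ms`);
    }
    await sleep(params.retryIntervalMs);
  }
}
```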
Heartbeat
The lock holder renews the lock every ttlMs / 3 milliseconds (10 seconds at
the default TTL). Each renewal writes a fresh acquiredAt timestamp.
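A sketch of the renewal schedule, with hypothetical names. One plausible reading of the ttlMs / 3 choice is that it tolerates a single missed renewal: at the defaults, if the 10 s renewal is missed, the 20 s renewal still lands before the 30 s expiry. The nonce check that each renewal performs is not shown here.

```typescript
// Renewal cadence: one third of the TTL.
function heartbeatIntervalMs(ttlMs: number): number {
  return Math.floor(ttlMs / 3);
}

// Hypothetical shape of the renewable part of the lock record.
interface LockRecord {
  acquiredAt: string; // ISO 8601
  ttlMs: number;
}

// Periodically writes the record with a fresh acquiredAt timestamp.
// `write` persists the record; the returned function stops the timer.
function startHeartbeat(
  lock: LockRecord,
  write: (record: LockRecord) => void,
  now: () => Date = () => new Date(),
): () => void {
  const timer = setInterval(() => {
    write({ ...lock, acquiredAt: now().toISOString() });
  }, heartbeatIntervalMs(lock.ttlMs));
  return () => clearInterval(timer);
}
```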
Stale lock detection
A lock is considered stale when either:
- The holder process is dead (checked via OS signal).
- The TTL has expired (acquiredAt + ttlMs < now).
Stale locks are automatically reclaimed by the next writer.
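The two conditions combine with a logical OR. In this sketch the OS-level liveness check is injected as isProcessAlive; under Node, one common implementation is process.kill(pid, 0), which throws if no such process exists. A PID is only meaningful on the machine that recorded it, so a real check presumably consults hostname as well; that is left to the callback here. Names are hypothetical.

```typescript
// Mirrors the lock metadata table.
interface LockMetadata {
  holder: string;
  hostname: string;
  pid: number;
  acquiredAt: string; // ISO 8601
  ttlMs: number;
  nonce?: string;
}

// Stale when the TTL has run out (acquiredAt + ttlMs < now) or the
// holder process is no longer alive.
function isStale(
  lock: LockMetadata,
  nowMs: number,
  isProcessAlive: (pid: number) => boolean,
): boolean {
  const expired = Date.parse(lock.acquiredAt) + lock.ttlMs < nowMs;
  return expired || !isProcessAlive(lock.pid);
}
```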
Nonce fencing
Each lock acquisition generates a unique UUID nonce. Heartbeat renewals verify the on-disk nonce matches the held nonce. If a mismatch is detected (another process reclaimed the lock), the holder self-revokes.
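One heartbeat tick with fencing can be sketched as read, compare, then write. readStored and writeStored are hypothetical backend accessors, and treating a missing lock record like a mismatch is an assumption. A plain read followed by a write leaves a small race window; a backend with conditional writes could close it.

```typescript
interface HeldLock {
  nonce: string;
  acquiredAt: string; // ISO 8601
  ttlMs: number;
}

type RenewResult = "renewed" | "revoked";

// Renews only if the stored lock still carries our nonce; otherwise the
// caller must stop acting as the lock holder.
function renewWithFencing(
  held: HeldLock,
  readStored: () => HeldLock | null,
  writeStored: (lock: HeldLock) => void,
  now: () => Date = () => new Date(),
): RenewResult {
  const stored = readStored();
  if (stored === null || stored.nonce !== held.nonce) {
    return "revoked"; // another process reclaimed the lock
  }
  writeStored({ ...held, acquiredAt: now().toISOString() });
  return "renewed";
}
```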
SIGINT handling
When a process receives SIGINT (Ctrl-C) during a locked operation, a best-effort lock release runs before exit.
Per-model locks
In addition to the global datastore lock, individual model operations acquire per-model locks scoped to the model type and ID. Both lock types use the same mechanism.
CLI Commands
All datastore commands accept the standard global options (--json, --log,
--log-level, -q, -v, --no-telemetry, --no-color, --show-properties).
swamp datastore status
Show datastore configuration and health.
| Option | Description |
|---|---|
| --repo-dir | Repository directory (default .) |
$ swamp datastore status
Datastore Status
Type: filesystem
Path: /home/user/my-repo/.swamp
Health: ● healthy (0ms)
Dirs: definitions-evaluated, workflows-evaluated, data, outputs, ...

$ swamp datastore status --json
{
  "type": "filesystem",
  "path": "/home/user/my-repo/.swamp",
  "healthy": true,
  "message": "Filesystem datastore at /home/user/my-repo/.swamp is healthy",
  "latencyMs": 0.40,
  "directories": [
    "definitions-evaluated",
    "workflows-evaluated",
    "data",
    "outputs",
    "workflow-runs",
    "secrets",
    "bundles",
    "vault-bundles",
    "driver-bundles",
    "report-bundles",
    "audit",
    "telemetry",
    "logs",
    "files"
  ]
}

swamp datastore setup filesystem
Configure a filesystem datastore backend.
| Option | Description |
|---|---|
| --path | Absolute path for the datastore directory (required) |
| --directories | Subdirectories to store in the datastore (comma-separated) |
| --skip-migration | Skip migrating existing data from .swamp/ |
| --repo-dir | Repository directory (default .) |
$ swamp datastore setup filesystem --path /data/swamp-store --json
{
  "type": "filesystem",
  "path": "/data/swamp-store",
  "filesCopied": 3,
  "bytesCopied": 94576,
  "directoriesMigrated": [
    "definitions-evaluated",
    "workflows-evaluated",
    "data",
    "outputs",
    "workflow-runs",
    "secrets",
    "vault-bundles",
    "audit",
    "telemetry"
  ],
  "errors": []
}

When --directories is specified, only those subdirectories are moved to the
external path. The rest remain in .swamp/.
$ swamp datastore setup filesystem \
  --path /data/swamp-store \
  --directories data,outputs,workflow-runs \
  --json
{
  "type": "filesystem",
  "path": "/data/swamp-store",
  "filesCopied": 3,
  "bytesCopied": 94576,
  "directoriesMigrated": [
    "data",
    "outputs",
    "workflow-runs"
  ],
  "errors": []
}

The resulting .swamp.yaml:
datastore:
  type: filesystem
  path: /data/swamp-store
  directories:
    - data
    - outputs
    - workflow-runs

swamp datastore setup extension
Configure an extension-provided datastore backend.
| Option | Description |
|---|---|
| <type> | Extension type identifier (e.g., @swamp/s3-datastore) |
| --config | JSON config object for the extension (required) |
| --skip-migration | Skip migrating existing data from .swamp/ |
| --repo-dir | Repository directory (default .) |
$ swamp datastore setup extension @swamp/s3-datastore \
  --config '{"bucket":"my-bucket","region":"us-east-1"}'

swamp datastore sync
Manually sync the local cache with a remote datastore.
| Option | Description |
|---|---|
| --pull | Pull only — fetch remote data to local cache |
| --push | Push only — upload local cache to remote |
| --repo-dir | Repository directory (default .) |
Without --pull or --push, runs a full sync (pull then push).
Only available for sync-capable extension datastores. Filesystem datastores return an error.
$ swamp datastore sync --pull
$ swamp datastore sync --push
$ swamp datastore sync

swamp datastore lock status
Show who holds the datastore lock.
| Option | Description |
|---|---|
| --repo-dir | Repository directory (default .) |

$ swamp datastore lock status
Lock Status: no lock held

When a lock is held, the output includes the holder, hostname, PID, acquisition time, TTL, and nonce.

$ swamp datastore lock status --json

Returns the lock metadata object, or null if no lock is held.
swamp datastore lock release
Force-release a stuck datastore lock. This is a break-glass operation for recovering when a process died without releasing its lock and automatic stale-lock detection has not yet reclaimed it.
| Option | Description |
|---|---|
| --force | Required to confirm the force release |
| --model | Release a specific model's lock (type/id format, e.g., aws-ec2/my-server) |
| --repo-dir | Repository directory (default .) |
Without --model, releases the global datastore lock. With --model, releases
the per-model lock for the specified model.
$ swamp datastore lock release --force --json
{
  "released": false,
  "reason": "no lock held"
}

$ swamp datastore lock release --force --model aws-ec2/my-server

Setup Pipeline
When swamp datastore setup runs, it follows this sequence:
- Validate — verify the target is accessible (writable directory or reachable remote).
- Migrate — copy existing data from .swamp/ to the new location (unless --skip-migration).
- Verify — compare file counts between source and destination.
- Update — write the datastore field to .swamp.yaml.
- Clean up — remove migrated subdirectories from .swamp/.
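The ordering matters: .swamp.yaml is only rewritten after verification passes, and local data is removed last, so a failed migration leaves the original setup usable. A sketch of that ordering with hypothetical step functions (SetupSteps and runSetup are illustrative names); skipping verification and clean-up under --skip-migration is an assumption.

```typescript
// Hypothetical step implementations for one setup run.
interface SetupSteps {
  validate(): Promise<void>; // target writable / remote reachable
  migrate(): Promise<void>; // copy .swamp/ data to the new location
  countSource(): Promise<number>; // files under the migrated .swamp/ dirs
  countDestination(): Promise<number>; // files found at the new location
  updateConfig(): Promise<void>; // write the datastore field to .swamp.yaml
  cleanUp(): Promise<void>; // remove migrated dirs from .swamp/
}

async function runSetup(steps: SetupSteps, skipMigration: boolean): Promise<void> {
  await steps.validate();
  if (!skipMigration) {
    await steps.migrate();
    const [source, destination] = await Promise.all([
      steps.countSource(),
      steps.countDestination(),
    ]);
    if (source !== destination) {
      // Abort before .swamp.yaml changes or any local data is deleted.
      throw new Error(`verification failed: ${source} source files, ${destination} at destination`);
    }
  }
  await steps.updateConfig();
  // Assumption: nothing to clean up when migration was skipped.
  if (!skipMigration) await steps.cleanUp();
}
```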
Path Resolution
Each file operation resolves to either the local .swamp/ directory or the
configured datastore path:
- If the file's parent subdirectory is in the directories list and the file does not match an exclude pattern, it goes to the datastore path.
- Otherwise, it stays in local .swamp/.
For extension backends with sync support, the "datastore path" is the local cache directory. The sync service handles transfer between the cache and the remote backend.
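A sketch of the routing rule, with two stated simplifications: the parent subdirectory is taken to be the first path segment under .swamp/, and globToRegExp handles only *, **, and ?, not the full gitignore syntax (negation, character classes, and leading-slash anchoring are ignored). All names are hypothetical.

```typescript
// Simplified gitignore-style glob: "**" spans path segments, "*" and "?"
// stay within one segment. Other characters are matched literally.
function globToRegExp(pattern: string): RegExp {
  let re = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "*" && pattern[i + 1] === "*") {
      re += ".*";
      i++; // consume the second "*"
    } else if (ch === "*") {
      re += "[^/]*";
    } else if (ch === "?") {
      re += "[^/]";
    } else {
      re += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${re}$`);
}

interface RoutingOptions {
  directories: readonly string[];
  exclude: readonly string[];
  localRoot: string; // the repository's .swamp/ directory
  datastoreRoot: string; // configured path, or the local cache for sync backends
}

// relPath is relative to .swamp/, e.g. "data/model/a.json".
function resolvePath(relPath: string, opts: RoutingOptions): string {
  const topDir = relPath.split("/")[0];
  const inDatastore =
    opts.directories.includes(topDir) &&
    !opts.exclude.some((pattern) => globToRegExp(pattern).test(relPath));
  return `${inDatastore ? opts.datastoreRoot : opts.localRoot}/${relPath}`;
}
```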
Related
- Repository Configuration — datastore field in .swamp.yaml
- Extension Manifest — packaging datastore extensions
- Vaults — secrets stored in the secrets datastore subdirectory
- Data Outputs — data stored in the data datastore subdirectory