Giga-Swamp and Namespaces

Most swamp repositories start the same way: one team, one repo, one datastore. Models, workflows, data outputs, and secrets all live together in .swamp/. This is solo mode, and it works well when a single repository is the entire picture.

The picture changes when multiple repositories need to share a datastore. Two teams automating different parts of the same infrastructure (one managing networking, the other managing compute) want to reference each other's data without copying it. An S3 or GCS bucket can already serve as a shared datastore backend. But when two repositories write to the same bucket, a question arises: who produced this data?

That question is what namespaces answer.

Provenance, not isolation

A namespace is a bounded context identifier stamped onto data at write time. It records which repository produced a piece of data — which team, which concern, which operational boundary. The namespace is a provenance label.

This is a deliberate design choice. Namespaces do not create isolated partitions. There are no access controls between namespaces, no permission boundaries, no visibility restrictions. Every namespace's data is visible to every other namespace in the same datastore. A data.query() call without a namespace predicate returns results from all namespaces.

The reasoning is that a shared datastore gets its value from sharing. If two teams automate related infrastructure, they need to see each other's outputs: VPC IDs from the networking team, instance metadata from the compute team. Isolation would defeat the purpose. Namespaces solve the attribution problem (who wrote this?) without creating the access problem (who can read this?).

This makes namespaces different from multi-tenant partitioning in databases, Kubernetes namespaces, or cloud account boundaries. Those are security boundaries. Swamp namespaces are organizational labels.
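
The provenance-only behaviour described in this section can be pictured with a small in-memory model. This is a minimal sketch, not swamp's implementation: the Record and SharedStore classes, their fields, and the sample payloads are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical data record; `ns` is the provenance label."""
    ns: str
    model: str
    payload: dict


class SharedStore:
    """Hypothetical stand-in for a datastore shared by several repositories."""

    def __init__(self) -> None:
        self._records: List[Record] = []

    def write(self, writer_ns: str, model: str, payload: dict) -> Record:
        # The namespace is stamped at write time and only records who wrote the data.
        record = Record(ns=writer_ns, model=model, payload=payload)
        self._records.append(record)
        return record

    def query(self, ns: Optional[str] = None) -> List[Record]:
        # No namespace predicate: records from every namespace are returned.
        if ns is None:
            return list(self._records)
        # A namespace predicate is a filter, not an access check.
        return [r for r in self._records if r.ns == ns]


store = SharedStore()
store.write("infra", "my-vpc", {"vpc_id": "vpc-123"})
store.write("compute", "web-fleet", {"instances": 3})

all_records = store.query()
infra_only = store.query(ns="infra")
```

Nothing in the sketch checks who is asking: the label answers "who wrote this?" and the filter is a convenience for the reader, which is the distinction between attribution and access drawn above.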

Solo mode vs giga-swamp mode

A repository without a namespace is in solo mode. It reads and writes data at the top level of the datastore, with no namespace prefix in paths or data records. This is the default, and most repositories stay here permanently.

A repository becomes part of a giga-swamp when it is assigned a namespace. The namespace changes the physical layout of the datastore: data moves from .swamp/data/ to .swamp/<namespace>/data/, and each namespace occupies its own subtree. The namespace migrate command handles this reorganization.

The transition is reversible. A namespaced repository can return to solo mode with namespace unset, and namespace migrate --reverse moves data back to the top-level layout.
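
The two layouts can be sketched as a path mapping. This is an illustration under assumptions: only the .swamp/data/ and .swamp/<namespace>/data/ prefixes come from the text above, while the data_dir and migrate_path helpers and the my-vpc/result.json file name are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

SWAMP_ROOT = PurePosixPath(".swamp")


def data_dir(namespace: Optional[str] = None) -> PurePosixPath:
    """Return the data directory for solo mode (None) or for a namespace."""
    if namespace is None:
        return SWAMP_ROOT / "data"
    return SWAMP_ROOT / namespace / "data"


def migrate_path(path: PurePosixPath, namespace: str, reverse: bool = False) -> PurePosixPath:
    """Map one data path between the solo layout and the namespaced layout.

    reverse=False: .swamp/data/...       -> .swamp/<namespace>/data/...
    reverse=True:  .swamp/<namespace>/data/... -> .swamp/data/...
    """
    src = data_dir(namespace if reverse else None)
    dst = data_dir(None if reverse else namespace)
    # relative_to raises ValueError if the path is not under the source layout.
    return dst / path.relative_to(src)


solo_path = PurePosixPath(".swamp/data/my-vpc/result.json")  # hypothetical file
namespaced_path = migrate_path(solo_path, "infra")
round_trip = migrate_path(namespaced_path, "infra", reverse=True)
```

The round trip returning the original path mirrors the reversibility described above; the real migrate command presumably does more than rename paths.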

What is and isn't shared

When multiple repositories share a datastore, most data is shared and cross-queryable. Model definitions, data outputs, workflow runs, and telemetry all live in namespace-scoped subtrees that any repository can read.

Secrets and vault bundles are the exception. They remain repo-local regardless of namespace configuration. Sharing a datastore does not mean sharing credentials. A secret stored in one repository's vault is never visible to another repository, even if both write to the same S3 bucket.

This asymmetry is intentional. Operational data (what did this model produce?) is collaborative. Credentials (what can this team access?) are not.

Cross-namespace queries

CEL expressions support cross-namespace data access through a prefix syntax. data.latest("infra:my-vpc", "result") reads the latest result from the my-vpc model in the infra namespace. A wildcard data.latest("*:my-vpc", "result") searches all namespaces, failing with an ambiguity error if the model name exists in more than one.
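
The prefix and wildcard rules can be modelled in a few lines. This is a rough sketch rather than swamp's resolver: parse_ref, find_namespace, AmbiguousModelError, and the catalog shape are hypothetical, and raising KeyError for a model that is not found is an assumption, since the text only specifies the ambiguity case.

```python
from typing import Dict, Optional, Set, Tuple


class AmbiguousModelError(LookupError):
    """Raised when a wildcard reference matches more than one namespace."""


def parse_ref(ref: str) -> Tuple[Optional[str], str]:
    """Split 'ns:model' into (ns, model); an unprefixed ref gives (None, model)."""
    ns, sep, model = ref.partition(":")
    return (ns, model) if sep else (None, ref)


def find_namespace(ns: str, model: str, models_by_ns: Dict[str, Set[str]]) -> str:
    """Return the namespace that owns `model` for an explicit or '*' prefix."""
    if ns != "*":
        if model not in models_by_ns.get(ns, set()):
            raise KeyError(f"{ns}:{model} not found")  # assumed behaviour
        return ns
    # Wildcard: search every namespace and insist on exactly one match.
    matches = sorted(n for n, models in models_by_ns.items() if model in models)
    if not matches:
        raise KeyError(f"{model} not found in any namespace")  # assumed behaviour
    if len(matches) > 1:
        raise AmbiguousModelError(f"{model} exists in: {', '.join(matches)}")
    return matches[0]


catalog = {"infra": {"my-vpc"}, "compute": {"web-fleet"}}
owner = find_namespace(*parse_ref("*:my-vpc"), catalog)
```

With this catalog the wildcard resolves to infra. If compute also defined my-vpc, the same call would raise AmbiguousModelError, and an explicit infra: prefix would be the way to disambiguate.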

For data.query(), the ns field is available as a predicate: filter results to a specific namespace, or combine namespace predicates with other filters. The field name is ns, not namespace, because namespace is a reserved word in CEL and cannot be used as an identifier.

Without an explicit namespace prefix, point lookups (data.latest, data.version) scope to the current repository's own namespace. data.query() defaults to all namespaces. This means existing CEL expressions written in solo mode continue to work after namespace assignment — they implicitly scope to the local namespace for point lookups and see everything for queries.
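
The defaulting rules in the previous paragraph reduce to a small decision function. This sketch is purely illustrative: default_scope and its parameters are hypothetical, "*" is used here to mean all namespaces, and None stands for a solo-mode repository with no namespace.

```python
from typing import Optional

POINT_LOOKUPS = {"data.latest", "data.version"}
ALL_NAMESPACES = "*"


def default_scope(function: str,
                  explicit_ns: Optional[str],
                  current_ns: Optional[str]) -> Optional[str]:
    """Return the namespace scope a call ends up with.

    explicit_ns: a namespace prefix or ns predicate given by the caller, if any.
    current_ns:  the calling repository's namespace (None in solo mode).
    """
    if explicit_ns is not None:
        return explicit_ns
    if function in POINT_LOOKUPS:
        # Unprefixed point lookups stay in the caller's own namespace.
        return current_ns
    if function == "data.query":
        # Queries without an ns predicate see every namespace.
        return ALL_NAMESPACES
    raise ValueError(f"unknown function: {function}")


local_lookup = default_scope("data.latest", None, "infra")
open_query = default_scope("data.query", None, "infra")
```

An expression written in solo mode keeps its meaning after namespace assignment because only current_ns changes; the expression itself carries no prefix to rewrite.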

The foreign catalog

Each repository maintains a catalog of its own models and data outputs. In a shared datastore, a repository can pull catalog metadata from other namespaces to discover what data is available without scanning the full datastore. The catalog pull --namespaces command fetches this metadata.

The foreign catalog is read-only and only as current as the most recent pull: it represents what other namespaces had at their last sync. There is no TTL or automatic refresh in v1, so pull explicitly when you need current information.
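
Because nothing refreshes the foreign catalog automatically, any freshness policy is up to the caller. The sketch below shows one way a wrapper script might decide when to pull again. It assumes the caller records a pull timestamp itself: ForeignCatalogSnapshot, needs_pull, and the max_age threshold are hypothetical and not part of swamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class ForeignCatalogSnapshot:
    """Hypothetical record of what was pulled from another namespace, and when."""
    namespace: str
    pulled_at: datetime
    models: Tuple[str, ...]


def needs_pull(snapshot: ForeignCatalogSnapshot,
               max_age: timedelta,
               now: Optional[datetime] = None) -> bool:
    """Return True when the snapshot is older than the caller's own threshold."""
    now = now or datetime.now(timezone.utc)
    return now - snapshot.pulled_at > max_age


snapshot = ForeignCatalogSnapshot(
    namespace="infra",
    pulled_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    models=("my-vpc",),
)
stale = needs_pull(
    snapshot,
    max_age=timedelta(hours=1),
    now=datetime(2024, 1, 1, 14, 0, tzinfo=timezone.utc),
)
```

A threshold like this only bounds how old the metadata can be. It cannot tell whether the other namespace has actually changed since the pull.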

Known v1 limitations

Several features do not cross namespace boundaries in the current implementation:

Workflow steps cannot reference models in other namespaces. A workflow in the infra namespace can only run methods on infra models. Cross-namespace orchestration requires separate workflows coordinated externally.
Garbage collection is namespace-scoped. Each namespace manages its own data retention. There is no cross-namespace GC coordination.
Foreign catalog staleness is not automatically managed. The pulled catalog has no TTL and no background refresh. Stale metadata is possible if another namespace has changed since the last pull.

These limitations reflect a v1 scope decision: namespaces provide provenance and cross-namespace reads, but each namespace remains operationally independent. Cross-namespace writes and orchestration may follow in future versions.

Namespace How-to Guides — step-by-step guides for setting up and managing namespaces
Namespace Commands — CLI reference for swamp datastore namespace
CEL Expressions — cross-namespace query syntax
Datastore Configuration — namespace field in datastore config