swamp serve scheduled workflows do not load repo extension registries

A workflow that can be discovered and validated from the CLI fails when it is fired by the swamp serve scheduler. The scheduled run appears to execute with only built-in (or otherwise incomplete) registries loaded, even though a fresh CLI process in the same repository can see the required pulled model and vault extension types.

This report intentionally uses generic placeholder names and omits concrete extension names, hostnames, IPs, and domain-specific workflow details.

Environment

Server command: swamp serve --repo-dir <repo> --host 0.0.0.0 --port 9090
Reproduced after updating the server binary and repository metadata to: 20260527.181855.0-sha.2efdbeea
Repository contains:
- pulled model extensions, e.g. <pulled-model-type>
- a local custom model extension, e.g. <local-model-type>
- a pulled custom vault extension, e.g. <custom-vault-type>
- vault config using the pulled vault type
- workflows whose steps call those model types and whose model definitions use vault.get(...)

What works from the CLI

From the same repo directory, fresh CLI commands see the extension catalog correctly:

swamp vault type search --json
# includes local_encryption and <custom-vault-type>

swamp model type search <query> --json
# includes <pulled-model-type> and <local-model-type>

swamp doctor extensions --json
# overallStatus: pass
# relevant pulled/local model sources: Indexed
# relevant vault source: Indexed

The workflow validates from the CLI:

swamp workflow validate <workflow-name> --json
# passed: true

A manually triggered workflow run from the same repo had also succeeded before the scheduler-specific test.

Reproduction

1. Start swamp serve with --repo-dir <repo>.
2. Confirm the process is running the updated binary.
3. Add a temporary cron trigger to an existing read-only workflow, scheduled for the next minute.
4. Confirm /health shows scheduling enabled and the schedule registered with the expected nextRun.
5. Wait for the scheduler to fire.

The scheduler fires as expected:

Scheduled workflow "<workflow-name>" ("<cron>")
Registered schedule for workflow "<workflow-name>": "<cron>"
Running scheduled workflow "<workflow-name>"
Firing scheduled run for workflow "<workflow-name>"

Actual result

The scheduled run fails immediately, before executing the first method. In the latest retest on 20260527.181855.0-sha.2efdbeea, the run failed in about 19 ms with:

Unknown model type: <pulled-model-type>

A previous scheduled run against another workflow in the same repo failed similarly with:

Unknown model type: <local-model-type>

The service journal from that previous run also showed the vault registry was incomplete during scheduled execution:

Unsupported vault type: <custom-vault-type> (vault "<vault-name>"). Available vault types: local_encryption

This is the key contrast: a fresh CLI process in the same repo sees the pulled/local model types and custom vault type, while the scheduled execution path inside the long-running swamp serve process behaves as if the repo extension registries are not loaded.

Expected result

Scheduled workflow execution should use the same repo extension registry/catalog resolution as manual CLI workflow execution and WebSocket-triggered execution. If swamp model type search, swamp vault type search, and swamp doctor extensions can see the extension types from the repo, swamp serve scheduled runs should be able to resolve those same types.

Cleanup after reproduction

The temporary cron trigger was removed after the scheduled run, and /health returned schedules: [].

Notes from source inspection

In the inspected source, swamp serve creates ScheduledExecutionService, and scheduled runs call executeWorkflowWithLocks(...). That path appears intended to load the registries while constructing the workflow run dependencies, but the observed runtime behavior suggests that one of the following is happening in the serve scheduler process:

- registry loaders are not configured for the serve command process, or
- ensureLoaded() returns without loading the repo extension catalog, or
- scheduled workflow execution constructs part of its runtime through a path that bypasses the configured repo extension loaders.

The repo layout itself does not appear incomplete: pulled extension lock data and pulled files exist, and swamp doctor extensions reports the relevant entries as indexed.

Privacy note

Concrete extension names, workflow names, hostnames, IPs, and domain-specific model names have been replaced with placeholders in this report.

02 Bog Flow

Shipped

5/29/2026, 1:11:11 AM


03 Sludge Pulse

stack72 assigned stack72 5/27/2026, 7:53:03 PM

stack72 commented 5/27/2026, 8:53:13 PM

@phy2vir thanks for the detailed report — the structured reproduction steps and the 19ms timing observation were particularly helpful for narrowing this down.

We have triaged this, walked through the extension-loading code in detail, and made multiple reproduction attempts.

How serve extension loading is designed to work:

When you run swamp serve --repo-dir <repo>, the process goes through runCli() which pre-parses --repo-dir and calls configureExtensionLoaders() before the serve command action runs. This sets up lazy loader closures on the global registry singletons (modelRegistry, vaultTypeRegistry, driverTypeRegistry, reportRegistry) — the same path every CLI command uses. The serve command is not excluded from this setup.

When a schedule fires, the path is: ScheduledExecutionService.executeWorkflow → executeWorkflowWithLocks → createWorkflowRunDeps (in src/serve/deps.ts), which calls ensureLoaded() on all four registries before constructing the workflow deps. This triggers the lazy loaders, which read the extension catalog DB, enumerate pulled extension directories from the lockfile, and register types (either fully or as lazy catalog entries for per-bundle loading). Then WorkflowExecutionService.executeStep calls resolveModelType(), which uses ensureTypeLoaded() for on-demand bundle import, followed by the auto-resolver as a fallback.
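A minimal sketch of this lazy-loading pattern, assuming synchronous loaders and showing only two of the four registries. LazyTypeRegistry, configureLoaders, and createRunDeps are hypothetical names standing in for the registry singletons, configureExtensionLoaders(), and createWorkflowRunDeps(); this is not swamp's source, and the type names are made-up placeholders.

```typescript
// Hypothetical loader signature: returns the type names it registered.
type Loader = () => string[];

// Hypothetical registry: a loader closure is installed at bootstrap and only
// runs the first time ensureLoaded() is called.
class LazyTypeRegistry {
  private readonly types = new Set<string>();
  private loader?: Loader;
  private loaded = false;

  constructor(builtins: string[] = []) {
    for (const t of builtins) this.types.add(t);
  }

  setLoader(loader: Loader): void {
    this.loader = loader;
    this.loaded = false;
  }

  // Runs the loader at most once; later calls are no-ops.
  ensureLoaded(): void {
    if (this.loaded) return;
    this.loaded = true;
    for (const t of this.loader?.() ?? []) this.types.add(t);
  }

  has(type: string): boolean {
    return this.types.has(type);
  }
}

// Process-wide singletons (two of the four, for brevity).
const modelRegistry = new LazyTypeRegistry();
const vaultTypeRegistry = new LazyTypeRegistry(["local_encryption"]);

// Bootstrap step run before any command action, serve included.
// Real loaders would read the catalog DB and lockfile under the repo dir.
function configureLoaders(_repoDir: string): void {
  modelRegistry.setLoader(() => ["@example/pulled-model"]);
  vaultTypeRegistry.setLoader(() => ["@example/custom-vault"]);
}

// Shared by scheduled and WebSocket runs: force every registry to load
// before the workflow resolves any model type.
function createRunDeps() {
  for (const registry of [modelRegistry, vaultTypeRegistry]) registry.ensureLoaded();
  return {
    resolveModelType(type: string): string {
      if (!modelRegistry.has(type)) throw new Error(`Unknown model type: ${type}`);
      return type;
    },
  };
}
```

In this sketch, if configureLoaders() never ran, ensureLoaded() would find no loader, mark the registry loaded anyway, and resolveModelType() would throw Unknown model type, which matches the symptom in the report.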

The WebSocket-triggered execution path goes through the same executeWorkflowWithLocks function — there is no separate code path for scheduled vs WebSocket execution.

Where swamp serve should run from:

swamp serve --repo-dir <path> should work from any working directory. All internal paths — the extension catalog, lockfile, pulled extension sources, bundles, model definitions, vault configs, and workflows — are resolved as absolute paths from the --repo-dir value. The process CWD is not used for any extension or repo lookups.

We verified this explicitly: starting swamp serve --repo-dir /tmp/test-repo from / (root), ~ (home), and the repo directory itself all produced identical behavior — extensions loaded, scheduled workflows executed.

If you are running serve via systemd or another service manager where the working directory is / or /root, that should be fine as long as --repo-dir points to the repo root (the directory containing .swamp.yaml).

Filesystem layout the loaders expect:

The extension loaders resolve all paths from --repo-dir. Here is what they expect:

<repo-dir>/                                  ← --repo-dir points here
├── .swamp.yaml                              # repo marker
├── .swamp/
│   ├── _extension_catalog.db                # SQLite catalog — indexes all extension sources
│   ├── bundles/<hash>/                      # pre-compiled JS bundles (from pull)
│   │   ├── user_pool.js
│   │   └── ...
│   └── pulled-extensions/
│       └── @<collective>/<extension>/
│           ├── manifest.yaml
│           ├── models/*.ts                  # pulled model sources
│           ├── vaults/*.ts                  # pulled vault sources
│           └── ...
├── extensions/
│   └── models/                              # default modelsDir
│       ├── upstream_extensions.json         # lockfile — maps pulled extensions to dirs
│       └── <local-model>.ts                 # local custom model extensions live here
├── models/                                  # model instance definitions (YAML)
│   └── @<collective>/<type>/<uuid>.yaml
├── vaults/                                  # vault configurations
│   └── <vault-name>.yaml
└── workflows/                               # workflow definitions
    └── workflow-<uuid>.yaml

Key paths:

- Models dir: defaults to extensions/models; overridable via the SWAMP_MODELS_DIR env var or modelsDir in .swamp.yaml
- Lockfile: <modelsDir>/upstream_extensions.json
- Extension catalog: .swamp/_extension_catalog.db (rebuilt on demand if missing or stale)
- Bundles: .swamp/bundles/<hash>/ (pre-compiled JS from extension pull)
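A minimal sketch of deriving these paths purely from --repo-dir, assuming the caller has already made repoDir absolute. resolveRepoPaths and RepoConfig are hypothetical names, and giving SWAMP_MODELS_DIR precedence over modelsDir in .swamp.yaml is an assumption; the text above only says either one can override the default.

```typescript
import * as path from "node:path";

// Hypothetical shape of the relevant .swamp.yaml setting.
interface RepoConfig {
  modelsDir?: string;
}

interface RepoPaths {
  modelsDir: string;
  lockfile: string;
  extensionCatalog: string;
  bundlesDir: string;
  pulledExtensionsDir: string;
}

// Every path is derived from repoDir alone; the process CWD is never consulted.
function resolveRepoPaths(
  repoDir: string,
  env: Record<string, string | undefined>,
  config: RepoConfig,
): RepoPaths {
  if (!path.isAbsolute(repoDir)) {
    throw new Error(`repoDir must be absolute, got: ${repoDir}`);
  }
  // Assumption: the env var wins over .swamp.yaml; the real precedence may differ.
  const modelsRel = env["SWAMP_MODELS_DIR"] ?? config.modelsDir ?? "extensions/models";
  // resolve() keeps an absolute override as-is and anchors a relative one at repoDir.
  const modelsDir = path.resolve(repoDir, modelsRel);
  const dotSwamp = path.join(repoDir, ".swamp");
  return {
    modelsDir,
    lockfile: path.join(modelsDir, "upstream_extensions.json"),
    extensionCatalog: path.join(dotSwamp, "_extension_catalog.db"),
    bundlesDir: path.join(dotSwamp, "bundles"),
    pulledExtensionsDir: path.join(dotSwamp, "pulled-extensions"),
  };
}
```

Because repoDir is absolute, path.resolve() never falls back to the working directory, which is why starting serve from /, ~, or the repo itself should make no difference.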

Reproduction attempts — all passed:

- Dev mode (deno run dev serve) with a pulled extension (@swamp/gcp/cloudshell) and a scheduled workflow firing every minute: extensions loaded correctly.
- Compiled binary with @swamp/aws/cognito pulled, plus a model and a scheduled workflow: the first and second scheduled runs both completed. The first run took ~470ms (catalog rebuild + bundle imports); the second was instant (registries cached).
- Compiled binary run from CWD / with --repo-dir /tmp/...: extensions loaded identically to running from the repo directory.
- Compiled binary run from CWD ~ with the extension catalog deleted (_extension_catalog.db* removed): the catalog was rebuilt from disk on the first scheduled execution, bundles loaded from cache, and the run completed.
- CLI validation before each test: swamp model type search, swamp workflow validate, and swamp doctor extensions all confirmed the extensions were indexed and healthy.

What the 19ms timing tells us:

In our successful tests, the first scheduled run takes 400-500ms because ensureLoaded() triggers the lazy loaders. The 19ms from your report means ensureLoaded() returned almost instantly — either the registries were already marked as loaded (but empty), or the loaders ran and failed silently. The loader functions have catch {} blocks that swallow all errors, so if anything goes wrong, the registries end up empty but permanently marked as "loaded."
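A minimal sketch of that failure mode, using a hypothetical SwallowingRegistry class and a made-up loader error; it is not swamp's code. Because the loaded flag is set regardless of outcome, a loader that throws leaves the registry empty but marked as loaded, so every later ensureLoaded() call returns immediately and nothing is retried.

```typescript
// Hypothetical registry reproducing the "empty but loaded" state.
class SwallowingRegistry {
  readonly types = new Set<string>();
  loaded = false;

  constructor(private readonly loader: () => string[]) {}

  ensureLoaded(): void {
    if (this.loaded) return; // fast path: later calls cost almost nothing
    try {
      for (const t of this.loader()) this.types.add(t);
    } catch {
      // Error swallowed: nothing is logged and nothing is registered.
    } finally {
      // Marked loaded whether or not the loader succeeded, so no retry happens.
      this.loaded = true;
    }
  }
}

let loaderCalls = 0;
const registry = new SwallowingRegistry(() => {
  loaderCalls++;
  throw new Error("hypothetical loader failure");
});
```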

To help us reproduce, could you share:

- Does the repo use .swamp-sources.yaml (additional extension source directories)?
- Is the datastore filesystem-based or custom (e.g., S3)?
- Is the --repo-dir path a symlink, or does it cross any mount boundaries?
- Does .swamp.yaml have a modelsDir or workflowsDir override?
- If you can reproduce again, could you run with --log-level debug? The debug logs should show whether the extension loaders actually ran and what they found.

We are planning a defensive fix regardless — adding diagnostic logging to the silent catch blocks and eager extension loading at serve startup so failures surface immediately rather than getting swallowed.
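A minimal sketch of what such a defensive fix could look like, assuming each registry exposes a synchronous load() that throws on failure. loadRegistriesEagerly, Registry, and LoadReport are hypothetical names; the point is only that failures surface at startup with their cause instead of being swallowed.

```typescript
// Hypothetical registry interface: load() returns type names or throws.
interface Registry {
  kind: string;
  load(): string[];
}

interface LoadReport {
  kind: string;
  count: number;
  error?: string;
}

// Load every registry at startup and report each failure with its cause.
function loadRegistriesEagerly(
  registries: Registry[],
  warn: (msg: string) => void,
): LoadReport[] {
  return registries.map((r): LoadReport => {
    try {
      const types = r.load();
      return { kind: r.kind, count: types.length };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      warn(`Failed to load ${r.kind} extensions: ${message}`);
      return { kind: r.kind, count: 0, error: message };
    }
  });
}
```

Running this once when serve starts means a broken loader shows up in the startup log, rather than as an Unknown model type at the first scheduled run.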

phy2vir commented 5/28/2026, 7:08:44 PM

Follow-up after the requested environment checks, debug repro, and a scheduler-run restore test. I am sanitizing concrete package/workflow/host names here.

Answers to the environment questions:

- No .swamp-sources.yaml is present, and swamp extension source list reports no additional sources.
- The datastore is filesystem-backed, rooted at <repo>/.swamp; swamp datastore status --json reports healthy.
- --repo-dir points at a real directory, not a symlink. <repo>, <repo>/.swamp, <repo>/extensions, <repo>/workflows, and <repo>/vaults are all on the same ext4 mount.
- .swamp.yaml has no modelsDir or workflowsDir overrides.
- The systemd unit uses WorkingDirectory=<repo> and ExecStart=/usr/local/bin/swamp serve --repo-dir <repo> --host 0.0.0.0 --port 9090.

One important environment difference: the systemd service initially had NO_COLOR=1, PATH=..., and USER=root, but no HOME or USERPROFILE.

On startup, swamp serve logged these warnings:

Failed to load user datastore extensions: "Cannot determine home directory (HOME/USERPROFILE not set)"
Failed to load user model extensions: "Cannot determine home directory (HOME/USERPROFILE not set)"
Failed to load user vault extensions: "Cannot determine home directory (HOME/USERPROFILE not set)"
Failed to load user driver extensions: "Cannot determine home directory (HOME/USERPROFILE not set)"
Failed to load user report extensions: "Cannot determine home directory (HOME/USERPROFILE not set)"
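A minimal sketch of how one missing variable can fan out into the five warnings above, assuming each per-kind loader resolves the home directory on its own and catches its own error. resolveHomeDir and loadUserExtensions are hypothetical; only the message text is copied from the log.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical home-directory lookup that throws when neither variable is set.
function resolveHomeDir(env: Env): string {
  const home = env["HOME"] ?? env["USERPROFILE"];
  if (!home) {
    throw new Error("Cannot determine home directory (HOME/USERPROFILE not set)");
  }
  return home;
}

const KINDS = ["datastore", "model", "vault", "driver", "report"] as const;

// Each hypothetical loader catches its own error, so a single root cause
// is reported once per extension kind.
function loadUserExtensions(env: Env, warn: (msg: string) => void): void {
  for (const kind of KINDS) {
    try {
      resolveHomeDir(env); // a real loader would then read extensions under this path
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      warn(`Failed to load user ${kind} extensions: ${JSON.stringify(message)}`);
    }
  }
}
```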

With debug enabled and no HOME, a temporary scheduled workflow reproduced the issue:

- the scheduled run failed in 21ms
- the first step failed with Unknown model type for a pulled model type
- the logs also showed the pulled vault type was unsupported, with only local_encryption available
- debug logs showed auto-resolution being attempted/skipped for the relevant collectives, rather than the already-pulled repo bundles being loaded

Then I added a systemd drop-in:

[Service]
Environment=HOME=/root

After restarting swamp serve, the startup loader warnings disappeared. Running the same scheduled workflow through the scheduler then succeeded in about 28s.

I then tested the actual restore workflow through swamp serve scheduling, not via manual CLI execution. With HOME=/root still set, that scheduled restore run also succeeded end-to-end in about 249s. It produced the expected restore-test data artifact, reported success: true, confirmed the temporary VM was network-isolated and guest-agent reachable, and cleanup completed successfully.

To answer the likely deployment question: this was a normal systemd wrapper around the documented/built-in swamp serve --repo-dir <repo> command. I do not see a swamp serve install / swamp service install style command in the current CLI that would have generated a unit with HOME set automatically. In hindsight, setting HOME or XDG_CONFIG_HOME explicitly in the service unit is a good hardening step, and it is now the local workaround.

So this looks reproducible when swamp serve runs under a service manager without HOME/USERPROFILE: startup user-extension loading fails, and after that the serve process appears to have incomplete registries for pulled repo extensions during scheduled execution. Setting HOME avoids the failure in this environment and fixes both a small scheduled repro workflow and the real scheduled restore workflow.

Thanks again for digging into this. Hopefully this narrows the repro surface: service-managed swamp serve, valid --repo-dir, filesystem datastore, no source overrides, no directory overrides, but missing HOME/USERPROFILE in the process environment.

stack72 commented 5/29/2026, 12:54:59 AM

Thanks for the excellent reproduction — that nailed it. 🙏

Your follow-up isolated the root cause precisely: the systemd unit ran swamp serve with no HOME/USERPROFILE in its environment. swamp loads every extension — including already-pulled repo bundles — through an embedded runtime that lives under ~/.swamp, so home-directory resolution threw deep inside each loader at startup. That cascaded into the misleading Failed to load user X extensions warnings and, at scheduled-run time, Unknown model type for your pulled model/vault types. Your Environment=HOME=/root drop-in is exactly the right fix, and it's the recommended workaround.

We've shipped a follow-up in #1470 that closes the diagnosability gap so nobody else has to reverse-engineer this:

- A guard clause in the extension-loader setup now detects the missing-home condition once, up front, and emits a single actionable warning that names the real cause and the fix (set HOME, e.g. Environment=HOME=/root), replacing the five confusing per-kind Failed to load user … extensions warnings.
- swamp serve --help now documents the HOME/USERPROFILE requirement for service-managed deployments.
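A minimal sketch of the guard-clause idea, assuming the check runs once before any loader and that the per-kind loaders are skipped when it fails. checkHomeDirOnce and setUpExtensionLoaders are hypothetical names, and the warning wording is illustrative rather than the text shipped in #1470.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical guard: one check, one actionable warning.
function checkHomeDirOnce(env: Env, warn: (msg: string) => void): boolean {
  if (env["HOME"] || env["USERPROFILE"]) return true;
  warn(
    "Extensions cannot be loaded: neither HOME nor USERPROFILE is set. " +
      "When running swamp serve under a service manager, set HOME " +
      "(e.g. Environment=HOME=/root in the systemd unit).",
  );
  return false;
}

// Assumption: when the guard fails, the per-kind loaders are not run at all,
// so the user sees a single warning instead of one per extension kind.
function setUpExtensionLoaders(env: Env, warn: (msg: string) => void, runLoaders: () => void): void {
  if (!checkHomeDirOnce(env, warn)) return;
  runLoaders();
}
```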

Note this is intentionally a diagnosability + docs fix: it doesn't make headless serve load extensions without a home directory (the embedded runtime genuinely needs one) — it just tells you so clearly instead of failing obscurely. Setting HOME (or XDG-appropriate equivalents) in the service unit remains the correct configuration, as you found.

Thanks again for the detailed, sanitized write-up and the restore-workflow verification — it made this a quick fix.
