paths.base: manifest is not honored for workflows: — bundled workflows only resolve from repo root, blocking self-contained subdir layouts (sibling to #459)

Multi-model skill trigger evals reveal systematic failure patterns that affect skill-routing accuracy across Claude Sonnet, Claude Opus, GPT-5.4, and Gemini 2.5 Pro. All four models clear the 90% pass threshold, but the failures point to specific skill-description problems that need fixing.

CI results (2026-04-10): https://github.com/systeminit/swamp/actions/runs/24239470178

Model	Pass Rate	Passed	Failed
Sonnet	99.0%	200/202	2
GPT-5.4	98.0%	198/202	4
Opus	94.1%	190/202	12
Gemini 2.5 Pro	91.6%	185/202	17
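Each pass rate in the table is passed/total, rounded to one decimal place and compared against the 90% threshold. The following minimal sketch recomputes the table. The `EvalResult` type, `passRatePercent`, and `summarize` are hypothetical helpers, not part of the swamp eval tooling.

```typescript
// Hypothetical shape for one model's trigger-eval totals.
interface EvalResult {
  model: string;
  passed: number;
  total: number;
}

// Threshold from this issue: a model must pass at least 90% of cases.
const PASS_THRESHOLD = 0.9;

// Pass rate as a percentage, rounded to one decimal place.
function passRatePercent(r: EvalResult): number {
  return Math.round((r.passed / r.total) * 1000) / 10;
}

// One tab-separated row per model: model, rate, passed/total, failed, verdict.
function summarize(results: EvalResult[]): string[] {
  return results.map((r) => {
    const failed = r.total - r.passed;
    const ok = r.passed / r.total >= PASS_THRESHOLD;
    return `${r.model}\t${passRatePercent(r).toFixed(1)}%\t${r.passed}/${r.total}\t${failed}\t${ok ? "PASS" : "FAIL"}`;
  });
}

// Totals copied from the CI results table.
const results: EvalResult[] = [
  { model: "Sonnet", passed: 200, total: 202 },
  { model: "GPT-5.4", passed: 198, total: 202 },
  { model: "Opus", passed: 190, total: 202 },
  { model: "Gemini 2.5 Pro", passed: 185, total: 202 },
];

for (const line of summarize(results)) console.log(line);
```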

Cross-Model Failures (highest priority)

extension-model should NOT trigger for "Can I run this custom model as part of a scheduled workflow?" - fails on 3/4 models (sonnet, gpt-5.4, opus). Description too broad.
report SHOULD trigger for "What methods does UnifiedDataRepository expose in reports?" - fails on 3/4 models (sonnet, opus, gemini). Routed to troubleshooting instead.
extension-driver should NOT trigger for "Run this workflow step on a remote Kubernetes cluster" - fails on 3/4 models (gpt-5.4, opus, gemini). Description too broad.
model should NOT trigger for "How do I chain this model into an automated workflow?" - fails on 2/4 (gpt-5.4, opus).
workflow should NOT trigger for "The workflow is erroring on the second step" - fails on 2/4 (opus, gemini).
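A cross-model failure is a case that fails on two or more models. The minimal sketch below aggregates per-model failure lists into that view. The input shape, the `crossModelFailures` name, and the case IDs are hypothetical labels for the cases listed above. The sketch does not describe how scripts/analyze_eval_results.ts works. Only the cross-model cases are included, so most per-model lists are shorter than the totals in the table.

```typescript
// Hypothetical input: model name -> IDs of the trigger-eval cases it failed.
type FailuresByModel = Record<string, string[]>;

interface CrossModelFailure {
  caseId: string;
  models: string[];
}

// Groups failures by case, keeps cases failing on at least `minModels`
// models, and sorts the most widespread first (stable sort keeps ties in
// first-seen order).
function crossModelFailures(
  failures: FailuresByModel,
  minModels = 2,
): CrossModelFailure[] {
  const byCase = new Map<string, string[]>();
  for (const [model, caseIds] of Object.entries(failures)) {
    for (const id of new Set(caseIds)) {
      const models = byCase.get(id) ?? [];
      models.push(model);
      byCase.set(id, models);
    }
  }
  return [...byCase.entries()]
    .filter(([, models]) => models.length >= minModels)
    .map(([caseId, models]) => ({ caseId, models }))
    .sort((a, b) => b.models.length - a.models.length);
}

// Illustrative IDs for the cross-model cases listed above.
const failures: FailuresByModel = {
  sonnet: ["extension-model:scheduled-workflow", "report:unified-data-repo"],
  "gpt-5.4": ["extension-model:scheduled-workflow", "extension-driver:remote-k8s", "model:chain-workflow"],
  opus: ["extension-model:scheduled-workflow", "report:unified-data-repo", "extension-driver:remote-k8s", "model:chain-workflow", "workflow:second-step-error"],
  gemini: ["report:unified-data-repo", "extension-driver:remote-k8s", "workflow:second-step-error"],
};

for (const f of crossModelFailures(failures)) {
  console.log(`${f.caseId}: ${f.models.length}/4 (${f.models.join(", ")})`);
}
```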

Pattern 1: swamp-report poorly differentiated

3/4 models affected; Gemini alone accounts for 7 report failures. Report queries are consistently routed to troubleshooting. Fix: Update .claude/skills/swamp-report/SKILL.md to mention report creation, output formats, dataRepository, UnifiedDataRepository, and dataHandles, and to differentiate the skill from troubleshooting.
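A cheap guard against regressions is to check that the report description still names the terms this fix calls for. The minimal sketch below assumes the description string has already been read from the SKILL.md frontmatter. `missingTerms` and the draft description are hypothetical. Term coverage is only a proxy for routing accuracy, so the evals still need to be rerun.

```typescript
// Terms the swamp-report description should mention, per this issue.
const REPORT_TERMS = [
  "report",
  "output format",
  "dataRepository",
  "UnifiedDataRepository",
  "dataHandles",
];

// Returns terms absent from the description (case-insensitive substring
// match). Note: "dataRepository" is also satisfied by "UnifiedDataRepository".
function missingTerms(description: string, terms: string[]): string[] {
  const haystack = description.toLowerCase();
  return terms.filter((t) => !haystack.includes(t.toLowerCase()));
}

// Hypothetical draft description, for illustration only.
const draft =
  "Create and run swamp reports: choose output formats, read data via " +
  "dataRepository / UnifiedDataRepository and dataHandles. " +
  "Not for debugging failing models or workflows; use troubleshooting.";

console.log(missingTerms(draft, REPORT_TERMS));
```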

Pattern 2: Extension descriptions too broad

Each skill misfires on 3/4 models. extension-model and extension-driver trigger on usage queries that are not about creating extensions. Fix: Update .claude/skills/swamp-extension-model/SKILL.md and .claude/skills/swamp-extension-driver/SKILL.md to emphasize creating new TypeScript extensions, and add explicit exclusions for usage queries.

Pattern 3: Text responses instead of tool calls

2/4 models affected (Opus: 8 cases, Gemini: 10 cases). Both respond conversationally instead of routing via a tool call. Fix: Strengthen the system prompt in evals/promptfoo/generate_config.ts (~line 242) or adjust expectations per model.
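Counting text-only replies separately from misroutes helps show whether the fix belongs in the prompt or in per-model expectations. On should-trigger cases, a text-only reply is a failure even though the model picked no wrong skill. The minimal sketch below illustrates this. The `ModelReply` shape, the `Skill` tool name, and both functions are hypothetical. They do not reflect promptfoo's output format or generate_config.ts.

```typescript
// Hypothetical, simplified reply shape: free text plus any tool calls.
interface ToolCall {
  name: string;
  input: Record<string, unknown>;
}
interface ModelReply {
  text: string;
  toolCalls: ToolCall[];
}

type Routing = { kind: "tool-call"; skill: string } | { kind: "text-only" };

// Hypothetical name of the tool the eval exposes for skill routing.
const SKILL_TOOL = "Skill";

// Routed only if the reply calls the skill tool with a string `skill`
// argument; a conversational answer without such a call is "text-only".
function classifyReply(reply: ModelReply): Routing {
  const call = reply.toolCalls.find(
    (c) => c.name === SKILL_TOOL && typeof c.input["skill"] === "string",
  );
  return call
    ? { kind: "tool-call", skill: call.input["skill"] as string }
    : { kind: "text-only" };
}

// Number of replies that answered in prose instead of routing.
function countTextOnly(replies: ModelReply[]): number {
  return replies.filter((r) => classifyReply(r).kind === "text-only").length;
}

const replies: ModelReply[] = [
  { text: "", toolCalls: [{ name: "Skill", input: { skill: "swamp-report" } }] },
  { text: "You can add this as a workflow step...", toolCalls: [] },
];
console.log(replies.map(classifyReply), countTextOnly(replies));
```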

Pattern 4: Ambiguous test cases

Some test cases may need reclassification in trigger_evals.json files. Review cases that fail on 2-3 models where routing is genuinely ambiguous.

Priority

1. Fix the report SKILL.md description
2. Fix the extension-model and extension-driver SKILL.md descriptions
3. Review ambiguous test cases
4. Address text-response handling for Opus and Gemini

Reproduction

Set ANTHROPIC_API_KEY, OPENAI_API_KEY, and GOOGLE_API_KEY, then run eval-skill-triggers for each model. See scripts/analyze_eval_results.ts for the cross-model analysis.
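Each model needs its own provider key, so a missing key can make a multi-model run fail partway through. The pre-flight check below is a minimal sketch written against a plain environment record, so it does not depend on a particular runtime. The `missingKeys` helper is hypothetical and is not part of eval-skill-triggers.

```typescript
// Provider API keys required by the reproduction steps.
const REQUIRED_KEYS = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY"] as const;

// Returns the required keys that are unset or empty in `env`.
function missingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((k) => !env[k]);
}

const example: Record<string, string | undefined> = {
  ANTHROPIC_API_KEY: "placeholder",
  OPENAI_API_KEY: "",
};
console.log(missingKeys(example)); // logs [ "OPENAI_API_KEY", "GOOGLE_API_KEY" ]
```

Under Deno the record can come from `Deno.env.toObject()`. Under Node, pass `process.env`.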

02 Bog Flow

Shipped

4/14/2026, 9:37:56 PM


03 Sludge Pulse

stack72 assigned stack72, 4/14/2026, 8:13:26 PM
