Feature · Open · Swamp CLI
Assignees: None

#325 Improve skill trigger routing for cross-model edge cases

Opened by stack72 · 5/11/2026

Context

PR #1364 restructured 16 swamp-* skills into 11, improving the trigger eval pass rate from 198/202 (98.0%) to 262/264 (99.2%) on Sonnet. Cross-model CI confirms that all four models clear the 90% threshold:

  • Sonnet: 262/264 (99.2%)
  • GPT-5.4: 258/264 (97.7%)
  • Opus: 251/263 (95.4%)
  • Gemini 2.5 Pro: 245/263 (93.2%)

However, the cross-model analysis surfaced patterns worth addressing in a follow-up.
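For reference, the threshold check above can be reproduced with a few lines of TypeScript. This is only a sketch: the pass counts are copied from this issue, while the `ModelResult` type and the helper names are hypothetical and not taken from the CI code.

```typescript
// Hypothetical shape for one model's eval summary (not from the swamp repo).
interface ModelResult {
  model: string;
  passed: number;
  total: number;
}

// The 90% threshold mentioned in this issue.
const THRESHOLD = 0.9;

// Pass counts copied from the cross-model CI results listed above.
const results: ModelResult[] = [
  { model: "Sonnet", passed: 262, total: 264 },
  { model: "GPT-5.4", passed: 258, total: 264 },
  { model: "Opus", passed: 251, total: 263 },
  { model: "Gemini 2.5 Pro", passed: 245, total: 263 },
];

// Fraction of evals passed, in the range 0..1.
function passRate(r: ModelResult): number {
  return r.passed / r.total;
}

// Render a result in the same "passed/total (pct%)" form used in this issue.
function formatRate(r: ModelResult): string {
  const pct = (passRate(r) * 100).toFixed(1);
  return `${r.model}: ${r.passed}/${r.total} (${pct}%)`;
}

// True when the model's pass rate is at or above the threshold.
function meetsThreshold(r: ModelResult, threshold: number = THRESHOLD): boolean {
  return passRate(r) >= threshold;
}

for (const r of results) {
  console.log(formatRate(r), meetsThreshold(r) ? "ok" : "BELOW THRESHOLD");
}
```

Note that the totals differ (264 vs 263), so rates should be compared as percentages rather than raw pass counts.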

Cross-model failures to revisit

1. Report authoring API terms route to swamp-extension instead of swamp-report (3/4 models)

"How do I read execution data in a report using dataHandles?" fails on Sonnet, Opus, and Gemini. The terms dataHandles, UnifiedDataRepository, and MethodReportContext pull strongly toward swamp-extension because the report creation content was moved there. swamp-report still has these terms in its reference files, but its frontmatter description no longer claims them.

Options: Either accept that authoring queries route to swamp-extension (and remove from swamp-report evals), or add authoring-specific triggers back to swamp-report's description.

2. "manifest.yaml to push" splits 50/50 across models

Sonnet and Gemini route to swamp-extension-publish; GPT and Opus route to swamp-extension. This query genuinely spans both skills: manifest creation is authoring, but "push" signals publishing intent.

Options: Remove this eval from both skills (it's legitimately ambiguous), or strengthen the publish description to claim manifest creation when paired with push intent.
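One way to flag evals like this automatically is to tally which skill each model picked and check whether any skill wins a strict majority. A minimal sketch, assuming a hypothetical `Route` record per model; the routing data is the split described above, and none of the helper names come from the eval harness.

```typescript
// Hypothetical record of which skill a model routed a query to.
interface Route {
  model: string;
  skill: string;
}

// Group model names by the skill they selected.
function tallyBySkill(routes: Route[]): Record<string, string[]> {
  const bySkill: Record<string, string[]> = {};
  for (const r of routes) {
    const existing = bySkill[r.skill];
    if (existing) {
      existing.push(r.model);
    } else {
      bySkill[r.skill] = [r.model];
    }
  }
  return bySkill;
}

// True when no single skill was chosen by more than half of the models.
function isSplit(routes: Route[]): boolean {
  const bySkill = tallyBySkill(routes);
  const hasMajority = Object.keys(bySkill).some(
    (skill) => (bySkill[skill]?.length ?? 0) * 2 > routes.length,
  );
  return !hasMajority;
}

// Routing for "manifest.yaml to push" as reported in this issue.
const manifestPush: Route[] = [
  { model: "Sonnet", skill: "swamp-extension-publish" },
  { model: "Gemini", skill: "swamp-extension-publish" },
  { model: "GPT", skill: "swamp-extension" },
  { model: "Opus", skill: "swamp-extension" },
];

console.log(tallyBySkill(manifestPush));
console.log("split:", isSplit(manifestPush));
```

A query that comes back split on every run is a reasonable candidate for removal from both skills' evals; one that splits only occasionally may just need a sharper description.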

3. Opus over-indexes on swamp-extension for debugging queries

Opus routes "type error in custom model code", "module not found", and "extension crashing" to swamp-extension instead of swamp-troubleshooting (5 failures). The "Do NOT use for debugging" negative trigger in swamp-extension's description is not strong enough for Opus.

Options: Strengthen the negative trigger wording, or add "type error", "module not found", "crashing" as explicit triggers in swamp-troubleshooting.

4. "workflow erroring on second step" routes to swamp-workflow (2/4 models)

GPT and Opus route this to swamp-workflow instead of swamp-troubleshooting, even though "erroring" is among swamp-troubleshooting's triggers. The word "workflow" dominates.

Options: Add a stronger negative trigger to swamp-workflow for error/debugging queries, or accept that workflow-context debugging naturally lands in the workflow skill first.

5. Gemini text-only responses (no tool call)

Gemini 2.5 Pro fails several evals by responding with text instead of making a tool call. This appears to be a systemic issue with the promptfoo system prompt rather than a skill description problem.

Options: Tune the system prompt in generate_config.ts for Gemini specifically, or accept the lower Gemini baseline.
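To decide between these options it may help to separate "routed to the wrong skill" failures from "no tool call at all" failures when reading the results. A rough sketch follows; the `EvalOutcome` shape is hypothetical and is not promptfoo's actual output format, and the second sample row is a placeholder rather than a real eval.

```typescript
// Hypothetical, simplified view of one eval result (not promptfoo's schema).
interface EvalOutcome {
  query: string;
  expectedSkill: string;
  calledSkill: string | null; // null when the model answered with text only
}

type Verdict = "pass" | "wrong_skill" | "no_tool_call";

// Distinguish routing mistakes from responses that never made a tool call.
function classify(o: EvalOutcome): Verdict {
  if (o.calledSkill === null) return "no_tool_call";
  return o.calledSkill === o.expectedSkill ? "pass" : "wrong_skill";
}

// Count how many outcomes fall into each verdict.
function countVerdicts(outcomes: EvalOutcome[]): Record<Verdict, number> {
  const counts: Record<Verdict, number> = { pass: 0, wrong_skill: 0, no_tool_call: 0 };
  for (const o of outcomes) {
    counts[classify(o)] += 1;
  }
  return counts;
}

const sample: EvalOutcome[] = [
  // Routing reported in item 4 of this issue.
  {
    query: "workflow erroring on second step",
    expectedSkill: "swamp-troubleshooting",
    calledSkill: "swamp-workflow",
  },
  // Placeholder row illustrating a text-only response.
  {
    query: "(placeholder query)",
    expectedSkill: "swamp-report",
    calledSkill: null,
  },
];

console.log(countVerdicts(sample));
```

If most Gemini failures land in `no_tool_call`, that supports tuning the system prompt; if they land in `wrong_skill`, the skill descriptions are the more likely cause.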

Status: Open (5/11/2026, 11:13:47 PM)
