Add a preflight diagnostic for AI-tool audit integrations (so upstream CLI changes stop breaking us silently)
Opened by stack72 · 4/23/2026· Shipped 4/24/2026
Problem
swamp init --tool <tool> wires up audit hooks for external AI CLIs (Kiro, Claude Code, Cursor, OpenCode), but there is currently no way to verify that the integration actually works. When the external tool changes its contract, the integration fails silently — no error at init, no error at runtime, the audit log just stays empty.
Concrete example: today we debugged why the swamp audit hook wasn't firing for a fresh Kiro CLI install, and found three independent silent failures compounding:
.kiro/agents/swamp.jsonships with"tools": ["*"]. kiro-cli 2.0 rejects wildcards in the agent schema and silently falls back to the built-inkiro_defaultagent. The swamp agent — and itspostToolUsehook — never load. (src/domain/repo/repo_service.ts:1442-1451)No workspace-level
chat.defaultAgentis written. Even if the swamp agent loaded, plainkiro-cli chatuses the default agent. Users would have to know to pass--agent swampevery time.The Kiro normalizer hard-codes
tool_name === "execute_bash"(src/domain/audit/hook_input.ts:157). kiro-cli 2.0 emits"shell"at runtime in itspostToolUsepayload. Every event is silently dropped by the normalizer.
The swamp audit feature was written in early March 2026 (PR #572, #584) against kiro-cli 1.x. Kiro cut a 2.0 around 2026-04-13 and the runtime tool name changed. Nothing broke loudly — the audit directory just stayed empty, and users have no signal unless they go digging in the source.
This class of breakage will keep happening every time an upstream AI CLI ships a schema change. We need a preflight.
Proposed solution
Add a diagnostic subcommand (name TBD — see bikeshed below) that exercises the full audit integration for the tool configured in .swamp.yaml and reports per-check pass/fail with actionable hints.
Proposed checks (each fails independently and is reported separately — don't short-circuit on first failure):
- Verify the AI tool's binary is on
PATH(e.g.which kiro-cli). - Validate that the agent config swamp generated (e.g.
.kiro/agents/swamp.json) is loadable by the current version of the tool — shape is accepted, every tool name in the list is recognised. - Verify the workspace-level default-agent setting exists and points at the swamp agent (e.g.
.kiro/settings/cli.jsonhaschat.defaultAgent: swamp). - Smoke-test the recording pipeline: generate a synthetic
postToolUsepayload matching what the tool currently emits at runtime, pipe it throughswamp audit record --from-hook --tool <tool>, and confirm a new row lands in.swamp/audit/commands-*.jsonl. This is the check that would have caught theshellvsexecute_bashregression. - Output
--logand--jsonmodes like every other swamp command. JSON mode should be machine-readable so it can gate CI.
Actionable hints matter: a failing check should tell the user exactly what to run or edit to fix it (e.g. "agent config has tools: ['*'] which kiro-cli 2.0 rejects — run swamp init --tool kiro --force to regenerate").
Alternatives considered
- Do nothing; rely on bug reports. Users who hit this silently assume audit isn't working and stop using it. We'd never find out.
- Only fix the three concrete bugs. Those fixes will ship separately. But the root cause is "no way to verify the integration still works after an upstream change" — a diagnostic is the durable fix.
- Add an integration test that runs real kiro-cli in CI. Worth doing too, but it's a CI-only signal and won't help a user on a freshly-installed tool version that hasn't been pinned in CI yet.
Name bikeshed — please NOT doctor
swamp doctor is the obvious name but isn't swampy enough. Candidates to pick from (or propose others):
swamp sniff— sniff around for what's brokenswamp wade— wade through the setupswamp dredge— dredge up issuesswamp survey— survey the swampswamp probe— probe the integration
Happy to let the team pick — just please not doctor.
Out of scope / follow-ups
The three concrete Kiro bugs that motivated this issue will be filed and fixed separately; this issue is specifically about the preflight mechanism. This diagnostic should be extensible to Claude Code, Cursor, and OpenCode as well — the per-tool normalizers live in src/domain/audit/hook_input.ts, so each tool can expose its own "expected runtime payload shape" to feed the smoke-test.
Shipped
Click a lifecycle step above to view its details.
Sign in to post a ripple.