01 Issue
Bug · Closed · Swamp CLI
Assignees: None

datastore sync --push runs pushChanged() twice per invocation (coordinator dedup)

Opened by stack72 · 4/8/2026

Summary

swamp datastore sync --push invokes S3CacheSyncService.pushChanged() twice on every invocation — once from the command action and again from flushDatastoreSync() at runCli teardown. Both walks scan the full cache and both rounds do S3 PUTs. This is pure waste on the success path, and on the failure path it produces misleading log ordering where "Pushed 1 file(s)" prints before the original error surfaces.

This is a separable follow-up to swamp-club issue #29. Issue #29 covers the primary bug (_catalog.db-wal fails because the catalog lives inside the sync cache) and has its own fix path. This issue is specifically about the coordinator/command-level push duplication and should be fixed independently because the risk profile is different — getting the signalling wrong can cause silent push no-ops, which is worse than the current double-push waste.

Code path

  1. src/cli/commands/datastore_sync.ts:68-71 — --push mode uses requireInitializedRepo (not the read-only variant).
  2. src/cli/repo_context.ts:325-329 — requireInitializedRepo calls registerDatastoreSync({ service, lock, label }) for S3 datastores, registering the S3CacheSyncService with the global sync coordinator.
  3. src/cli/commands/datastore_sync.ts:76-79 — the command action then calls datastoreSync(ctx, deps, { mode: 'push' }).
  4. src/libswamp/datastores/sync.ts:172-177 — datastoreSync with mode: 'push' calls deps.pushSync() which calls syncService.pushChanged(). First push.
  5. src/cli/mod.ts:939 — after cli.parse returns, runCli calls flushDatastoreSync().
  6. src/infrastructure/persistence/datastore_sync_coordinator.ts:196-247 — flushDatastoreSync() iterates all registered entries and calls entry.service.pushChanged(). Second push.
  7. src/cli/mod.ts:991 — on error paths, the catch block calls flushDatastoreSync() again, compounding the confusion.
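The flow above can be condensed into a small, synchronous sketch. Every name here (`registerDatastoreSync`, `flushDatastoreSync`, the counting service) is a simplified stand-in for the real modules referenced in the steps, which are async and carry more state:

```typescript
// Simplified, synchronous sketch of the double-push call flow.
interface SyncService {
  pushChanged(): number; // returns number of files pushed
}

const registered: SyncService[] = [];

// Stand-in for registerDatastoreSync() called from requireInitializedRepo
function registerDatastoreSync(service: SyncService): void {
  registered.push(service);
}

// Stand-in for flushDatastoreSync() at runCli teardown
function flushDatastoreSync(): void {
  for (const service of registered) service.pushChanged();
}

// Counting stand-in for the S3 sync service
let pushCalls = 0;
const service: SyncService = {
  pushChanged() {
    pushCalls++;
    return 1;
  },
};

// One `datastore sync --push` invocation:
registerDatastoreSync(service); // step 2: init registers the service
service.pushChanged();          // step 4: explicit push from the command
flushDatastoreSync();           // steps 5-6: teardown flush pushes again

console.log(pushCalls); // → 2: one invocation, two full pushes
```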

Evidence that the maintainer is already aware

src/cli/commands/datastore_sync.ts:61-62 explicitly comments:

```
// Pull-only mode uses read-only init to avoid acquiring the global
// lock and triggering a coordinator push on flush.
```

…and uses `requireInitializedRepoReadOnly` for the pull path specifically to dodge this. The push path hasn't been similarly adjusted.

Impact

  • Every S3 `--push` invocation does 2× the S3 PUT cost and 2× the cache walks: doubled latency, doubled API cost, and a doubled race surface on any mutable file in the cache. That race surface is what makes the `_catalog.db-wal` failure in issue #29 so confusing — the first push hits the WAL mid-write and throws, while the second push runs after SQLite has checkpointed the WAL and succeeds, so the logs show "Pushed 1 file(s)" before the original error surfaces as fatal.
  • Log ordering is misleading on failures. The second push runs after the first has thrown, so the success message for the second push appears above the fatal error from the first. Anyone debugging a failed push will see conflicting evidence in the log trace.

Suggested fix approaches

Two viable directions, each with a failure mode to watch:

A. Command skips the explicit push, relies on coordinator flush. Remove the `deps.pushSync()` call from `datastoreSync({ mode: 'push' })` and let `flushDatastoreSync()` at `runCli` teardown do the push. This mirrors how normal commands (`model create`, `workflow run`) already work — the coordinator flush is the canonical push point. Risk: the coordinator flush at `datastore_sync_coordinator.ts:235-246` logs failures as `warn` rather than propagating them, so the CLI could exit 0 on a broken push. Any fix in this direction must also make flush errors propagate so the command exits non-zero on push failure.
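A hypothetical reshaping of the coordinator flush for approach A might look like the following. The entry shape and signature are assumptions, not the real `flushDatastoreSync` API — the point is only where the propagation would replace the current `warn` log:

```typescript
// Hypothetical approach-A flush: collect per-entry push failures and
// rethrow instead of logging warn, so runCli can map a failed
// teardown push to a non-zero exit.
interface SyncEntry {
  label: string;
  service: { pushChanged(): number };
}

function flushDatastoreSync(entries: SyncEntry[]): void {
  const failures: string[] = [];
  for (const entry of entries) {
    try {
      entry.service.pushChanged();
    } catch (err) {
      failures.push(`${entry.label}: ${(err as Error).message}`);
    }
  }
  if (failures.length > 0) {
    // Propagation point: today this is a warn log; approach A needs it fatal.
    throw new Error(`datastore sync push failed: ${failures.join("; ")}`);
  }
}
```

Collecting failures before throwing keeps the existing behavior of attempting every registered entry, rather than aborting the flush at the first broken datastore.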

B. Coordinator flush skips its push when an explicit push has already run. Add a flag on the coordinator entry (e.g. `pushedExplicitly: true`) that `datastoreSync` sets after a successful `pushSync()`, and that `flushDatastoreSyncNamed` checks before calling `pushChanged`. The lock still gets released on flush. Risk: state has to survive the error path — the catch at `src/cli/mod.ts:991` calls flush even when the explicit push threw, and in that case the flag is `false`, so flush would retry the push. That might actually be desirable on transient network errors, but it reintroduces the exact double-push the fix is trying to eliminate.
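In sketch form, the approach-B handshake would look like this. The field name `pushedExplicitly` comes from the suggestion above; everything else is a hypothetical simplification of the coordinator entry:

```typescript
// Hypothetical approach-B handshake: the explicit push marks the entry,
// and the teardown flush skips entries already pushed.
interface SyncEntry {
  service: { pushChanged(): number };
  pushedExplicitly: boolean;
}

function explicitPush(entry: SyncEntry): number {
  const pushed = entry.service.pushChanged();
  entry.pushedExplicitly = true; // set only after success — a throw leaves
                                 // it false, so the error-path flush retries
  return pushed;
}

function flushEntry(entry: SyncEntry): void {
  if (!entry.pushedExplicitly) {
    entry.service.pushChanged();
  }
  // lock release happens on flush either way
}
```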

Approach A is cleaner architecturally — one push point, consistent with how other commands work. Approach B is more surgical but has more moving parts.

Testing requirements

Whatever approach is taken, regression tests must assert:

  1. Exactly one `pushChanged` call on the success path of `swamp datastore sync --push`.
  2. Exactly one `pushChanged` attempt on the failure path (no duplicate attempts after a throw).
  3. The CLI exits non-zero when the push fails (current behavior only exits non-zero because the first push throws; if the first push is removed, the flush-time push must still produce a non-zero exit on failure).
  4. `swamp datastore sync --pull` and full `swamp datastore sync` behavior is unchanged (the pull path already uses read-only init and shouldn't regress).

Integration tests should exercise this with a mock `DatastoreSyncService` that counts `pushChanged()` calls — `src/cli/repo_context_test.ts` already has the lock lifecycle harness.

Environment

  • swamp CLI (any recent version — this is structural, present since `requireInitializedRepo` started registering sync services with the coordinator)
  • Any `@swamp/s3-datastore`-backed repo
  • Affects all `swamp datastore sync --push` invocations, not just failing ones
  • swamp-club issue #29: "datastore sync --push fails on _catalog.db-wal: catalog SQLite DB lives inside the S3 sync cache" — the primary bug. This issue (double-push) is called out in that report as a "secondary bug" but scoped out of the main fix because the risk profile differs.
02 Bog Flow

Closed

4/8/2026, 8:38:53 PM

No activity in this phase yet.

03 Sludge Pulse

stack72 commented 4/8/2026, 8:38:50 PM

Triage note. Reproduced the double-push end-to-end using a counting SyncableService wired through the real coordinator and the real datastoreSync generator: exactly 2 pushChanged() calls per --push invocation. All the code paths in the original write-up check out, and the maintainer-awareness comment at datastore_sync.ts:61-62 plus the pull-side fix in #983 confirm this is a symmetric gap that was left open when the pull path was fixed.

One extra finding from the repro: registerDatastoreSync also calls pullChanged() at registration time whenever a service is registered (coordinator.ts:154-186), so --push today is actually pull → push → push, not push → push. That raises a question about localMtime drift — the exact failure mode #983's commit message called out for the pull path may be biting --push here too, though I haven't verified it bites in practice.
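The registration-time pull means one --push run is three remote operations, not two. Extending the same simplified stand-ins (all names hypothetical) to include the pull:

```typescript
// Simplified stand-ins showing pull → push → push for one --push run.
let pulls = 0;
let pushes = 0;

const service = {
  pullChanged(): void { pulls++; },
  pushChanged(): void { pushes++; },
};

function registerDatastoreSync(s: typeof service): void {
  s.pullChanged(); // coordinator pulls at registration time
}

function flushDatastoreSync(s: typeof service): void {
  s.pushChanged(); // teardown flush pushes again
}

registerDatastoreSync(service); // requireInitializedRepo side effect
service.pushChanged();          // explicit command push
flushDatastoreSync(service);    // runCli teardown

console.log(`${pulls} pull, ${pushes} pushes`); // → "1 pull, 2 pushes"
```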

Leaning toward deferring this rather than fixing it now. Honest cost/benefit:

Real user impact today is low.

  • swamp datastore sync --push is a manual command, not a hot path — doubled PUT cost and latency are pennies and seconds
  • The misleading log ordering only matters during a failed push, which is rare
  • The _catalog.db-wal race surface goes away once #29 lands
  • No user reports of slowness, wrong data, or S3 bills attached to this

Fix cost is non-trivial for any of the three viable options:

  1. Remove explicit push, let coordinator flush do it — cleanest in theory, but requires (a) making flushDatastoreSyncNamed propagate push failures instead of logging warn (structural change affecting every command that uses the coordinator), and (b) accepting that the DatastoreSyncEvent.completed event loses filesPushed (JSON output contract break). Wider blast radius than it looks.
  2. Flag on coordinator entry (pushedExplicitly) — surgical, but introduces an error-path policy decision: on explicit-push throw, the runCli catch-block flush would either retry (reintroducing the double-push on every failure) or skip (masking a transient recovery opportunity). Either choice is a future argument.
  3. New requireInitializedRepoForPush init variant — follows the exact pattern #983 used for --pull: acquires the lock but doesn't register a sync service with the coordinator. Narrowest change, no coordinator API churn, no JSON break, and as a side effect it also kills the unnecessary pull-on-register. Preferred shape if we do fix it.
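Option 3 could be sketched as follows. The variant name comes from the list above; the context shape and lock mechanics are assumptions about the real `repo_context.ts` internals:

```typescript
// Hypothetical push-only init variant: takes the global lock but never
// registers a sync service, so the teardown flush has nothing to
// re-push and the registration-time pull never happens.
interface RepoContext {
  lockHeld: boolean;
  coordinatorEntries: string[];
}

function requireInitializedRepoForPush(): RepoContext {
  const ctx: RepoContext = { lockHeld: false, coordinatorEntries: [] };
  ctx.lockHeld = true; // --push still needs the global lock
  // Deliberately no registerDatastoreSync(...): the command's own
  // pushChanged() call becomes the only push of the invocation.
  return ctx;
}
```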
