Issue

Labels: Bug, Shipped, Extensions
Assignee: stack72

fast-path sidecar TOCTOU: post-op HEAD can record generation from a concurrent writer's push, masking their data on next sync

Opened by stack72 · 4/24/2026 · Shipped 4/25/2026

Summary

Both S3CacheSyncService and GcsCacheSyncService have a time-of-check / time-of-use race in the post-operation sidecar update that can cause the next pullChanged fast path to silently skip a concurrent writer's data. The race self-heals on the next remote write, so data is delayed rather than lost, but the window is real. Flagged by the Adversarial Code Review bot on PR systeminit/swamp-extensions#106 at medium severity; we agreed it is inherited from the S3 #105 design and deferred it to this follow-up.

Affected code

  • datastore/s3/extensions/datastores/_lib/s3_cache_sync.ts — the if (pulled === 0) branch at the end of pullChanged that HEADs the index and calls markSynced(head.etag). The same pattern exists in the no-writeback branch of pushChanged. A reduced sketch follows this list.
  • datastore/gcs/extensions/datastores/_lib/gcs_cache_sync.ts — equivalent sites, same shape, generation instead of ETag.
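
A reduced sketch of the racy shape, for reference. The RemoteIndexStore and SyncSidecar interfaces and the function wiring are illustrative assumptions; only the HEAD-then-markSynced sequence in the pulled === 0 branch reflects the code under discussion.

```ts
// Reduced sketch of the end-of-path block in pullChanged (S3 variant).
// Interfaces are assumptions for illustration, not the real types.

interface RemoteIndexStore {
  headObject(key: string): Promise<{ etag: string }>;
}

interface SyncSidecar {
  markSynced(etag: string): Promise<void>;
}

async function recordSyncedIfClean(
  store: RemoteIndexStore,
  sidecar: SyncSidecar,
  indexKey: string,
  pulled: number,
): Promise<void> {
  if (pulled === 0) {
    // TOCTOU window: this HEAD runs after the walk, so it can observe a
    // generation/ETag pushed by a concurrent writer whose files were not
    // part of the walk. markSynced then records state we never pulled.
    const head = await store.headObject(indexKey);
    await sidecar.markSynced(head.etag);
  }
}
```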

Reproduction

  1. Operator A runs swamp datastore sync; pullChanged takes the slow path. pullIndex GETs the remote index at generation/ETag G1. The walk completes with 0 files to pull.
  2. Operator B pushes a new file, bumping remote index to G2.
  3. A's pullChanged reaches the end-of-path block. It HEADs the remote index, sees G2, records G2 in the sidecar via markSynced.
  4. A's next sync: tryFastPullChanged HEADs remote and sees G2. The sidecar also says G2. Generation match → returns 0, and the fast path skips the walk (sketched after this list).
  5. B's new file is never pulled into A's cache and remains invisible until a subsequent remote mutation bumps the generation past G2.
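
A sketch of the fast-path comparison from step 4, under the same assumed interfaces as the first sketch; readSidecar is a hypothetical accessor for the recorded generation.

```ts
// Sketch of tryFastPullChanged's comparison. After the race, both sides
// read G2, so the walk is skipped even though B's file was never pulled.

async function tryFastPullChanged(
  store: RemoteIndexStore,
  readSidecar: () => Promise<string | undefined>,
  indexKey: string,
): Promise<number | null> {
  const head = await store.headObject(indexKey);
  const recorded = await readSidecar();
  if (recorded !== undefined && recorded === head.etag) {
    return 0; // generation match: fast path, no walk
  }
  return null; // mismatch or empty sidecar: caller takes the slow path
}
```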

Mitigating factors

  • Inherited from S3 #105, not a regression unique to lab/166; it was already present in the shipped @swamp/s3-datastore@2026.04.24.2.
  • Data is not lost — it exists remotely. A's view is stale.
  • Self-healing: the next push by A (or anyone) invalidates the sidecar via generation bump and the next pullChanged slow-paths.
  • Concurrent pull + push operations are less common than sequential ones in typical workflows. Single-operator setups never trigger it.

Proposed fix

Capture the index generation/ETag from the getObject response used by pullIndex, and call markSynced with that value (the one we actually compared against) instead of re-HEADing post-walk. This closes the race window because the recorded generation is the one the walk was verified against, not a newer one that arrived during the walk.

This requires getObject (or a new getObjectWithMetadata) to return the generation/ETag alongside the body. GCS's JSON API already returns the generation in response headers; the S3 SDK exposes the ETag on GetObjectCommandOutput.ETag.
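
A sketch of what that could look like against the S3 SDK. getObjectWithMetadata is the hypothetical variant floated above, and the walkAndPull/markSynced parameters stand in for the existing walk and sidecar writer; none of this is the actual s3_cache_sync.ts code.

```ts
import { GetObjectCommand, S3Client } from "npm:@aws-sdk/client-s3";

// Hypothetical getObjectWithMetadata: return the body together with the
// ETag from the same GET, so the caller records exactly what it compared.
async function getObjectWithMetadata(
  s3: S3Client,
  bucket: string,
  key: string,
): Promise<{ body: Uint8Array; etag: string }> {
  const out = await s3.send(
    new GetObjectCommand({ Bucket: bucket, Key: key }),
  );
  const body = await out.Body!.transformToByteArray();
  // S3 quotes the ETag; strip quotes before comparing against the sidecar.
  return { body, etag: (out.ETag ?? "").replaceAll('"', "") };
}

async function pullChangedFixed(
  s3: S3Client,
  bucket: string,
  indexKey: string,
  walkAndPull: (index: Uint8Array) => Promise<number>, // existing slow-path walk
  markSynced: (etag: string) => Promise<void>, // sidecar writer
): Promise<number> {
  const { body, etag } = await getObjectWithMetadata(s3, bucket, indexKey);
  const pulled = await walkAndPull(body);
  if (pulled === 0) {
    // Record the generation the walk was verified against. A concurrent
    // push during the walk leaves the sidecar behind it, so the next
    // pullChanged takes the slow path instead of masking the push.
    await markSynced(etag);
  }
  return pulled;
}
```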

Alternative: do a pre-pull HEAD, capture its generation, and only record that generation at end-of-path if the post-pull HEAD still matches it. Slightly more network chatter, but no SDK or API changes.
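
A sketch of that variant, again under the assumed interfaces from the first sketch. It trades one extra HEAD per pull for zero SDK changes.

```ts
// Double-HEAD variant: bracket the walk with HEADs and record the
// generation only when nothing moved in between, i.e. no concurrent
// push landed while the walk ran.

async function pullChangedDoubleHead(
  store: RemoteIndexStore,
  sidecar: SyncSidecar,
  indexKey: string,
  walk: () => Promise<number>, // existing slow-path walk
): Promise<void> {
  const before = await store.headObject(indexKey);
  const pulled = await walk();
  if (pulled === 0) {
    const after = await store.headObject(indexKey);
    if (after.etag === before.etag) {
      await sidecar.markSynced(after.etag);
    }
    // On mismatch, record nothing: the sidecar stays stale and the next
    // pullChanged slow-paths, picking up the concurrent push.
  }
}
```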

Why not block lab/166 on this

The race is pre-existing and self-healing, and the operational shape (concurrent writers) is uncommon in the lab/164 reporter's workflow. Adversarial review returned PASS with this as a Medium follow-up. Filed now so it doesn't get forgotten.

Scope

Both @swamp/s3-datastore and @swamp/gcs-datastore. Filed against @swamp/s3-datastore since that's where the pattern originated (#105) and where the bug is already in production; GCS inherits the same fix shape.

Upstream repository: https://github.com/systeminit/swamp-extensions

Environment

  • Extension: @swamp/s3-datastore@2026.04.24.3
  • swamp: 20260424.194520.0-sha.bd275a89
  • OS: darwin (aarch64)
  • Deno: 2.7.13
  • Shell: /bin/zsh
Bog Flow

Lifecycle: OPEN → TRIAGED → IN PROGRESS → … → SHIPPED
Shipped: 4/25/2026, 12:02:03 AM

Sludge Pulse

stack72 assigned stack72 · 4/24/2026, 10:58:18 PM