fast-path sidecar TOCTOU: post-op HEAD can record generation from a concurrent writer's push, masking their data on next sync
Opened by stack72 · 4/24/2026 · Shipped 4/25/2026
Summary
Both `S3CacheSyncService` and `GcsCacheSyncService` have a time-of-check / time-of-use race in the post-operation sidecar update that can cause the next `pullChanged` fast path to silently skip a concurrent writer's data. The race self-heals on the next remote write, so data is delayed rather than lost, but the window is real. Flagged by the Adversarial Code Review bot on PR systeminit/swamp-extensions#106 at medium severity; we agreed it is inherited from the S3 #105 design and deferred it to this follow-up.
Affected code
- `datastore/s3/extensions/datastores/_lib/s3_cache_sync.ts`: the `if (pulled === 0)` branch at the end of `pullChanged` that HEADs the index and calls `markSynced(head.etag)`. The same pattern exists in the no-writeback branch of `pushChanged`.
- `datastore/gcs/extensions/datastores/_lib/gcs_cache_sync.ts`: equivalent sites, same shape, generation instead of ETag.
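For orientation, here is the racy end-of-path shape in paraphrase. The interfaces below are hypothetical stand-ins for the real client and sidecar types, not the literal source:

```ts
// Hypothetical stand-ins for the real remote-index client and sidecar.
interface RemoteIndexClient {
  headIndex(): Promise<{ etag: string }>;
}
interface SyncSidecar {
  markSynced(etag: string): Promise<void>;
}

// Paraphrased shape of the racy block at the end of pullChanged.
async function pullChangedTail(
  remote: RemoteIndexClient,
  sidecar: SyncSidecar,
  pulled: number,
): Promise<void> {
  if (pulled === 0) {
    // TOCTOU: the walk compared against the index fetched earlier (G1),
    // but this HEAD can observe G2 from a concurrent writer's push.
    // Recording G2 lets the next tryFastPullChanged match and skip the
    // walk, hiding the writer's data.
    const head = await remote.headIndex();
    await sidecar.markSynced(head.etag);
  }
}
```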
Reproduction
- Operator A runs `swamp datastore sync` → `pullChanged` slow path. `pullIndex` GETs the remote index at generation/ETag G1. The walk completes with 0 files to pull.
- Operator B pushes a new file, bumping the remote index to G2.
- A's `pullChanged` reaches the end-of-path block. It HEADs the remote index, sees G2, and records G2 in the sidecar via `markSynced`.
- A's next sync: `tryFastPullChanged` HEADs the remote and sees G2. The sidecar also says G2. Generation match → returns 0. The fast path skipped the walk.
- B's new file is never pulled to A's cache. It remains invisible until some subsequent remote mutation bumps the generation past G2. (A minimal sketch of this interleaving follows below.)
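A minimal in-memory sketch of that interleaving, with illustrative names only (no real harness or SDK involved):

```ts
// Toy stand-in for the remote index; "etag" doubles as the generation.
class FakeRemote {
  etag = "G1";
  bump(next: string) { this.etag = next; } // B's push
  headIndex() { return { etag: this.etag }; }
}

const remote = new FakeRemote();

// A's slow path: the walk runs against the index read at G1, 0 files to pull.
const walkedAgainst = remote.headIndex().etag; // "G1"

// B pushes concurrently, bumping the remote index to G2.
remote.bump("G2");

// A's end-of-path block re-HEADs and records what it sees *now*: G2.
const sidecarEtag = remote.headIndex().etag; // "G2", not "G1"

// A's next sync: the fast path compares sidecar to remote and matches.
const fastPathSkipsWalk = sidecarEtag === remote.headIndex().etag;
console.log({ walkedAgainst, sidecarEtag, fastPathSkipsWalk }); // true
```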
Mitigating factors
- Inherited from S3 #105, not a regression unique to lab/166. Pre-existing as of shipped `@swamp/s3-datastore@2026.04.24.2`.
- Data is not lost; it exists remotely. A's view is stale.
- Self-healing: the next push by A (or anyone) invalidates the sidecar via a generation bump, and the next `pullChanged` slow-paths.
- Concurrent pull + push operations are less common than sequential ones in typical workflows. Single-operator setups never trigger it.
Proposed fix
Capture the index generation/ETag from the `getObject` response used by `pullIndex`, and call `markSynced` with *that* value, the one the walk actually compared against, instead of re-HEADing after the walk. This closes the race window: the generation recorded is the one we verified against, not a newer one that arrived mid-walk.
This requires `getObject` (or a new `getObjectWithMetadata`) to return the generation/ETag alongside the body. GCS's JSON API already returns the generation in the response headers; the S3 SDK returns the ETag on `GetObjectCommandOutput.ETag`.
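A sketch of the fixed shape under that assumption; the function signatures here are hypothetical, chosen only to show where the value flows:

```ts
// Hypothetical fetch result: body plus the generation/ETag it was read at.
// S3: GetObjectCommandOutput.ETag; GCS: generation response header.
interface IndexFetch {
  body: Uint8Array;
  etag: string;
}

async function pullChangedFixed(
  fetchIndex: () => Promise<IndexFetch>,        // e.g. getObjectWithMetadata
  walk: (index: Uint8Array) => Promise<number>, // returns files pulled
  markSynced: (etag: string) => Promise<void>,
): Promise<number> {
  const { body, etag } = await fetchIndex(); // the generation we walk against
  const pulled = await walk(body);
  if (pulled === 0) {
    // Record the generation we actually verified, not a fresh HEAD that
    // could reflect a concurrent writer's push.
    await markSynced(etag);
  }
  return pulled;
}
```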
Alternative: do a pre-pull HEAD, capture its generation, and only record it at end-of-path if a post-pull HEAD still matches. Slightly more network chatter, but no SDK / API changes.
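The double-HEAD variant would look roughly like this (again with hypothetical signatures); on a mismatch the sidecar is simply left stale so the next sync slow-paths:

```ts
async function pullChangedDoubleHead(
  headIndex: () => Promise<{ etag: string }>,
  runSlowPath: () => Promise<number>, // existing walk, unchanged
  markSynced: (etag: string) => Promise<void>,
): Promise<number> {
  const before = await headIndex(); // generation before the walk
  const pulled = await runSlowPath();
  if (pulled === 0) {
    const after = await headIndex(); // generation after the walk
    // Only record if nothing moved during the walk; otherwise leave the
    // sidecar stale so the next sync takes the slow path and catches up.
    if (after.etag === before.etag) {
      await markSynced(before.etag);
    }
  }
  return pulled;
}
```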
Why not block lab/166 on this
The race is pre-existing, self-healing, and the operational shape (concurrent writers) is uncommon in the lab/164 reporter's workflow. Adversarial review was PASS with this as a Medium follow-up. Filed now so it doesn't get forgotten.
Scope
Both `@swamp/s3-datastore` and `@swamp/gcs-datastore`. Filed against `@swamp/s3-datastore` since that's where the pattern originated (#105) and where the bug is already in production; GCS inherits the same fix shape.
Upstream repository: https://github.com/systeminit/swamp-extensions
Environment
- Extension: `@swamp/s3-datastore@2026.04.24.3`
- swamp: `20260424.194520.0-sha.bd275a89`
- OS: `darwin` (aarch64)
- Deno: `2.7.13`
- Shell: `/bin/zsh`