01 Issue
Feature · Shipped · Extensions
Assignees: stack72

#434 GCS datastore: dirty sidecar, partitioned index, content hashing, and scoped sync

Opened by stack72 · 5/24/2026 · Shipped 5/24/2026

Problem

The @swamp/gcs-datastore extension has the same efficiency problems that the S3 extension had before its Phase 2 overhaul (#379):

  1. Push walks everything: markDirty() ignores relPath and flips a single localDirty boolean, so every push walks all subdirectories.
  2. Pull fetches the full index: the monolithic .datastore-index.json is fetched and parsed on every sync.
  3. No content hashing: change detection uses stat.size + stat.mtime, which is unreliable across machines.
  4. Low transfer concurrency: likely hardcoded at 10 concurrent operations.
  5. No scoped sync: the extension doesn't advertise the scopedSync capability, so core can't pass context.models for per-model pull/push.

Proposed Solution

Apply the same changes that were shipped for @swamp/s3-datastore, adapted for GCS APIs. The S3 extension is the reference implementation.

A. Update local interfaces

Add Phase 1 core types to the extension's local interface definitions: SyncContext, SyncCapabilities, context? on DatastoreSyncOptions, relPath? on DatastoreSyncOptions, capabilities?() on DatastoreSyncService.
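
As a rough illustration, the local interface additions might look like the sketch below. Only the type and member names listed above come from this issue; the field types, the ModelRef helper, and the method return types are assumptions, and the authoritative definitions are the Phase 1 contracts in swamp core.

```typescript
// Hypothetical shapes. Only the type and member names named in the issue are
// real; field types, ModelRef, and return types are assumptions.
interface ModelRef {
  modelType: string;
  modelId: string;
}

interface SyncContext {
  models?: ModelRef[]; // models core wants synced; absent is assumed to mean "everything"
}

interface SyncCapabilities {
  scopedSync: boolean;
}

interface DatastoreSyncOptions {
  context?: SyncContext; // new: per-call scope passed in by core
  relPath?: string; // new: path that changed, relative to the datastore root
}

interface DatastoreSyncService {
  markDirty(options?: DatastoreSyncOptions): Promise<void>;
  pushChanged(options?: DatastoreSyncOptions): Promise<number>;
  // New and optional, so services written before Phase 1 still type-check.
  capabilities?(): SyncCapabilities;
}

// Example call-site value: a write to one model's directory (made-up path).
const exampleOptions: DatastoreSyncOptions = {
  relPath: "data/aws/ec2/instance/web-1",
  context: { models: [{ modelType: "aws/ec2/instance", modelId: "web-1" }] },
};
```

Keeping every new member optional is what lets the same interface describe both old and new implementations.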

B. Persistent dirty sidecar (efficient push)

  • Upgrade DatastoreSyncState to version 2 with dirtyPaths: string[] and bulkInvalidated: boolean
  • markDirty(options?) accepts relPath, tracks per-path dirty set with 200-path cap
  • pushChanged() walks only dirty directories when !bulkInvalidated && dirtyPaths.size > 0
  • Dirty paths can be directories OR files — walk directories, check files directly
  • Full walk remains as fallback for bulk invalidation
  • v1 sidecars read correctly (no dirtyPaths → full walk). v2 read by old client → version mismatch → full walk
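
The dirty-tracking rules above can be sketched as two pure functions. This is a minimal sketch, not the S3 extension's code: the overflow policy (switch to bulk invalidation once the 200-path cap is exceeded), the "nothing dirty means nothing to push" branch, and the names SyncStateV2 and planPush are assumptions made for illustration.

```typescript
const MAX_DIRTY_PATHS = 200; // cap taken from the issue text

interface SyncStateV2 {
  version: 2;
  dirtyPaths: string[];
  bulkInvalidated: boolean;
}

const BULK: SyncStateV2 = { version: 2, dirtyPaths: [], bulkInvalidated: true };

// Record one dirty path, or invalidate everything when no path is known.
function markDirty(state: SyncStateV2, relPath?: string): SyncStateV2 {
  if (relPath === undefined || state.bulkInvalidated) return BULK;
  const paths = new Set(state.dirtyPaths);
  paths.add(relPath); // the Set de-duplicates repeated writes to one path
  // Assumed overflow policy: past the cap, stop tracking and walk everything.
  if (paths.size > MAX_DIRTY_PATHS) return BULK;
  return { version: 2, dirtyPaths: Array.from(paths), bulkInvalidated: false };
}

type PushPlan =
  | { kind: "none" }
  | { kind: "full" }
  | { kind: "scoped"; paths: string[] };

// Decide how much of the tree pushChanged() has to visit.
function planPush(state: SyncStateV2): PushPlan {
  if (state.bulkInvalidated) return { kind: "full" };
  if (state.dirtyPaths.length === 0) return { kind: "none" }; // assumption
  return { kind: "scoped", paths: state.dirtyPaths };
}
```

A caller would then stat each scoped path, walking it if it is a directory and checking it directly if it is a file.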

C. Partitioned index with dual-write (efficient pull)

  • Write BOTH monolithic .datastore-index.json AND partitioned _index/{partition-key}.json files
  • Monolithic written FIRST for crash safety
  • Partition key format: data--{modelType-segments}--{modelId} (using -- separator)
  • Scoped pull reads partition file when context.models provided, falls back to monolithic when missing
  • _index/ excluded from isInternalCacheFile()
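
A minimal sketch of the dual-write order and the scoped read with fallback follows. ObjectStore is a hypothetical stand-in for whatever wrapper the extension puts around the GCS client, writeIndexes and readScopedIndex are illustrative names, and splitting modelType on "/" to obtain its segments is an assumption; only the file names and the key format come from the list above.

```typescript
type Index = Record<string, { size: number; mtime: number; sha256?: string }>;

// Hypothetical storage wrapper; not a real GCS client API.
interface ObjectStore {
  put(key: string, body: string): Promise<void>;
  get(key: string): Promise<string | undefined>;
}

const MONOLITHIC_KEY = ".datastore-index.json";

// data--{modelType-segments}--{modelId}, assuming "/"-separated model types.
function partitionKeyFromModel(modelType: string, modelId: string): string {
  const segments = modelType.split("/").filter((s) => s.length > 0);
  return ["data", ...segments, modelId].join("--");
}

// Dual-write: the monolithic index goes first, so a crash between the writes
// leaves old clients and the fallback path with a complete index.
async function writeIndexes(
  store: ObjectStore,
  monolithic: Index,
  partitions: Map<string, Index>,
): Promise<void> {
  await store.put(MONOLITHIC_KEY, JSON.stringify(monolithic));
  for (const key of Array.from(partitions.keys())) {
    await store.put(`_index/${key}.json`, JSON.stringify(partitions.get(key)));
  }
}

// Scoped read: prefer the partition file, fall back to the monolithic index.
async function readScopedIndex(
  store: ObjectStore,
  modelType: string,
  modelId: string,
): Promise<Index> {
  const key = partitionKeyFromModel(modelType, modelId);
  const raw =
    (await store.get(`_index/${key}.json`)) ?? (await store.get(MONOLITHIC_KEY));
  return raw === undefined ? {} : (JSON.parse(raw) as Index);
}
```

Note that the fallback returns the whole index, so a caller that wants only one model's entries would still need to filter the result.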

D. Content hashing (SHA-256)

  • Add optional sha256 field to index entries
  • Three-branch change detection: size differs → push; same size + same mtime → skip; same size + different mtime → compare SHA-256
  • Old entries without sha256 fall back to size comparison
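
The three-branch check can be sketched as below. This is an illustration under assumptions rather than the reference fileNeedsPush: the hash is supplied as a callback so it is computed only when the third branch is reached, a file with no remote entry is assumed to need a push, and the handling of old entries without sha256 follows a literal reading of the last bullet (size already matched, so skip).

```typescript
interface IndexEntry {
  size: number;
  mtime: number;
  sha256?: string; // optional: entries written by older clients lack it
}

interface LocalFile {
  size: number;
  mtime: number;
}

function fileNeedsPush(
  local: LocalFile,
  remote: IndexEntry | undefined,
  hashLocal: () => string, // lazily computes the local file's SHA-256 hex digest
): boolean {
  if (remote === undefined) return true; // assumption: never pushed before
  if (local.size !== remote.size) return true; // branch 1: size differs
  if (local.mtime === remote.mtime) return false; // branch 2: size and mtime match
  // Branch 3: same size, different mtime.
  // Literal reading of the fallback rule: without a stored hash only size can
  // be compared, and size already matched. A stricter variant could push here.
  if (remote.sha256 === undefined) return false;
  return hashLocal() !== remote.sha256;
}
```

In a real implementation the callback could wrap createHash("sha256") from node:crypto over the file contents; passing it in keeps the decision logic free of I/O and cheap to test.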

E. Configurable concurrency

  • Add pullConcurrency and pushConcurrency to config schema (defaults: 50 pull, 25 push)
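
One way the two settings might be consumed is a small bounded-concurrency helper. The defaults (50 pull, 25 push) come from the bullet above; resolveConcurrency and mapWithLimit are hypothetical names, and the worker-pool approach is a common pattern, not necessarily what the S3 extension does.

```typescript
const DEFAULT_PULL_CONCURRENCY = 50;
const DEFAULT_PUSH_CONCURRENCY = 25;

interface ConcurrencyConfig {
  pullConcurrency?: number;
  pushConcurrency?: number;
}

// Apply the documented defaults when the config omits a value.
function resolveConcurrency(config: ConcurrencyConfig): { pull: number; push: number } {
  return {
    pull: config.pullConcurrency ?? DEFAULT_PULL_CONCURRENCY,
    push: config.pushConcurrency ?? DEFAULT_PUSH_CONCURRENCY,
  };
}

// Run fn over items with at most `limit` calls in flight; results keep input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker repeatedly claims the next unclaimed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

A push would then look something like mapWithLimit(changedFiles, resolveConcurrency(config).push, uploadOne), where uploadOne is whatever function uploads a single file.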

F. Advertise scopedSync

  • Implement capabilities() returning { scopedSync: true }
  • Phase 1 wiring in acquireModelLocks activates automatically
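
To make the handshake concrete, here is a sketch of the extension side and of how core might gate on it. GcsSyncService and scopeFor are hypothetical names, the ModelRef shape is assumed, and the real wiring lives in acquireModelLocks in swamp core, which this does not reproduce.

```typescript
interface ModelRef {
  modelType: string;
  modelId: string;
}

interface SyncCapabilities {
  scopedSync: boolean;
}

interface SyncContext {
  models?: ModelRef[];
}

interface SyncService {
  capabilities?(): SyncCapabilities;
}

// Extension side: advertising the capability is a one-line method.
class GcsSyncService implements SyncService {
  capabilities(): SyncCapabilities {
    return { scopedSync: true };
  }
}

// Hypothetical core-side gate: scope the sync only when the service opted in
// AND a non-empty model list was supplied; otherwise fall back to a full sync.
function scopeFor(service: SyncService, context?: SyncContext): ModelRef[] | undefined {
  const optedIn = service.capabilities?.().scopedSync === true;
  const models = context?.models;
  return optedIn && models !== undefined && models.length > 0 ? models : undefined;
}
```

Returning undefined for "no scope" means a service without capabilities() takes the same code path it always has.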

Reference Implementation

The @swamp/s3-datastore extension has all of these changes shipped and tested. Use it as the reference:

  • Sidecar v2 schema and markDirty logic
  • fileNeedsPush three-branch detection
  • groupEntriesByPartition and partitionKeyFromModel for partition key derivation
  • writePartitionedIndex and pullPartitionedIndex for dual-write and scoped read
  • isInternalCacheFile exclusion for _index/

Backward Compatibility

  • v1 sidecars trigger full walk (today's behavior)
  • Old clients read monolithic index (always written)
  • sha256 ignored by old readers (JSON forward compat)
  • capabilities() is inert until core passes context.models
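
The sidecar compatibility rule can be sketched as a defensive reader that degrades to a full walk whenever the file is not a well-formed v2 document. readSidecar is an illustrative name and the exact v1 JSON shape is not given in this issue, so the sketch keys only on version and dirtyPaths.

```typescript
interface SyncStateV2 {
  version: 2;
  dirtyPaths: string[];
  bulkInvalidated: boolean;
}

// Parse a sidecar of unknown vintage. Anything other than a valid v2 document
// (v1 files, future versions, corrupt JSON) is treated as "walk everything".
function readSidecar(raw: string): SyncStateV2 {
  const fullWalk: SyncStateV2 = { version: 2, dirtyPaths: [], bulkInvalidated: true };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return fullWalk;
  }
  if (typeof parsed !== "object" || parsed === null) return fullWalk;
  const doc = parsed as Record<string, unknown>;
  if (doc.version !== 2 || !Array.isArray(doc.dirtyPaths)) return fullWalk;
  // Keep only string entries so a partially damaged list cannot crash the walk.
  const dirtyPaths = doc.dirtyPaths.filter((p): p is string => typeof p === "string");
  return { version: 2, dirtyPaths, bulkInvalidated: doc.bulkInvalidated === true };
}
```

Failing toward a full walk costs time but never skips a changed file, which matches today's behavior for v1 sidecars.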

Dependencies

Requires swamp core with Phase 1 framework contracts (#378) and the markDirty hook fix (#429) — both are already merged.

02 Bog Flow

Shipped

5/24/2026, 9:09:38 PM

03 Sludge Pulse
stack72 assigned stack72 · 5/24/2026, 8:14:30 PM