Feature · Open · Swamp CLI
Assignees: None

#380 Datastore: lazy hydration for fast cold-start on first clone

Opened by stack72 · 5/19/2026

Problem

When a new team member clones a repo with an S3 datastore, or CI checks out a fresh copy, the first sync downloads the entire datastore contents — every model's data, outputs, workflow runs, audit logs, telemetry. If the repo has hundreds of models and gigabytes of data, this takes minutes even when the user only needs to run one model. The cold-start penalty makes onboarding slow and CI expensive.

Proposed Solution

Add opt-in lazy hydration: on first clone or `datastore setup`, download only index metadata. Hydrate individual model data on first access.

Framework additions

  • `lazyHydration?: boolean` on `SyncCapabilities` — extension advertises support
  • `hydrateScope?(models, options)` on `DatastoreSyncService` — optional method to hydrate specific models' data on demand
  • `hydrationStrategy?: "full" | "lazy"` on `CustomDatastoreConfig` — user opts in via `.swamp.yaml`, defaults to `"full"` (today's behavior)
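The three additions above could look roughly like the following. This is a sketch under assumptions, not the framework's actual contract: the field and method names come from this proposal, but the parameter types (`models` as `string[]`, `options` as a loose record), the return type of `hydrateScope`, and the helper `effectiveStrategy` are hypothetical.

```typescript
// Sketch only: names come from the proposal; every type shape is assumed.
type HydrationStrategy = "full" | "lazy";

interface SyncCapabilities {
  // Extension advertises that it supports lazy hydration.
  lazyHydration?: boolean;
}

interface DatastoreSyncService {
  // Optional: hydrate specific models' data on demand.
  // Parameter and return types are assumptions; the issue does not define them.
  hydrateScope?(
    models: string[],
    options?: Record<string, unknown>,
  ): Promise<unknown>;
}

interface CustomDatastoreConfig {
  // Set by the user in `.swamp.yaml`; absent means "full".
  hydrationStrategy?: HydrationStrategy;
}

// Hypothetical helper: lazy mode applies only when the user opted in
// AND the extension advertises support. Everything else stays "full".
function effectiveStrategy(
  config: CustomDatastoreConfig,
  caps?: SyncCapabilities,
): HydrationStrategy {
  const requested = config.hydrationStrategy ?? "full";
  return requested === "lazy" && caps?.lazyHydration === true ? "lazy" : "full";
}
```

Treating a missing capability as `"full"` is one possible design choice; it keeps a `"lazy"` setting harmless when the configured extension predates this feature.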

S3 extension

  • `hydrateScope()`: read `_index/{partition}.json` (from #379's partitioned index), download entries missing locally
  • Lazy initial pull: when cache is empty and strategy is `"lazy"`, download only `_index/_meta.json`, create placeholder directories, return 0
  • Background hydration: after the command completes, download remaining partitions at low priority (non-blocking)
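The decision logic behind the first two bullets can be sketched as pure functions. The partition index shape (`entries` with a `key` field) is a hypothetical stand-in, since the real format is defined by #379, and the function names `missingEntries` and `initialPullKeys` are illustrative only. The actual S3 downloads are left out.

```typescript
// Hypothetical partition index shape; #379 defines the real format.
interface PartitionIndex {
  entries: { key: string }[];
}

// Keys listed in a partition index that are absent from the local cache.
// These are the objects hydrateScope() would need to download.
function missingEntries(
  index: PartitionIndex,
  localKeys: ReadonlySet<string>,
): string[] {
  return index.entries
    .map((entry) => entry.key)
    .filter((key) => !localKeys.has(key));
}

// Objects to fetch on the initial pull. Returns only the index metadata
// for an empty cache in lazy mode; returns null to mean "use the existing
// full pull path" in every other case.
function initialPullKeys(
  cacheIsEmpty: boolean,
  strategy: "full" | "lazy",
): string[] | null {
  if (cacheIsEmpty && strategy === "lazy") {
    return ["_index/_meta.json"];
  }
  return null;
}
```

Keeping the diff as a pure function separates "what is missing" from "how to download it", so the same result could feed both on-demand and background hydration.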

Core wiring

In `acquireModelLocks`, after the scoped pull: if `caps?.lazyHydration` is set, call `hydrateScope()` so that the models being operated on are fully available locally before the command runs.
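A minimal sketch of that check, combined with the `pullChanged()` fallback from the Backward Compatibility section. The helper name `ensureHydrated`, its return values, and all type shapes are assumptions made for illustration; the real `acquireModelLocks` is not shown in this issue.

```typescript
// Assumed shapes; only the member names come from this proposal.
interface SyncCapabilities {
  lazyHydration?: boolean;
}

interface DatastoreSyncService {
  capabilities?(): SyncCapabilities;
  pullChanged(): Promise<unknown>;
  hydrateScope?(
    models: string[],
    options?: Record<string, unknown>,
  ): Promise<unknown>;
}

type HydrationOutcome = "skipped" | "hydrated" | "fell-back";

// Hypothetical helper to be called after the scoped pull.
async function ensureHydrated(
  service: DatastoreSyncService,
  models: string[],
): Promise<HydrationOutcome> {
  const caps = service.capabilities?.();
  // Old extensions: no capability or no method, so nothing changes.
  if (!caps?.lazyHydration || !service.hydrateScope) {
    return "skipped";
  }
  try {
    await service.hydrateScope(models);
    return "hydrated";
  } catch {
    // Hydration failed: fall back to today's full pull.
    await service.pullChanged();
    return "fell-back";
  }
}
```

Returning an outcome instead of `void` is a choice made here so callers and tests can tell which path ran.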

Backward Compatibility

  • `hydrationStrategy` defaults to `"full"` — no change for any existing user
  • `"lazy"` is opt-in via `.swamp.yaml`
  • `hydrateScope` is optional on the interface — old extensions unaffected
  • If `hydrateScope` fails → fall back to full `pullChanged()` (today's path)

Dependencies

Depends on #378 (framework contracts for `SyncCapabilities` and `capabilities()`) and #379 (partitioned index so the extension knows what data exists without downloading it all).

This is Phase 3 of a 3-phase datastore efficiency overhaul.

Status: Open (5/19/2026, 8:58:42 PM)
