Pipelines
A pipeline pairs one source connection with one or more destination connections and manages the full replication lifecycle: snapshot → CDC → checkpoint → resume. Each pipeline is independent, with its own LSN watermark, replication slot, and state.
Lifecycle
A pipeline moves through these states:
CREATED → SNAPSHOTTING → LIVE CDC → (PAUSED) → LIVE CDC
↓
ERRORED
- CREATED: the pipeline has been applied but hasn’t started yet (rare; pipelines start immediately after apply).
- SNAPSHOTTING: initial full-table backfill in progress. Multiple goroutines read partitions in parallel.
- LIVE CDC: snapshot complete; streaming row-level changes from the source’s WAL or change tables.
- PAUSED: manually paused via `nanosync pipeline pause`. Resumes from the last checkpoint.
- ERRORED: the pipeline stopped due to an unrecoverable error. Check logs and `nanosync monitor` for the cause.
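The states above can be read as a small state machine. The following is a minimal sketch in Python, not Nanosync's implementation: the `State` enum, the `ALLOWED` table, and `can_transition` are hypothetical names, and which states may fail into ERRORED is an assumption, since the diagram does not pin that down. It models the default flow only.

```python
from enum import Enum


class State(Enum):
    CREATED = "CREATED"
    SNAPSHOTTING = "SNAPSHOTTING"
    LIVE_CDC = "LIVE CDC"
    PAUSED = "PAUSED"
    ERRORED = "ERRORED"


# Transitions read off the lifecycle diagram. Letting every active state
# fail into ERRORED is an assumption made for this sketch.
ALLOWED = {
    State.CREATED: {State.SNAPSHOTTING, State.ERRORED},
    State.SNAPSHOTTING: {State.LIVE_CDC, State.ERRORED},
    State.LIVE_CDC: {State.PAUSED, State.ERRORED},
    State.PAUSED: {State.LIVE_CDC},
    State.ERRORED: set(),
}


def can_transition(src: State, dst: State) -> bool:
    """Return True if the sketch's transition table permits src -> dst."""
    return dst in ALLOWED[src]
```

For example, `can_transition(State.PAUSED, State.LIVE_CDC)` is true, while `can_transition(State.PAUSED, State.SNAPSHOTTING)` is false, which matches the statement that a paused pipeline resumes from its last checkpoint rather than re-snapshotting.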
Creating a pipeline
Define connections and pipelines in a YAML file and apply it to the running server:
```yaml
connections:
  - name: prod-postgres
    type: postgres
    dsn: "postgres://nanosync:${env:PG_PASSWORD}@db.prod.internal:5432/mydb?sslmode=require"
  - name: prod-bigquery
    type: bigquery
    properties:
      project_id: my-project
      dataset_id: replication
      credentials_file: /etc/nanosync/bq-sa.json

pipelines:
  - name: orders-pipeline
    replication_type: cdc_backfill
    source:
      connection: prod-postgres
      tables:
        - public.orders
        - public.order_items
    sink:
      connection: prod-bigquery
      properties:
        table_id: orders
```

```bash
nanosync apply --file pipelines.yaml
```
The pipeline starts immediately. The snapshot begins, then transitions to streaming CDC automatically.
`nanosync apply` is idempotent: run it again after any change to update the pipeline definition without data loss.
Multiple tables in one pipeline
List every table under `tables:`. All tables in a pipeline share the same snapshot worker pool and CDC stream, so they stay consistent with each other: events from different tables arrive in the same order in which their transactions were committed on the source.

```yaml
source:
  connection: prod-postgres
  tables:
    - public.orders
    - public.order_items
    - public.products
```
There is no limit on the number of tables per pipeline. For unrelated table groups with different latency or consistency requirements, use separate pipelines.
Fan-out to multiple destinations
Use `sinks:` (plural) to write to multiple destinations simultaneously. Each sink writes the same event stream independently, so a slow sink doesn’t block the others.

```yaml
pipelines:
  - name: orders-pipeline
    source:
      connection: prod-postgres
      tables: [public.orders]
    sinks:
      - connection: prod-bigquery
      - connection: prod-kafka
        properties:
          topic: orders-events
```
Each sink can have its own `target_mappings` to project or filter columns differently. See Configuration reference for details.
When `sinks` is present, the top-level `sink` field is ignored.
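One way to picture independent sinks is a private queue per sink, each drained by its own worker. This is a minimal sketch of that idea in Python and not a description of Nanosync's internals: `run_sink`, `fan_out`, and the simulated per-sink delay are hypothetical, and the unbounded queues are a simplification (a real system would have to bound memory somehow).

```python
import queue
import threading
import time


def run_sink(name, inbox, delay, delivered):
    """Drain one sink's private queue; `delay` simulates a slow destination."""
    while True:
        event = inbox.get()
        if event is None:  # sentinel: the stream is closed
            break
        time.sleep(delay)
        delivered[name].append(event)


def fan_out(events, sink_delays):
    """Send every event to every sink, each through its own queue and thread."""
    inboxes = {name: queue.Queue() for name in sink_delays}
    delivered = {name: [] for name in sink_delays}
    workers = [
        threading.Thread(target=run_sink, args=(name, inboxes[name], delay, delivered))
        for name, delay in sink_delays.items()
    ]
    for worker in workers:
        worker.start()
    for event in events:
        for inbox in inboxes.values():
            inbox.put(event)  # does not block: the queues are unbounded
    for inbox in inboxes.values():
        inbox.put(None)
    for worker in workers:
        worker.join()
    return delivered
```

Calling `fan_out(["e1", "e2"], {"prod-bigquery": 0.0, "prod-kafka": 0.01})` delivers both events, in order, to each sink; the slower sink only delays its own queue.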
Managing pipelines
```bash
# List all pipelines and their current state
nanosync list pipelines

# Open the live terminal dashboard
nanosync monitor

# Pause a running pipeline (checkpoints first, then stops cleanly)
nanosync pipeline pause orders-pipeline

# Resume from the last checkpoint
nanosync pipeline resume orders-pipeline

# Delete a pipeline (stops it and removes all state, including the replication slot)
nanosync pipeline delete orders-pipeline
```
`nanosync monitor` shows status, lag, and events per second for every pipeline. Press Enter on a pipeline to see a table-level breakdown. Press q to quit.
Checkpointing and resume
Nanosync writes the last committed LSN to the state store after every successful batch write to the sink. On restart — whether planned or due to a crash — the pipeline reads the checkpoint and resumes from that exact position.
No re-snapshot. No manual recovery. No data loss.
```text
INF pipeline resuming name=orders-pipeline lsn=0/3A1B2C4D
INF cdc streaming lsn=0/3A1B2C4D
```
The checkpoint is durable — it’s written to either the embedded SQLite state store or an external Postgres state store, depending on your server config. If the checkpoint is lost (e.g. the state store was wiped), the pipeline starts a fresh snapshot from the beginning.
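The checkpoint-then-resume behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not Nanosync's code: the `checkpoints` table, the function names, and the `(end_lsn, rows)` batch shape are all hypothetical. It uses Python's standard `sqlite3` module to stand in for the embedded state store and relies on the PostgreSQL LSN text format (two hexadecimal numbers separated by a slash) to compare positions.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a stand-in state store with one checkpoint row per pipeline."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(pipeline TEXT PRIMARY KEY, lsn TEXT NOT NULL)"
    )
    return db


def save_checkpoint(db, pipeline, lsn):
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO checkpoints (pipeline, lsn) VALUES (?, ?)",
            (pipeline, lsn),
        )


def load_checkpoint(db, pipeline):
    row = db.execute(
        "SELECT lsn FROM checkpoints WHERE pipeline = ?", (pipeline,)
    ).fetchone()
    return row[0] if row else None  # None means no checkpoint: start over


def lsn_to_int(lsn):
    """Convert 'hi/lo' hex text (e.g. '0/3A1B2C4D') to a comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def replicate(db, pipeline, batches, write_batch):
    """Write each batch to the sink, then advance the checkpoint.

    `write_batch` is expected to raise on failure, in which case the
    checkpoint is not advanced and the batch is retried after restart.
    """
    start = load_checkpoint(db, pipeline)
    for end_lsn, rows in batches:
        if start is not None and lsn_to_int(end_lsn) <= lsn_to_int(start):
            continue  # already written before the restart
        write_batch(rows)
        save_checkpoint(db, pipeline, end_lsn)
```

Running `replicate` a second time over the same batches writes nothing new, because every batch ends at or before the stored LSN. The sketch filters already-seen batches for simplicity; a real CDC reader would more likely ask the source to start streaming from the checkpointed position.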
Replication types
| Type | Behaviour |
|---|---|
| `cdc_backfill` | (default) Snapshot existing rows, then stream live CDC changes |
| `cdc` | Stream live CDC changes only, skipping the initial snapshot |
| `snapshot` | One-off full table copy; pipeline stops when complete |
| `query` | Run a scheduled SQL query on the source |
Use `cdc` when you’ve already seeded the destination and only need ongoing changes. Use `snapshot` for one-time migrations.
You can run many pipelines against the same source database simultaneously. Each pipeline gets its own replication slot (`nanosync_slot_<pipeline-name>`) and tracks its own LSN position independently. There is no coordination overhead between pipelines on the same source.
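When looking for a pipeline's slot on a PostgreSQL source, note that PostgreSQL slot names may contain only lowercase letters, digits, and underscores, and are limited to 63 characters. The text above does not say how a pipeline name such as `orders-pipeline` is mapped onto those rules, so the sketch below shows one plausible normalisation purely as an assumption; `slot_name` is a hypothetical helper and Nanosync's actual rule may differ.

```python
import re

SLOT_PREFIX = "nanosync_slot_"  # prefix taken from the naming pattern above
MAX_SLOT_LEN = 63               # PostgreSQL identifier length limit


def slot_name(pipeline_name: str) -> str:
    """Hypothetical mapping from a pipeline name to a valid slot name.

    Assumption: characters PostgreSQL does not allow are replaced with
    underscores. This is not confirmed Nanosync behaviour.
    """
    cleaned = re.sub(r"[^a-z0-9_]", "_", pipeline_name.lower())
    name = SLOT_PREFIX + cleaned
    if len(name) > MAX_SLOT_LEN:
        raise ValueError(f"slot name exceeds {MAX_SLOT_LEN} characters: {name}")
    return name
```

Under this assumption, `slot_name("orders-pipeline")` yields `nanosync_slot_orders_pipeline`. Checking the source's `pg_replication_slots` view shows the names actually in use.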