Pipelines

A pipeline pairs one source connection with one or more destination connections and manages the full replication lifecycle: snapshot → CDC → checkpoint → resume. Each pipeline is independent with its own LSN watermark, replication slot, and state.


Lifecycle

A pipeline moves through these states:

CREATED → SNAPSHOTTING → LIVE CDC → (PAUSED) → LIVE CDC

                          ERRORED

Creating a pipeline

Define connections and pipelines in a YAML file and apply it to the running server:

connections:
  - name: prod-postgres
    type: postgres
    dsn: "postgres://nanosync:${env:PG_PASSWORD}@db.prod.internal:5432/mydb?sslmode=require"

  - name: prod-bigquery
    type: bigquery
    properties:
      project_id: my-project
      dataset_id: replication
      credentials_file: /etc/nanosync/bq-sa.json

pipelines:
  - name: orders-pipeline
    replication_type: cdc_backfill
    source:
      connection: prod-postgres
      tables:
        - public.orders
        - public.order_items
    sink:
      connection: prod-bigquery
      properties:
        table_id: orders
nanosync apply --file pipelines.yaml

The pipeline starts immediately. The snapshot begins, then transitions to streaming CDC automatically.

apply is idempotent — run it again after any change to update the pipeline definition without data loss.


Multiple tables in one pipeline

List tables under tables:. All tables in a pipeline share the same snapshot worker pool and CDC stream, so they are always consistent with each other — events from different tables arrive in the same transaction order as they were committed in the source.

source:
  connection: prod-postgres
  tables:
    - public.orders
    - public.order_items
    - public.products

There is no limit on the number of tables per pipeline. For unrelated table groups with different latency or consistency requirements, use separate pipelines.


Fan-out to multiple destinations

Use sinks: (plural) to write to multiple destinations simultaneously. Each sink writes the same event stream independently — if one sink is slow, it doesn’t block the others.

pipelines:
  - name: orders-pipeline
    source:
      connection: prod-postgres
      tables: [public.orders]
    sinks:
      - connection: prod-bigquery
      - connection: prod-kafka
        properties:
          topic: orders-events

Each sink can have its own target_mappings to project or filter columns differently. See Configuration reference for details.

When sinks is present, the top-level sink field is ignored.


Managing pipelines

# List all pipelines and their current state
nanosync list pipelines

# Open the live terminal dashboard
nanosync monitor

# Pause a running pipeline (checkpoints first, then stops cleanly)
nanosync pipeline pause orders-pipeline

# Resume from the last checkpoint
nanosync pipeline resume orders-pipeline

# Delete a pipeline (stops it and removes all state, including the replication slot)
nanosync pipeline delete orders-pipeline

nanosync monitor shows status, lag, and events-per-second for every pipeline. Press Enter on a pipeline to see table-level breakdown. Press q to quit.


Checkpointing and resume

Nanosync writes the last committed LSN to the state store after every successful batch write to the sink. On restart — whether planned or due to a crash — the pipeline reads the checkpoint and resumes from that exact position.

No re-snapshot. No manual recovery. No data loss.

INF pipeline resuming  name=orders-pipeline  lsn=0/3A1B2C4D
INF cdc streaming      lsn=0/3A1B2C4D

The checkpoint is durable — it’s written to either the embedded SQLite state store or an external Postgres state store, depending on your server config. If the checkpoint is lost (e.g. the state store was wiped), the pipeline starts a fresh snapshot from the beginning.


Replication types

TypeBehaviour
cdc_backfill(default) Snapshot existing rows, then stream live CDC changes
cdcStream live CDC changes only — skip the initial snapshot
snapshotOne-off full table copy; pipeline stops when complete
queryRun a scheduled SQL query on the source

Use cdc when you’ve already seeded the destination and only need ongoing changes. Use snapshot for one-time migrations.


You can run many pipelines against the same source database simultaneously. Each pipeline gets its own replication slot (nanosync_slot_<pipeline-name>) and tracks its own LSN position independently. There is no coordination overhead between pipelines on the same source.