Fan-out to Multiple Sinks

Fan-out lets a single pipeline deliver the same event stream to multiple destinations simultaneously. A common pattern: replicate to BigQuery for analytics and Kafka for event-driven services — one source read, two sink writes.


How to configure fan-out

Use sinks: (plural list) instead of sink: (singular object) in the pipeline definition. When sinks is present, sink is ignored.
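For comparison, a single-destination pipeline uses the singular form. This is a minimal sketch, assuming sink takes the same shape as one sinks entry and reusing the connection names from the full example below:

pipelines:
  - name: orders-single
    source:
      connection: prod-postgres
      tables: [public.orders]
    sink:
      connection: prod-bigquery

The fan-out form replaces the single sink object with a list: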

pipelines:
  - name: orders-fan-out
    source:
      connection: prod-postgres
      tables: [public.orders]
    sinks:
      - connection: prod-bigquery
      - connection: prod-kafka

Each entry in the list is a full sink definition: it can reference a named connection and carry its own properties and target_mappings (see Per-sink properties below).


Full example — Postgres to BigQuery and stdout

A practical starting point: replicate to BigQuery while also streaming to stdout so you can watch events in real time during development.

connections:
  - name: prod-postgres
    type: postgres
    dsn: "postgres://replicator:${env:PG_PASSWORD}@db.prod:5432/mydb?sslmode=require"

  - name: prod-bigquery
    type: bigquery
    properties:
      project_id:       my-project
      dataset_id:       replication
      credentials_file: /path/to/key.json

  - name: local-out
    type: stdout

pipelines:
  - name: orders-fan-out
    replication_type: cdc_backfill
    source:
      connection: prod-postgres
      tables: [public.orders]
    sinks:
      - connection: prod-bigquery
        properties:
          table_id: orders
      - connection: local-out

Apply with:

nanosync apply --file pipelines.yaml

How fan-out works internally

Nanosync writes to all sinks in parallel from a single decoded event stream: the source is read once, and each event is broadcast to every sink's write worker.

Progress tracking is per-sink. Each sink maintains its own LSN watermark. The checkpoint written to the state store after each batch represents the minimum LSN across all sinks — meaning the pipeline only advances its position when every sink has successfully committed.

If one sink fails, that sink's worker pauses and retries while the other sinks continue writing. Once the failing sink recovers, it catches up from its last committed LSN, and only then does the pipeline checkpoint advance again. No events are lost and none are duplicated.
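The checkpoint rule is easier to see with a small sketch. The following is not nanosync's implementation; it is a minimal Python illustration (all class and variable names are hypothetical) of the behaviour described above: one decoded stream broadcast to per-sink workers, a per-sink committed LSN, and a shared checkpoint that only advances to the minimum LSN every sink has committed.

import random
import time
from concurrent.futures import ThreadPoolExecutor

class SinkWorker:
    """One write worker per sink, tracking its own committed LSN (watermark)."""

    def __init__(self, name, fail_rate=0.0):
        self.name = name
        self.fail_rate = fail_rate
        self.committed_lsn = 0

    def write(self, batch):
        # On failure this worker pauses and retries the same batch;
        # the other workers are unaffected.
        while random.random() < self.fail_rate:
            time.sleep(0.01)
        self.committed_lsn = batch[-1]["lsn"]

def run_pipeline(batches, sinks):
    checkpoint = 0
    with ThreadPoolExecutor(max_workers=len(sinks)) as pool:
        for batch in batches:
            # The source is read once; each batch is broadcast to every
            # sink's worker and written in parallel.
            list(pool.map(lambda sink: sink.write(batch), sinks))
            # The stored checkpoint is the minimum committed LSN across
            # sinks, so the position only advances once every sink is done.
            checkpoint = min(sink.committed_lsn for sink in sinks)
            print(f"checkpoint advanced to LSN {checkpoint}")
    return checkpoint

# Three single-event batches and two hypothetical sinks, one of them flaky.
batches = [[{"lsn": lsn, "op": "insert"}] for lsn in (101, 102, 103)]
run_pipeline(batches, [SinkWorker("bigquery"), SinkWorker("kafka", fail_rate=0.3)])

Taking the minimum is what makes it safe for a recovering sink to resume from its own watermark: the pipeline position never moves past an LSN that any sink still needs.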


Per-sink properties

Each entry in sinks can carry its own properties and target_mappings block, completely independent of the others:

sinks:
  - connection: prod-bigquery
    properties:
      table_id: orders_raw
    target_mappings:
      - source: "public.orders"
        name: orders_raw
        include_by_default: true

  - connection: prod-kafka
    properties:
      topic: orders-events
    target_mappings:
      - source: "public.orders"
        include_by_default: false
        fields:
          - source: id
          - source: status
          - source: updated_at

  - connection: local-out

This lets you send full rows to BigQuery, a trimmed projection to Kafka, and the raw stream to stdout — all from the same source read.


Adding and removing sinks

Update sinks in your pipeline YAML and re-apply:

nanosync apply --file pipelines.yaml

Adding a sink: nanosync starts writing to it from the current pipeline position. It does not replay historical events to the new sink. If you need the new sink to receive historical data, use a separate cdc_backfill pipeline pointing at the same source (see the sketch at the end of this section).

Removing a sink: nanosync stops writing to it. The connection definition can stay in the connections block — it’s only used if referenced by a pipeline.
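For the historical-data case mentioned under Adding a sink, a separate backfill pipeline can point the same source at just the new sink. A sketch reusing the connections from this page (the pipeline name is illustrative):

pipelines:
  - name: orders-kafka-backfill
    replication_type: cdc_backfill
    source:
      connection: prod-postgres
      tables: [public.orders]
    sinks:
      - connection: prod-kafka
        properties:
          topic: orders-events

Once the backfill pipeline has caught up, it can be removed; the original fan-out pipeline keeps streaming new events to the sink from its own position.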