Configuration Reference

Nanosync uses two separate YAML files with distinct purposes:

File | Used with | Contains
Server config | nanosync start server --config server.yaml | Database, secrets backend, notifications, logging
Resource config | nanosync apply --file pipelines.yaml | Connections and pipeline definitions

This separation means your pipeline definitions are never entangled with server-level credentials. You can restart the server without touching pipeline state, and update pipelines without restarting the server.


Server config

The server config file is passed to nanosync start dev or nanosync start server. It controls only the daemon process — not pipelines.

# ── Secrets backend ───────────────────────────────────────────────────────────
# Enables ${secret:...} expansion across all string fields.
# Options: env (default), gcp, aws, vault
secrets_backend: gcp

# For HashiCorp Vault:
# secrets_backend: vault
# vault_addr:      https://vault.example.com
# vault_role_id:   ${env:VAULT_ROLE_ID}
# vault_secret_id: ${env:VAULT_SECRET_ID}

# ── Database (state store) ────────────────────────────────────────────────────
database:
  type: sqlite          # sqlite (default) or postgres
  data_dir: /var/lib/nanosync   # SQLite file location; ignored when dsn is set
  # dsn: ${secret:nanosync/store-dsn}   # explicit path (SQLite) or connection string (Postgres)

# ── Notifications ─────────────────────────────────────────────────────────────
# Global channels referenced by alerts[].channels in pipeline definitions.
notifications:
  slack:
    webhook_url: ${env:SLACK_WEBHOOK_URL}
  pagerduty:
    routing_key: ${env:PAGERDUTY_ROUTING_KEY}
  webhooks:
    - name: ops-webhook
      url: https://hooks.example.com/nanosync
      headers:
        Authorization: "Bearer ${env:WEBHOOK_TOKEN}"
      timeout: 5s

# ── Logging ───────────────────────────────────────────────────────────────────
logging:
  level: info                    # debug | info | warn | error
  dir: /var/log/nanosync         # omit to log to stdout only
  otel_endpoint: ""              # OTLP/HTTP endpoint for OTel log export
  cloud_logging_project: ""      # GCP project for Cloud Logging export

Server config fields

Field | Type | Description
secrets_backend | string | Secret provider: env (default), gcp, aws, vault
vault_addr | string | Vault server URL (when secrets_backend: vault)
vault_role_id | string | Vault AppRole role ID
vault_secret_id | string | Vault AppRole secret ID
database.type | string | State store driver: sqlite (default) or postgres
database.data_dir | string | Directory for the SQLite file. Ignored when dsn is set.
database.dsn | string | Explicit data source name. For SQLite: a file path. For Postgres: a connection string.
notifications.slack.webhook_url | string | Incoming webhook URL for Slack alerts
notifications.pagerduty.routing_key | string | PagerDuty Events API v2 integration key
notifications.webhooks | list | Named generic webhook destinations
logging.level | string | Minimum log level: debug, info, warn, error
logging.dir | string | Directory for rotating log files. Omit to write to stdout.
logging.otel_endpoint | string | OTLP/HTTP endpoint for the OTel logs bridge
logging.cloud_logging_project | string | GCP project for Cloud Logging export (ADC credentials required)

Resource config (nanosync apply)

The resource config is applied to a running server with nanosync apply --file pipelines.yaml. It declares connections and pipelines that are upserted into the state store.

# Top-level structure
connections:
  - ...   # named, reusable connection definitions

pipelines:
  - ...   # pipeline definitions that reference connections by name

apply is idempotent — it creates or updates all listed resources, and removes any server-side resources that are no longer in the file.

nanosync apply --file pipelines.yaml            # apply changes
nanosync apply --file pipelines.yaml --dry-run  # validate without changing anything
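apply makes the server match the file, so a --dry-run is conceptually a diff between the two. The sketch below shows that reconciliation for resources keyed by name. The plan_apply function and its create/update/delete labels are hypothetical. They are neither Nanosync's internal API nor the format of its --dry-run output.

```python
def plan_apply(desired, current):
    """Return (action, name) steps that make `current` match `desired`.

    Both arguments map a resource name (a pipeline or connection name) to its
    definition; definitions are compared by value.
    """
    plan = []
    for name in sorted(desired):
        if name not in current:
            plan.append(("create", name))
        elif current[name] != desired[name]:
            plan.append(("update", name))
        # An identical definition produces no step, so re-applying is a no-op.
    for name in sorted(current.keys() - desired.keys()):
        plan.append(("delete", name))  # on the server, but no longer in the file
    return plan


desired = {
    "orders-to-bigquery": {"replication_type": "cdc_backfill"},
    "daily-summary": {"replication_type": "query"},
}
current = {
    "orders-to-bigquery": {"replication_type": "cdc"},
    "legacy-export": {"replication_type": "snapshot"},
}
print(plan_apply(desired, current))
# [('create', 'daily-summary'), ('update', 'orders-to-bigquery'), ('delete', 'legacy-export')]
```

Re-applying an unchanged file yields an empty plan. This is why apply is safe to run repeatedly.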

Connections

Named connections let you define credentials once and reference them across many pipelines. Connections are the authority for connector type and credentials — always define them in connections: rather than inline in each pipeline.

connections:
  - name: prod-postgres          # unique identifier referenced by pipelines
    type: postgres
    dsn: "postgres://user:${env:PG_PASSWORD}@db.prod:5432/mydb?sslmode=require"

  - name: prod-bigquery
    type: bigquery
    properties:
      project_id: my-gcp-project
      dataset_id: replication
      credentials_file: /etc/nanosync/bq-sa.json

Field | Type | Required | Description
name | string | yes | Unique identifier referenced in pipeline source.connection and sink.connection fields
type | string | yes | Connector type. Sources: postgres, sqlserver, kafka, local, s3, gcs, stdin. Sinks: bigquery, spanner, alloydb, cloudsql, kafka, pubsub, local, s3, gcs, iceberg, stdout.
dsn | string | no | Connection string (used by database connectors)
properties | map | no | Key-value connector properties (connector-specific)

Secret and environment variable expansion

Any string value in the YAML supports variable expansion:

dsn: "postgres://user:${env:PG_PASSWORD}@host:5432/db"
properties:
  api_key: "${secret:my-project/bq-api-key}"  # fetched from the secrets backend

Token | Resolution
${env:VAR_NAME} | Process environment variable
${secret:path} | Secret from the configured secrets_backend (GCP, AWS, Vault)
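The following minimal Python sketch shows how the two token forms could be resolved in a string value. The secret_lookup callable is a hypothetical stand-in for whichever secrets_backend is configured. Raising on an unset environment variable is also an assumption of this sketch, because the document does not say how Nanosync handles missing values.

```python
import os
import re

# ${env:NAME} or ${secret:path}; a secret path may contain slashes.
TOKEN = re.compile(r"\$\{(env|secret):([^}]+)\}")


def expand(value, secret_lookup):
    """Replace every ${env:...} and ${secret:...} token in a string value.

    `secret_lookup` is a hypothetical stand-in for the configured secrets
    backend. An unset environment variable raises KeyError (an assumption).
    """
    def resolve(match):
        kind, ref = match.group(1), match.group(2)
        if kind == "env":
            if ref not in os.environ:
                raise KeyError(f"environment variable {ref} is not set")
            return os.environ[ref]
        return secret_lookup(ref)

    return TOKEN.sub(resolve, value)


os.environ["PG_PASSWORD"] = "s3cret"                  # demo value only
fake_backend = {"my-project/bq-api-key": "abc123"}   # hypothetical secret store
print(expand("postgres://user:${env:PG_PASSWORD}@host:5432/db", fake_backend.__getitem__))
# postgres://user:s3cret@host:5432/db
```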

Pipelines

pipelines:
  - name: orders-to-bigquery      # unique pipeline name, required
    replication_type: cdc_backfill  # cdc_backfill | cdc | snapshot | query

    source:
      connection: prod-postgres   # references a named connection
      tables:
        - public.orders
        - public.order_items
      properties:
        replication_slot: nanosync_orders
        chunk_size:       "10000"

    sink:
      connection: prod-bigquery
      properties:
        table_id: orders

    rate_limit:
      max_events_per_second: 10000
      max_bytes_per_second:  104857600   # 100 MiB/s

    alerts:
      - event: pipeline_error
        channels: [slack]
      - event: lag_high
        threshold_seconds: 60
        for: 2m
        channels: [slack, pagerduty]

Pipeline fields

Field | Type | Required | Description
name | string | yes | Unique pipeline identifier
source | object | yes | Source connector config
sink | object | yes* | Single-sink configuration. Use sinks for fan-out.
sinks | list | yes* | Multi-sink fan-out list. When set, sink is ignored.
replication_type | string | no | cdc_backfill (default), cdc, snapshot, or query
rate_limit | object | no | Per-pipeline throughput limits
alerts | list | no | Alert rules referencing channels from the server notifications config
transforms | list | no | Ordered list of WASM transform plugins applied to each batch
batch | object | no | Batcher and channel-size tuning
snapshot | object | no | Parallel snapshot coordinator tuning
query | object | no | Required when replication_type: query; sets the cron schedule

*One of sink or sinks is required.

Replication types

Type | Behavior
cdc_backfill | (default) Snapshot existing rows, then stream live CDC changes
cdc | Stream live CDC changes only; no initial snapshot
snapshot | One-off full table copy; pipeline stops when complete
query | Run a scheduled SQL query on the source (requires query.schedule)

Source fields

Field | Type | Description
connection | string | Name of a named connection (recommended)
type | string | Connector type (required if no connection)
dsn | string | Connection string (overrides named connection on conflict)
tables | list | Tables to replicate, in schema.table format
files | list | File paths or glob patterns for file-based sources (local, s3, gcs). Mutually exclusive with tables.
properties | map | Connector-specific options (see connector docs)
include_schemas | list | Replicate only tables in these schemas
exclude_schemas | list | Skip all tables in these schemas
exclude_tables | list | Skip specific tables (cannot overlap with tables)
exclude_columns | map | Per-table column exclusion: {"public.orders": ["pii_field"]}

Sink fields

Field | Type | Description
name | string | Optional human-readable label for logs and metrics
connection | string | Name of a named connection (recommended)
type | string | Connector type (required if no connection)
dsn | string | Connection string (overrides named connection on conflict)
properties | map | Connector-specific options (see connector docs)
target_mappings | list | Column projection and routing rules (see below). When empty, all events pass through unchanged.

Target mappings

target_mappings controls which source tables a sink receives and how columns are projected. Each entry’s source glob is matched against the event’s table name in declaration order.

sink:
  connection: prod-bigquery
  target_mappings:
    - source: "public.orders"       # glob matched against source table name
      name: bq_orders               # destination table name ({table} and {schema} tokens are supported)
      include_by_default: true       # pass all columns through unless excluded
      exclude_fields:
        - internal_flag
        - debug_col
      fields:
        - source: amount_cents       # source column name
          name: amount               # rename at destination (optional)
          expression: "float64(value)/100.0"  # transform expression (optional)
          type: float64              # Arrow output type hint (optional)

    - source: "public.*"
      name: "{table}_cdc"
      include_by_default: false      # allowlist mode — only emit columns in fields[]
      fields:
        - source: id
        - source: updated_at

Field | Type | Description
source | string | Glob pattern matched against schema.table (path.Match syntax). Required.
name | string | Destination table, topic, or path prefix. Supports {table} and {schema} tokens.
include_by_default | bool | When true, all source columns pass through except those in exclude_fields. When false, only columns listed in fields are emitted.
exclude_fields | list | Columns to drop. Only valid when include_by_default: true.
fields | list | Per-column projection, rename, and transform rules
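The routing rule above tries each mapping's glob in declaration order, then substitutes {schema} and {table} into the destination name. The sketch below assumes two things: the first match wins, and {table} expands to the bare table name without its schema. Python's fnmatch.fnmatchcase stands in for Go's path.Match. The two agree on * and ? for dotted schema.table names, but they differ in some details, such as negated character classes ([!...] in fnmatch, [^...] in path.Match).

```python
from fnmatch import fnmatchcase
from typing import Optional


def match_mapping(table: str, mappings: list) -> Optional[dict]:
    """First mapping whose `source` glob matches the schema.table name, or None."""
    for mapping in mappings:  # declaration order; first match wins (assumed)
        if fnmatchcase(table, mapping["source"]):
            return mapping
    return None  # no match: this sink does not receive the table


def destination_name(table: str, mapping: dict) -> str:
    """Expand {schema} and {table} tokens in the mapping's `name`."""
    schema, _, bare_table = table.partition(".")
    return mapping["name"].replace("{schema}", schema).replace("{table}", bare_table)


mappings = [
    {"source": "public.orders", "name": "bq_orders"},
    {"source": "public.*", "name": "{table}_cdc"},
]
for t in ["public.orders", "public.customers", "audit.log"]:
    m = match_mapping(t, mappings)
    print(t, "->", destination_name(t, m) if m else "(not routed)")
# public.orders -> bq_orders
# public.customers -> customers_cdc
# audit.log -> (not routed)
```

Because public.orders is declared before public.*, it gets its own destination. Reversing the two entries would send it to orders_cdc instead.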

Field mapping

Field | Type | Description
source | string | Source column name. Required.
name | string | Destination column name. Defaults to source when omitted.
expression | string | Transform expression using the built-in vocabulary (e.g. float64(value)/100.0).
type | string | Arrow output type declaration when it cannot be inferred from the expression.
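The sketch below shows how include_by_default, exclude_fields and fields could interact on a single row. It does not reproduce Nanosync's expression language. The hypothetical expression key holds a Python callable instead of a string such as float64(value)/100.0, and the type hint is ignored.

```python
def project(row, mapping):
    """Apply one target mapping's column rules to a single row (a dict)."""
    rules = {f["source"]: f for f in mapping.get("fields", [])}
    out = {}
    for col, value in row.items():
        rule = rules.get(col)
        if rule is None:
            # Not listed in fields: kept only in include_by_default mode,
            # and even then dropped if it appears in exclude_fields.
            if mapping["include_by_default"] and col not in mapping.get("exclude_fields", []):
                out[col] = value
            continue
        transform = rule.get("expression", lambda v: v)  # hypothetical: a callable
        out[rule.get("name", col)] = transform(value)   # rename defaults to source
    return out


orders_mapping = {
    "include_by_default": True,
    "exclude_fields": ["internal_flag", "debug_col"],
    "fields": [{"source": "amount_cents", "name": "amount",
                "expression": lambda v: float(v) / 100.0}],
}
row = {"id": 7, "amount_cents": 1999, "internal_flag": True, "status": "paid"}
print(project(row, orders_mapping))
# {'id': 7, 'amount': 19.99, 'status': 'paid'}
```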

Multi-sink fan-out

Send one pipeline’s events to multiple sinks simultaneously. Each sink can have independent target_mappings to filter or project differently.

pipelines:
  - name: orders-fan-out
    source:
      connection: prod-postgres
      tables: [public.orders]

    sinks:
      - connection: prod-bigquery
        target_mappings:
          - source: "public.orders"
            name: orders_raw
            include_by_default: true

      - connection: prod-kafka
        properties:
          topic: orders-events
        target_mappings:
          - source: "public.orders"
            include_by_default: false
            fields:
              - source: id
              - source: status
              - source: updated_at

When sinks is non-empty, the top-level sink field is ignored.
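Each sink filters independently, so a single event can reach several sinks or none. The sketch below shows that per-sink decision. It assumes a sink receives a table when any of its target_mappings globs match, and receives everything when it has no mappings, as the sink field table states. The audit-archive sink is hypothetical.

```python
from fnmatch import fnmatchcase


def receiving_sinks(table, sinks):
    """Connections of the sinks that receive events from `table` (schema.table)."""
    receivers = []
    for sink in sinks:
        mappings = sink.get("target_mappings", [])
        # No mappings: everything passes through. Otherwise any matching glob routes it.
        if not mappings or any(fnmatchcase(table, m["source"]) for m in mappings):
            receivers.append(sink["connection"])
    return receivers


sinks = [
    {"connection": "prod-bigquery", "target_mappings": [{"source": "public.orders"}]},
    {"connection": "prod-kafka", "target_mappings": [{"source": "public.orders"}]},
    {"connection": "audit-archive"},  # hypothetical sink with no target_mappings
]
print(receiving_sinks("public.orders", sinks))
# ['prod-bigquery', 'prod-kafka', 'audit-archive']
print(receiving_sinks("public.refunds", sinks))
# ['audit-archive']
```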


Rate limiting

rate_limit:
  max_events_per_second: 10000    # 0 = unlimited
  max_bytes_per_second: 104857600 # 0 = unlimited (100 MiB/s shown)

Rate limits apply per pipeline and are enforced via backpressure on the source read side.
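The text above says only that limits are enforced through backpressure on the read side. A token bucket is one common way to produce that behavior, and the sketch below assumes one. The document does not describe Nanosync's actual limiter, its burst size, or whether it charges per event or per batch. Time is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Refills `rate` tokens per second, holding at most `capacity` (the burst)."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # e.g. max_events_per_second; 0 means unlimited
        self.capacity = capacity  # burst size: an assumption, not a Nanosync setting
        self.tokens = capacity
        self.last = now

    def reserve(self, cost, now):
        """Charge `cost` tokens and return how long the reader should pause."""
        if self.rate == 0:
            return 0.0  # 0 = unlimited, as in rate_limit
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        self.tokens -= cost  # may go negative; the deficit is repaid by waiting
        return max(0.0, -self.tokens / self.rate)


events = TokenBucket(rate=10_000, capacity=10_000)
print(events.reserve(10_000, now=0.0))  # 0.0: the initial burst is available
print(events.reserve(5_000, now=0.0))   # 0.5: pause before reading further
```

In a real reader, now would come from time.monotonic() and the reader would sleep for the returned duration. When both limits are set, the pause is the larger of the two reservations.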


Alerts

Alert rules reference notification channels defined in the server config’s notifications block.

alerts:
  - event: pipeline_error
    channels: [slack, pagerduty]

  - event: lag_high
    threshold_seconds: 60    # alert when lag exceeds 60 seconds
    for: 2m                  # sustained for at least 2 minutes before firing
    channels: [slack]

  - event: schema_drift
    channels: [ops-webhook]  # named webhook from notifications.webhooks

Field | Type | Description
event | string | Trigger: pipeline_error, lag_high, or schema_drift
channels | list | Notification channel names: slack, pagerduty, or a named webhook
threshold_seconds | number | Lag threshold in seconds. Only used with lag_high.
for | string | Duration the threshold must be sustained before firing (e.g. "5m"). Only used with lag_high. Default: "0s" (immediate).
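threshold_seconds and for work together, and the sketch below models them as a small state machine fed with periodic lag samples. It makes three assumptions: "exceeds" means strictly greater than the threshold, the rule fires once per sustained breach, and it re-arms when lag drops back to or below the threshold. The document does not specify de-duplication or recovery notifications.

```python
class SustainedAlert:
    """Fires once when lag stays above a threshold for at least `for_seconds`."""

    def __init__(self, threshold_seconds, for_seconds=0.0):
        self.threshold = threshold_seconds
        self.for_seconds = for_seconds   # the rule's `for` (default "0s")
        self.breach_started = None       # when the current breach began, if any
        self.fired = False

    def observe(self, now, lag_seconds):
        """Record one lag sample; return True only on the sample that fires."""
        if lag_seconds <= self.threshold:
            self.breach_started = None   # breach over: reset and re-arm
            self.fired = False
            return False
        if self.breach_started is None:
            self.breach_started = now
        if not self.fired and now - self.breach_started >= self.for_seconds:
            self.fired = True
            return True
        return False


alert = SustainedAlert(threshold_seconds=60, for_seconds=120)  # for: 2m
samples = [(0, 30), (10, 75), (70, 90), (130, 80), (140, 95), (150, 20)]
print([alert.observe(t, lag) for t, lag in samples])
# [False, False, False, True, False, False]
```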

Batch tuning

Override the default micro-batcher behavior. Omit this block to use built-in defaults.

batch:
  sink_batch_size: 1000       # events per flush (default: 1000)
  event_ch_cap:    256        # per-partition event channel capacity (default: 256)
  sink_ch_cap:     8          # per-partition sink-batch channel capacity (default: 8)
  cdc_max_age:     100ms      # max time a partial CDC batch is held before flushing
  max_batch_bytes: 0          # byte ceiling per batch; 0 = use sink's built-in hint
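The sketch below shows the flush decision these settings describe. A batch is emitted when it reaches sink_batch_size events, when it reaches max_batch_bytes, or when its oldest event has waited cdc_max_age. Two assumptions apply: age is measured from the first buffered event, and max_batch_bytes: 0 is treated as "no byte ceiling" because the sketch does not model the sink's built-in hint. Time is passed in explicitly, in seconds.

```python
class MicroBatcher:
    """Flushes on event count, byte ceiling, or age of the oldest buffered event."""

    def __init__(self, sink_batch_size=1000, max_batch_bytes=0, cdc_max_age=0.1):
        self.size = sink_batch_size
        self.max_bytes = max_batch_bytes  # 0 = no byte ceiling in this sketch
        self.max_age = cdc_max_age        # seconds (100ms in the example above)
        self.buf, self.buf_bytes, self.first_at = [], 0, None

    def add(self, event, nbytes, now):
        """Buffer one event; return a batch to flush, or None."""
        if self.first_at is None:
            self.first_at = now
        self.buf.append(event)
        self.buf_bytes += nbytes
        if len(self.buf) >= self.size or (self.max_bytes and self.buf_bytes >= self.max_bytes):
            return self._flush()
        return self.tick(now)

    def tick(self, now):
        """Called periodically; flushes a partial batch that has waited too long."""
        if self.buf and now - self.first_at >= self.max_age:
            return self._flush()
        return None

    def _flush(self):
        batch = self.buf
        self.buf, self.buf_bytes, self.first_at = [], 0, None
        return batch


b = MicroBatcher(sink_batch_size=3, cdc_max_age=0.1)
print(b.add("a", 10, now=0.00))  # None: buffered
print(b.add("b", 10, now=0.01))  # None
print(b.add("c", 10, now=0.02))  # ['a', 'b', 'c']: size limit reached
print(b.add("d", 10, now=0.05))  # None: starts a new partial batch
print(b.tick(now=0.20))          # ['d']: held longer than cdc_max_age
```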

Snapshot tuning

Override the parallel snapshot coordinator defaults. Only applies to cdc_backfill and snapshot replication types.

snapshot:
  concurrency:              4        # parallel partition workers (default: max(4, GOMAXPROCS))
  target_rows_per_partition: 100000  # desired rows per partition
  target_bytes_per_partition: 2147483648  # desired bytes per partition (2 GiB)
  max_partitions_per_table:  512     # cap on partitions per table
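The document does not give the formula the coordinator uses to turn these targets into a partition count. The sketch below illustrates one plausible reading, which is an assumption: split by whichever target yields more partitions, then clamp the result to the range [1, max_partitions_per_table].

```python
import math


def plan_partitions(row_count, table_bytes, target_rows=100_000,
                    target_bytes=2 * 1024**3, max_partitions=512):
    """Snapshot partition count for one table (hypothetical formula).

    Defaults mirror target_rows_per_partition, target_bytes_per_partition
    (2 GiB) and max_partitions_per_table from the example above.
    """
    by_rows = math.ceil(row_count / target_rows)
    by_bytes = math.ceil(table_bytes / target_bytes)
    # Use the stricter target (more partitions), then clamp to [1, max].
    return max(1, min(max_partitions, max(by_rows, by_bytes)))


print(plan_partitions(50_000_000, 10 * 1024**3))  # 500
```

For a 50-million-row, 10 GiB table this gives max(500, 5) = 500 partitions. With concurrency: 4, four partition workers process them in parallel.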

Query pipelines

Scheduled SQL query pipelines require replication_type: query and a query.schedule:

pipelines:
  - name: daily-summary
    replication_type: query
    source:
      connection: prod-postgres
      properties:
        query: "SELECT date_trunc('day', created_at) AS day, SUM(amount) FROM orders GROUP BY 1"
    sink:
      connection: prod-bigquery
      properties:
        table_id: daily_summary
    query:
      schedule: "@daily"   # cron expression or @alias (@hourly, @daily, @every 5m)
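The sketch below computes the next run time for the alias forms. It assumes the aliases follow the usual cron meaning: @hourly runs at minute 0, @daily runs at midnight, and @every is a fixed interval counted from the last run. It also assumes times are evaluated in UTC, which the document does not state. Full five-field cron expressions need a real cron parser and are rejected here.

```python
import re
from datetime import datetime, timedelta, timezone

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def next_run(schedule, now):
    """Next activation after `now` for @hourly, @daily, or @every <duration>."""
    if schedule == "@hourly":
        return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    if schedule == "@daily":
        return now.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)
    if schedule.startswith("@every "):
        spec = schedule[len("@every "):]
        if not re.fullmatch(r"(?:\d+[hms])+", spec):
            raise ValueError(f"unsupported duration: {spec!r}")
        seconds = sum(int(n) * UNIT_SECONDS[u] for n, u in re.findall(r"(\d+)([hms])", spec))
        return now + timedelta(seconds=seconds)
    # Five-field cron expressions are out of scope for this sketch.
    raise ValueError(f"unsupported schedule: {schedule!r}")


now = datetime(2024, 5, 1, 13, 45, 30, tzinfo=timezone.utc)
print(next_run("@hourly", now))    # 2024-05-01 14:00:00+00:00
print(next_run("@daily", now))     # 2024-05-02 00:00:00+00:00
print(next_run("@every 5m", now))  # 2024-05-01 13:50:30+00:00
```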

WASM transforms

Apply compiled WASM plugins to every batch in the pipeline before it reaches the sink. Plugins are applied in order.

transforms:
  - type: wasm
    path: /etc/nanosync/plugins/redact-pii.wasm
    config:
      fields: "email,phone"
      strategy: hash
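Because plugins run in order, each one receives the previous plugin's output. The sketch below models plugins as plain Python functions over a batch of rows and does not load WASM. Both redact_pii (mimicking strategy: hash) and drop_test_rows are hypothetical stand-ins.

```python
import hashlib


def redact_pii(batch, fields=("email", "phone")):
    """Hypothetical stand-in for redact-pii.wasm with strategy: hash."""
    return [
        {col: (hashlib.sha256(str(val).encode()).hexdigest() if col in fields else val)
         for col, val in row.items()}
        for row in batch
    ]


def drop_test_rows(batch):
    """Hypothetical second plugin: removes rows flagged as test data."""
    return [row for row in batch if not row.get("is_test")]


def apply_transforms(batch, transforms):
    """Feed each transform the previous one's output, in list order."""
    for transform in transforms:
        batch = transform(batch)
    return batch


batch = [
    {"id": 1, "email": "a@example.com", "is_test": False},
    {"id": 2, "email": "b@example.com", "is_test": True},
]
for row in apply_transforms(batch, [drop_test_rows, redact_pii]):
    print(row["id"], row["email"][:12])  # only row 1 survives, email hashed
```

The unsalted SHA-256 used here is only illustrative. Hashes of low-entropy values such as phone numbers can be reversed by brute force, so real redaction usually relies on a keyed hash (HMAC).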

File format sinks

When using local, s3, gcs, or iceberg sink types, configure the output format in properties:

sink:
  type: local
  properties:
    base_path:   /data/replication
    file_format: parquet          # parquet | csv | jsonl | avro

Format | Extension | Notes
parquet | .parquet | Default. Columnar, best compression, schema-aware.
csv | .csv | Plain text, no schema embedded.
jsonl | .jsonl | One JSON object per line.
avro | .avro | Schema embedded in each file.

SQL Server tlog mode

Set cdc_mode: tlog to read directly from the SQL Server transaction log via sys.fn_dblog without requiring CDC setup on the source.

source:
  connection: my-sqlserver
  tables: [dbo.orders]
  properties:
    cdc_mode:        tlog           # "cdc" (default) | "tlog"
    log_batch_size:  "10000"
    poll_interval:   "200ms"
    max_xact_memory: "268435456"    # 256 MiB cap per transaction

Requirements: the database must use the FULL or BULK_LOGGED recovery model. Only the VIEW DATABASE STATE permission is needed, and no CDC setup is required.


Applying changes

nanosync apply --file pipelines.yaml            # upsert all connections and pipelines in the file
nanosync apply --file pipelines.yaml --dry-run  # validate locally without making changes

To reload the server config (database, notifications, logging) after a change, send SIGHUP — no restart required:

pkill -HUP -f 'nanosync start'

SIGHUP reloads and validates the server config only. Pipeline definitions are managed exclusively via nanosync apply and the API — they are unaffected by SIGHUP.


Full annotated example

server.yaml — server config

secrets_backend: gcp

database:
  type: postgres
  dsn: "${secret:nanosync/store-dsn}"

notifications:
  slack:
    webhook_url: ${env:SLACK_WEBHOOK_URL}
  pagerduty:
    routing_key: ${env:PAGERDUTY_ROUTING_KEY}

logging:
  level: info
  dir: /var/log/nanosync

pipelines.yaml — resource config

connections:
  - name: prod-postgres
    type: postgres
    dsn: "postgres://replicator:${env:PG_PASSWORD}@db.prod:5432/orders?sslmode=require"

  - name: warehouse
    type: bigquery
    properties:
      project_id: acme-data
      dataset_id: replication
      credentials_file: /etc/nanosync/bq-sa.json

pipelines:
  - name: orders-to-warehouse
    replication_type: cdc_backfill
    source:
      connection: prod-postgres
      tables:
        - public.orders
        - public.order_items
        - public.products
      properties:
        replication_slot: nanosync_orders
        chunk_size:       "5000"
        snapshot_workers: "8"
      exclude_columns:
        public.orders: [internal_notes, debug_flag]

    sink:
      connection: warehouse
      properties:
        table_id: orders_cdc
      target_mappings:
        - source: "public.orders"
          name: orders_cdc
          include_by_default: true
          exclude_fields: [internal_notes, debug_flag]

    rate_limit:
      max_events_per_second: 50000
      max_bytes_per_second: 524288000   # 500 MiB/s

    alerts:
      - event: pipeline_error
        channels: [slack, pagerduty]
      - event: lag_high
        threshold_seconds: 30
        for: 1m
        channels: [slack]

Starting the server and applying pipelines

# Start the server (loads server.yaml for database, notifications, logging)
nanosync start server --config server.yaml

# In another terminal: apply the pipeline definitions
nanosync apply --file pipelines.yaml