Configuration Reference

Nanosync uses two separate YAML files with distinct purposes:

File	Used with	Contains
Server config	`nanosync start server --config server.yaml`	Database, secrets backend, notifications, logging
Resource config	`nanosync apply --file pipelines.yaml`	Connections and pipeline definitions

This separation means your pipeline definitions are never entangled with server-level credentials. You can restart the server without touching pipeline state, and update pipelines without restarting the server.

Server config

The server config file is passed to `nanosync start dev` or `nanosync start server`. It controls only the daemon process, not pipelines.

# ── Secrets backend ───────────────────────────────────────────────────────────
# Enables ${secret:...} expansion across all string fields.
# Options: env (default), gcp, aws, vault
secrets_backend: gcp

# For HashiCorp Vault:
# secrets_backend: vault
# vault_addr:      https://vault.example.com
# vault_role_id:   ${env:VAULT_ROLE_ID}
# vault_secret_id: ${env:VAULT_SECRET_ID}

# ── Database (state store) ────────────────────────────────────────────────────
database:
  type: sqlite          # sqlite (default) or postgres
  data_dir: /var/lib/nanosync   # SQLite file location; ignored when dsn is set
  # dsn: ${secret:nanosync/store-dsn}   # explicit path (SQLite) or connection string (Postgres)

# ── Notifications ─────────────────────────────────────────────────────────────
# Global channels referenced by alerts[].channels in pipeline definitions.
notifications:
  slack:
    webhook_url: ${env:SLACK_WEBHOOK_URL}
  pagerduty:
    routing_key: ${env:PAGERDUTY_ROUTING_KEY}
  webhooks:
    - name: ops-webhook
      url: https://hooks.example.com/nanosync
      headers:
        Authorization: "Bearer ${env:WEBHOOK_TOKEN}"
      timeout: 5s

# ── Logging ───────────────────────────────────────────────────────────────────
logging:
  level: info                    # debug | info | warn | error
  dir: /var/log/nanosync         # omit to log to stdout only
  otel_endpoint: ""              # OTLP/HTTP endpoint for the OTel logs bridge
  cloud_logging_project: ""      # GCP project for Cloud Logging export

Server config fields

Field	Type	Description
`secrets_backend`	string	Secret provider: `env` (default), `gcp`, `aws`, `vault`
`vault_addr`	string	Vault server URL (when `secrets_backend: vault`)
`vault_role_id`	string	Vault AppRole role ID
`vault_secret_id`	string	Vault AppRole secret ID
`database.type`	string	State store driver: `sqlite` (default) or `postgres`
`database.data_dir`	string	Directory for the SQLite file. Ignored when `dsn` is set.
`database.dsn`	string	Explicit data source name. For SQLite: a file path. For Postgres: a connection string.
`notifications.slack.webhook_url`	string	Incoming webhook URL for Slack alerts
`notifications.pagerduty.routing_key`	string	PagerDuty Events API v2 integration key
`notifications.webhooks`	list	Named generic webhook destinations
`logging.level`	string	Minimum log level: `debug`, `info`, `warn`, `error`
`logging.dir`	string	Directory for rotating log files. Omit to write to stdout.
`logging.otel_endpoint`	string	OTLP/HTTP endpoint for the OTel logs bridge
`logging.cloud_logging_project`	string	GCP project for Cloud Logging export (ADC credentials required)
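
For local development, most of these fields can be left at their defaults. The following is a minimal sketch, assuming the documented defaults apply whenever a key is omitted (`secrets_backend: env`, `database.type: sqlite`, logs to stdout). The `data_dir` path is a placeholder, not a recommended location.

```yaml
# Minimal server config for local development (illustrative values only).
# secrets_backend is omitted, so the documented default (env) should apply.
database:
  type: sqlite
  data_dir: ./nanosync-data   # placeholder directory for the SQLite file

logging:
  level: debug                # logging.dir is omitted: log to stdout only
```

No `notifications` block is included here, so pipelines applied against such a server would have no channels for `alerts` to reference.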

Resource config (`nanosync apply`)

The resource config is applied to a running server with `nanosync apply --file pipelines.yaml`. It declares connections and pipelines that are upserted into the state store.

# Top-level structure
connections:
  - ...   # named, reusable connection definitions

pipelines:
  - ...   # pipeline definitions that reference connections by name

`apply` is idempotent: it creates or updates every listed resource and removes any server-side resource that is no longer in the file.

nanosync apply --file pipelines.yaml            # apply changes
nanosync apply --file pipelines.yaml --dry-run  # validate without changing anything

Connections

Named connections let you define credentials once and reference them across many pipelines. Connections are the authority for connector type and credentials: prefer defining them under `connections:` rather than inline in each pipeline.

connections:
  - name: prod-postgres          # unique identifier referenced by pipelines
    type: postgres
    dsn: "postgres://user:${env:PG_PASSWORD}@db.prod:5432/mydb?sslmode=require"

  - name: prod-bigquery
    type: bigquery
    properties:
      project_id: my-gcp-project
      dataset_id: replication
      credentials_file: /etc/nanosync/bq-sa.json

Field	Type	Required	Description
`name`	string	yes	Unique identifier referenced in pipeline `source.connection` and `sink.connection` fields
`type`	string	yes	Connector type. Sources: `postgres`, `sqlserver`, `elasticsearch`, `kafka`, `local`, `s3`, `gcs`, `stdin`. Sinks: `bigquery`, `spanner`, `alloydb`, `cloudsql`, `kafka`, `pubsub`, `local`, `s3`, `gcs`, `iceberg`, `stdout`.
`dsn`	string	no	Connection string (used by database connectors)
`properties`	map	no	Key-value connector properties (connector-specific)

Secret and environment variable expansion

Any string value in the YAML supports variable expansion:

dsn: "postgres://user:${env:PG_PASSWORD}@host:5432/db"
properties:
  api_key: "${secret:my-project/bq-api-key}"  # fetched from the secrets backend

Token	Resolution
`${env:VAR_NAME}`	Process environment variable
`${secret:path}`	Secret from the configured `secrets_backend` (GCP, AWS, Vault)

Pipelines

pipelines:
  - name: orders-to-bigquery      # unique pipeline name, required
    replication_type: cdc_backfill  # cdc_backfill | cdc | snapshot | query

    source:
      connection: prod-postgres   # references a named connection
      tables:
        - public.orders
        - public.order_items
      properties:
        replication_slot: nanosync_orders
        chunk_size:       "10000"

    sink:
      connection: prod-bigquery
      properties:
        table_id: orders

    rate_limit:
      max_events_per_second: 10000
      max_bytes_per_second:  104857600   # 100 MiB/s

    alerts:
      - event: pipeline_error
        channels: [slack]
      - event: lag_high
        threshold_seconds: 60
        for: 2m
        channels: [slack, pagerduty]

Pipeline fields

Field	Type	Required	Description
`name`	string	yes	Unique pipeline identifier
`source`	object	yes	Source connector config
`sink`	object	yes*	Single-sink configuration. Use `sinks` for fan-out.
`sinks`	list	yes*	Multi-sink fan-out list. When set, `sink` is ignored.
`replication_type`	string	no	`cdc_backfill` (default), `cdc`, `snapshot`, or `query`
`rate_limit`	object	no	Per-pipeline throughput limits
`alerts`	list	no	Alert rules referencing channels from the server notifications config
`transforms`	list	no	Ordered list of WASM transform plugins applied to each batch
`batch`	object	no	Batcher and channel-size tuning
`snapshot`	object	no	Parallel snapshot coordinator tuning
`query`	object	no	Required when `replication_type: query`; sets the cron schedule

*One of `sink` or `sinks` is required.

Replication types

Type	Behaviour
`cdc_backfill`	(default) Snapshot existing rows, then stream live CDC changes
`cdc`	Stream live CDC changes only — no initial snapshot
`snapshot`	One-off full table copy; pipeline stops when complete
`query`	Run a scheduled SQL query on the source (requires `query.schedule`)
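
As an illustration of a non-default type, the sketch below defines a one-off copy with `replication_type: snapshot`. The pipeline name and table choice are hypothetical; the connection names are the ones from the Connections section.

```yaml
pipelines:
  - name: products-one-off-copy     # hypothetical pipeline name
    replication_type: snapshot      # full table copy; the pipeline stops when complete
    source:
      connection: prod-postgres
      tables:
        - public.products
    sink:
      connection: prod-bigquery
      properties:
        table_id: products
```

Changing `replication_type` to `cdc` in the same definition would stream live changes only, skipping the initial copy.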

Source fields

Field	Type	Description
`connection`	string	Name of a named connection (recommended)
`type`	string	Connector type (required if no `connection`)
`dsn`	string	Connection string (overrides named connection on conflict)
`tables`	list	Tables to replicate, in `schema.table` format
`files`	list	File paths or glob patterns for file-based sources (`local`, `s3`, `gcs`). Mutually exclusive with `tables`.
`properties`	map	Connector-specific options (see connector docs)
`include_schemas`	list	Replicate only tables in these schemas
`exclude_schemas`	list	Skip all tables in these schemas
`exclude_tables`	list	Skip specific tables (cannot overlap with `tables`)
`exclude_columns`	map	Per-table column exclusion: `{"public.orders": ["pii_field"]}`

A table or schema cannot appear in both an include list and an exclude list — the config will be rejected at apply time with a clear error.
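
The filter fields can be combined. The following sketch assumes that `include_schemas` may be used in place of an explicit `tables` list and that excluding an individual table inside an included schema is allowed; the source does not confirm either point, and the table name `public.audit_log` is hypothetical.

```yaml
source:
  connection: prod-postgres
  include_schemas:
    - public                      # replicate only tables in this schema
  exclude_tables:
    - public.audit_log            # hypothetical table to skip
  exclude_columns:
    public.orders: [pii_field]    # drop one column from one table

# By contrast, a config like this should be rejected at apply time,
# because the same table appears in both an include and an exclude list:
#
# source:
#   connection: prod-postgres
#   tables: [public.orders]
#   exclude_tables: [public.orders]
```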

Sink fields

Field	Type	Description
`name`	string	Optional human-readable label for logs and metrics
`connection`	string	Name of a named connection (recommended)
`type`	string	Connector type (required if no `connection`)
`dsn`	string	Connection string (overrides named connection on conflict)
`properties`	map	Connector-specific options (see connector docs)
`target_mappings`	list	Column projection and routing rules (see below). When empty, all events pass through unchanged.

Target mappings

`target_mappings` controls which source tables a sink receives and how columns are projected. Each entry’s `source` glob is matched against the event’s table name in declaration order.

sink:
  connection: prod-bigquery
  target_mappings:
    - source: "public.orders"       # glob matched against source table name
      name: bq_orders               # destination table name ({table} and {schema} tokens are supported)
      include_by_default: true       # pass all columns through unless excluded
      exclude_fields:
        - internal_flag
        - debug_col
      fields:
        - source: amount_cents       # source column name
          name: amount               # rename at destination (optional)
          expression: "float64(value)/100.0"  # transform expression (optional)
          type: float64              # Arrow output type hint (optional)

    - source: "public.*"
      name: "{table}_cdc"
      include_by_default: false      # allowlist mode — only emit columns in fields[]
      fields:
        - source: id
        - source: updated_at

Field	Type	Description
`source`	string	Glob pattern matched against `schema.table` (`path.Match` syntax). Required.
`name`	string	Destination table, topic, or path prefix. Supports `{table}` and `{schema}` tokens.
`include_by_default`	bool	When `true`, all source columns pass through except those in `exclude_fields`. When `false`, only columns listed in `fields` are emitted.
`exclude_fields`	list	Columns to drop. Only valid when `include_by_default: true`.
`fields`	list	Per-column projection, rename, and transform rules
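
The earlier example uses only the `{table}` token. The sketch below shows how `{schema}` might be combined with it to keep tables from different schemas apart at the destination. The `sales` schema is hypothetical, and the expanded names in the comments are what the token description suggests rather than verified output.

```yaml
sink:
  connection: prod-bigquery
  target_mappings:
    - source: "sales.*"            # hypothetical schema; should match sales.orders, sales.refunds, ...
      name: "{schema}_{table}"     # presumably expands to sales_orders, sales_refunds, ...
      include_by_default: true     # pass all columns through
```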

Field mapping

Field	Type	Description
`source`	string	Source column name. Required.
`name`	string	Destination column name. Defaults to `source` when omitted.
`expression`	string	Transform expression using the built-in vocabulary (e.g. `float64(value)/100.0`).
`type`	string	Arrow output type declaration when it cannot be inferred from the expression.

Multi-sink fan-out

Send one pipeline’s events to multiple sinks simultaneously. Each sink can have independent `target_mappings` to filter or project differently.

pipelines:
  - name: orders-fan-out
    source:
      connection: prod-postgres
      tables: [public.orders]

    sinks:
      - connection: prod-bigquery
        target_mappings:
          - source: "public.orders"
            name: orders_raw
            include_by_default: true

      - connection: prod-kafka
        properties:
          topic: orders-events
        target_mappings:
          - source: "public.orders"
            include_by_default: false
            fields:
              - source: id
              - source: status
              - source: updated_at

When `sinks` is non-empty, the top-level `sink` field is ignored.

Rate limiting

rate_limit:
  max_events_per_second: 10000    # 0 = unlimited
  max_bytes_per_second: 104857600 # 0 = unlimited (100 MiB/s shown)

Rate limits apply per pipeline and are enforced via backpressure on the source read side.

Alerts

Alert rules reference notification channels defined in the server config’s `notifications` block.

alerts:
  - event: pipeline_error
    channels: [slack, pagerduty]

  - event: lag_high
    threshold_seconds: 60    # alert when lag exceeds 60 seconds
    for: 2m                  # sustained for at least 2 minutes before firing
    channels: [slack]

  - event: schema_drift
    channels: [ops-webhook]  # named webhook from notifications.webhooks

Field	Type	Description
`event`	string	Trigger: `pipeline_error`, `lag_high`, or `schema_drift`
`channels`	list	Notification channel names: `slack`, `pagerduty`, or a named webhook
`threshold_seconds`	number	Lag threshold in seconds. Only used with `lag_high`.
`for`	string	Duration the threshold must be sustained before firing (e.g. `"5m"`). Only used with `lag_high`. Default: `"0s"` (immediate).

Batch tuning

Override the default micro-batcher behavior. Omit this block to use built-in defaults.

batch:
  sink_batch_size: 1000       # events per flush (default: 1000)
  event_ch_cap:    256        # per-partition event channel capacity (default: 256)
  sink_ch_cap:     8          # per-partition sink-batch channel capacity (default: 8)
  cdc_max_age:     100ms      # max time a partial CDC batch is held before flushing
  max_batch_bytes: 0          # byte ceiling per batch; 0 = use sink's built-in hint

Snapshot tuning

Override the parallel snapshot coordinator defaults. Only applies to the `cdc_backfill` and `snapshot` replication types.

snapshot:
  concurrency:              4        # parallel partition workers (default: max(4, GOMAXPROCS))
  target_rows_per_partition: 100000  # desired rows per partition
  target_bytes_per_partition: 2147483648  # desired bytes per partition (2 GiB)
  max_partitions_per_table:  512     # cap on partitions per table
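
Both `batch` and `snapshot` are pipeline-level fields (see Pipeline fields), so they sit alongside `source` and `sink` inside a pipeline entry. A sketch of that placement follows; the pipeline name is hypothetical and the tuning values are illustrative, not recommendations.

```yaml
pipelines:
  - name: orders-low-latency        # hypothetical pipeline name
    replication_type: cdc_backfill
    source:
      connection: prod-postgres
      tables: [public.orders]
    sink:
      connection: prod-bigquery
      properties:
        table_id: orders
    batch:
      sink_batch_size: 200          # smaller flushes than the default of 1000
      cdc_max_age: 50ms             # hold partial CDC batches for a shorter time
    snapshot:
      concurrency: 8                # more parallel partition workers for the backfill
      max_partitions_per_table: 128 # lower cap on partitions per table
```

Fields left out of either block presumably keep their built-in defaults.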

Query pipelines

Scheduled SQL query pipelines require `replication_type: query` and a `query.schedule`:

pipelines:
  - name: daily-summary
    replication_type: query
    source:
      connection: prod-postgres
      properties:
        query: "SELECT date_trunc('day', created_at) AS day, SUM(amount) FROM orders GROUP BY 1"
    sink:
      connection: prod-bigquery
      properties:
        table_id: daily_summary
    query:
      schedule: "@daily"   # cron expression or @alias (@hourly, @daily, @every 5m)
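
The example above uses an alias. A plain cron expression might look like the sketch below, which assumes the conventional five-field cron syntax (minute, hour, day of month, month, day of week). The source does not state the exact cron dialect or the time zone used for evaluation, so treat both as open questions.

```yaml
query:
  schedule: "0 2 * * *"      # assumed five-field cron: 02:00 every day

# Interval-style alternative using the documented @every alias:
# query:
#   schedule: "@every 5m"
```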

WASM transforms

Apply compiled WASM plugins to every batch in the pipeline before it reaches the sink. Plugins are applied in order.

transforms:
  - type: wasm
    path: /etc/nanosync/plugins/redact-pii.wasm
    config:
      fields: "email,phone"
      strategy: hash
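
Because plugins run in list order, a later plugin sees the output of an earlier one. The sketch below chains two plugins; the second plugin, its path, and its `config` key are hypothetical and stand in for whatever a real plugin defines.

```yaml
transforms:
  - type: wasm                                      # runs first
    path: /etc/nanosync/plugins/redact-pii.wasm
    config:
      fields: "email,phone"
      strategy: hash
  - type: wasm                                      # runs second, on the redacted batch
    path: /etc/nanosync/plugins/add-region.wasm     # hypothetical plugin
    config:
      region: eu-west                               # hypothetical, plugin-defined key
```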

File format sinks

When using the `local`, `s3`, `gcs`, or `iceberg` sink types, configure the output format in `properties`:

sink:
  type: local
  properties:
    base_path:   /data/replication
    file_format: parquet          # parquet | csv | jsonl | avro

Format	Extension	Notes
`parquet`	`.parquet`	Default. Columnar, best compression, schema-aware.
`csv`	`.csv`	Plain text, no schema embedded.
`jsonl`	`.jsonl`	One JSON object per line.
`avro`	`.avro`	Schema embedded in each file.
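
The same properties can plausibly be set once on a named connection instead of inline on the sink. The sketch below assumes that file-sink properties declared under a connection's `properties` are inherited by any sink that references it, in the same way the `prod-bigquery` connection carries its own properties; the connection and pipeline names are hypothetical.

```yaml
connections:
  - name: local-archive            # hypothetical connection name
    type: local
    properties:
      base_path:   /data/replication
      file_format: jsonl           # one JSON object per line, .jsonl extension

pipelines:
  - name: orders-archive           # hypothetical pipeline name
    replication_type: cdc
    source:
      connection: prod-postgres
      tables: [public.orders]
    sink:
      connection: local-archive
```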

SQL Server tlog mode

Set `cdc_mode: tlog` to read directly from the SQL Server transaction log via `sys.fn_dblog` without requiring CDC setup on the source.

source:
  connection: my-sqlserver
  tables: [dbo.orders]
  properties:
    cdc_mode:        tlog           # "cdc" (default) | "tlog"
    log_batch_size:  "10000"
    poll_interval:   "200ms"
    max_xact_memory: "268435456"    # 256 MiB cap per transaction

Requires: the database must use the `FULL` or `BULK_LOGGED` recovery model. Only the `VIEW DATABASE STATE` privilege is needed; no CDC setup is required.

Applying changes

nanosync apply --file pipelines.yaml            # upsert everything in the file; remove resources no longer listed
nanosync apply --file pipelines.yaml --dry-run  # validate locally without making changes

To reload the server config (database, notifications, logging) after a change, send SIGHUP — no restart required:

kill -HUP $(pgrep nanosync)

SIGHUP reloads and validates the server config only. Pipeline definitions are managed exclusively via `nanosync apply` and the API, so they are unaffected by SIGHUP.

Full annotated example

`server.yaml` — server config

secrets_backend: gcp

database:
  type: postgres
  dsn: "${secret:nanosync/store-dsn}"

notifications:
  slack:
    webhook_url: ${env:SLACK_WEBHOOK_URL}
  pagerduty:
    routing_key: ${env:PAGERDUTY_ROUTING_KEY}

logging:
  level: info
  dir: /var/log/nanosync

`pipelines.yaml` — resource config

connections:
  - name: prod-postgres
    type: postgres
    dsn: "postgres://replicator:${env:PG_PASSWORD}@db.prod:5432/orders?sslmode=require"

  - name: warehouse
    type: bigquery
    properties:
      project_id: acme-data
      dataset_id: replication
      credentials_file: /etc/nanosync/bq-sa.json

pipelines:
  - name: orders-to-warehouse
    replication_type: cdc_backfill
    source:
      connection: prod-postgres
      tables:
        - public.orders
        - public.order_items
        - public.products
      properties:
        replication_slot: nanosync_orders
        chunk_size:       "5000"
        snapshot_workers: "8"
      exclude_columns:
        public.orders: [internal_notes, debug_flag]

    sink:
      connection: warehouse
      properties:
        table_id: orders_cdc
      target_mappings:
        - source: "public.orders"
          name: orders_cdc
          include_by_default: true
          exclude_fields: [internal_notes, debug_flag]

    rate_limit:
      max_events_per_second: 50000
      max_bytes_per_second: 524288000   # 500 MiB/s

    alerts:
      - event: pipeline_error
        channels: [slack, pagerduty]
      - event: lag_high
        threshold_seconds: 30
        for: 1m
        channels: [slack]

Starting the server and applying pipelines

# Start the server (loads server.yaml for database, notifications, logging)
nanosync start server --config server.yaml

# In another terminal: apply the pipeline definitions
nanosync apply --file pipelines.yaml