Configuration Reference
Nanosync uses two separate YAML files with distinct purposes:
| File | Used with | Contains |
|---|---|---|
| Server config | nanosync start server --config server.yaml | Database, secrets backend, notifications, logging |
| Resource config | nanosync apply --file pipelines.yaml | Connections and pipeline definitions |
This separation means your pipeline definitions are never entangled with server-level credentials. You can restart the server without touching pipeline state, and update pipelines without restarting the server.
Server config
The server config file is passed to nanosync start dev or nanosync start server. It controls only the daemon process — not pipelines.
# ── Secrets backend ───────────────────────────────────────────────────────────
# Enables ${secret:...} expansion across all string fields.
# Options: env (default), gcp, aws, vault
secrets_backend: gcp
# For HashiCorp Vault:
# secrets_backend: vault
# vault_addr: https://vault.example.com
# vault_role_id: ${env:VAULT_ROLE_ID}
# vault_secret_id: ${env:VAULT_SECRET_ID}
# ── Database (state store) ────────────────────────────────────────────────────
database:
  type: sqlite # sqlite (default) or postgres
  data_dir: /var/lib/nanosync # SQLite file location; ignored when dsn is set
  # dsn: ${secret:nanosync/store-dsn} # explicit path (SQLite) or connection string (Postgres)
# ── Notifications ─────────────────────────────────────────────────────────────
# Global channels referenced by alerts[].channels in pipeline definitions.
notifications:
  slack:
    webhook_url: ${env:SLACK_WEBHOOK_URL}
  pagerduty:
    routing_key: ${env:PAGERDUTY_ROUTING_KEY}
  webhooks:
    - name: ops-webhook
      url: https://hooks.example.com/nanosync
      headers:
        Authorization: "Bearer ${env:WEBHOOK_TOKEN}"
      timeout: 5s
# ── Logging ───────────────────────────────────────────────────────────────────
logging:
  level: info # debug | info | warn | error
  dir: /var/log/nanosync # omit to log to stdout only
  otel_endpoint: "" # OTLP/HTTP endpoint for trace export
  cloud_logging_project: "" # GCP project for Cloud Logging export
Server config fields
| Field | Type | Description |
|---|---|---|
| secrets_backend | string | Secret provider: env (default), gcp, aws, vault |
| vault_addr | string | Vault server URL (when secrets_backend: vault) |
| vault_role_id | string | Vault AppRole role ID |
| vault_secret_id | string | Vault AppRole secret ID |
| database.type | string | State store driver: sqlite (default) or postgres |
| database.data_dir | string | Directory for the SQLite file. Ignored when dsn is set. |
| database.dsn | string | Explicit data source name. For SQLite: a file path. For Postgres: a connection string. |
| notifications.slack.webhook_url | string | Incoming webhook URL for Slack alerts |
| notifications.pagerduty.routing_key | string | PagerDuty Events API v2 integration key |
| notifications.webhooks | list | Named generic webhook destinations |
| logging.level | string | Minimum log level: debug, info, warn, error |
| logging.dir | string | Directory for rotating log files. Omit to write to stdout. |
| logging.otel_endpoint | string | OTLP/HTTP endpoint for the OTel logs bridge |
| logging.cloud_logging_project | string | GCP project for Cloud Logging export (ADC credentials required) |
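
To illustrate how these fields combine, here is a minimal sketch of a server config that uses Vault as the secrets backend and Postgres as the state store. It uses only the fields listed in the table above. The Vault address, the environment variable names, and the secret path are placeholders, and reading the AppRole credentials from the environment is an assumption carried over from the commented example earlier in this section.

```yaml
# server.yaml (sketch, not a verified production config)
secrets_backend: vault
vault_addr: https://vault.example.com   # placeholder address
vault_role_id: ${env:VAULT_ROLE_ID}     # AppRole credentials read from the environment
vault_secret_id: ${env:VAULT_SECRET_ID}

database:
  type: postgres
  dsn: "${secret:nanosync/store-dsn}"   # fetched from the configured backend (Vault here)

logging:
  level: warn                           # logging.dir is omitted, so logs go to stdout only
```

The notifications block is left out here; any pipeline alert that names a channel would need the corresponding entry added.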
Resource config (nanosync apply)
The resource config is applied to a running server with nanosync apply --file pipelines.yaml. It declares connections and pipelines that are upserted into the state store.
# Top-level structure
connections:
  - ... # named, reusable connection definitions
pipelines:
  - ... # pipeline definitions that reference connections by name
apply is idempotent — it creates or updates all listed resources, and removes any server-side resources that are no longer in the file.
nanosync apply --file pipelines.yaml # apply changes
nanosync apply --file pipelines.yaml --dry-run # validate without changing anything
Connections
Named connections let you define credentials once and reference them across many pipelines. Connections are the authority for connector type and credentials — always define them in connections: rather than inline in each pipeline.
connections:
  - name: prod-postgres # unique identifier referenced by pipelines
    type: postgres
    dsn: "postgres://user:${env:PG_PASSWORD}@db.prod:5432/mydb?sslmode=require"
  - name: prod-bigquery
    type: bigquery
    properties:
      project_id: my-gcp-project
      dataset_id: replication
      credentials_file: /etc/nanosync/bq-sa.json
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Unique identifier referenced in pipeline source.connection and sink.connection fields |
| type | string | yes | Connector type. Sources: postgres, sqlserver, kafka, local, s3, gcs, stdin. Sinks: bigquery, spanner, alloydb, cloudsql, kafka, pubsub, local, s3, gcs, iceberg, stdout. |
| dsn | string | no | Connection string (used by database connectors) |
| properties | map | no | Key-value connector properties (connector-specific) |
Secret and environment variable expansion
Any string value in the YAML supports variable expansion:
dsn: "postgres://user:${env:PG_PASSWORD}@host:5432/db"
properties:
  api_key: "${secret:my-project/bq-api-key}" # fetched from the secrets backend
| Token | Resolution |
|---|---|
| ${env:VAR_NAME} | Process environment variable |
| ${secret:path} | Secret from the configured secrets_backend (GCP, AWS, Vault) |
Pipelines
pipelines:
  - name: orders-to-bigquery # unique pipeline name, required
    replication_type: cdc_backfill # cdc_backfill | cdc | snapshot | query
    source:
      connection: prod-postgres # references a named connection
      tables:
        - public.orders
        - public.order_items
      properties:
        replication_slot: nanosync_orders
        chunk_size: "10000"
    sink:
      connection: prod-bigquery
      properties:
        table_id: orders
    rate_limit:
      max_events_per_second: 10000
      max_bytes_per_second: 104857600 # 100 MiB/s
    alerts:
      - event: pipeline_error
        channels: [slack]
      - event: lag_high
        threshold_seconds: 60
        for: 2m
        channels: [slack, pagerduty]
Pipeline fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | yes | Unique pipeline identifier |
| source | object | yes | Source connector config |
| sink | object | yes* | Single-sink configuration. Use sinks for fan-out. |
| sinks | list | yes* | Multi-sink fan-out list. When set, sink is ignored. |
| replication_type | string | no | cdc_backfill (default), cdc, snapshot, or query |
| rate_limit | object | no | Per-pipeline throughput limits |
| alerts | list | no | Alert rules referencing channels from the server notifications config |
| transforms | list | no | Ordered list of WASM transform plugins applied to each batch |
| batch | object | no | Batcher and channel-size tuning |
| snapshot | object | no | Parallel snapshot coordinator tuning |
| query | object | no | Required when replication_type: query; sets the cron schedule |
*One of sink or sinks is required.
Replication types
| Type | Behaviour |
|---|---|
| cdc_backfill | (default) Snapshot existing rows, then stream live CDC changes |
| cdc | Stream live CDC changes only, with no initial snapshot |
| snapshot | One-off full table copy; pipeline stops when complete |
| query | Run a scheduled SQL query on the source (requires query.schedule) |
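
As a sketch of how the non-default types are selected, the fragment below declares one snapshot pipeline and one cdc pipeline against the named connections used elsewhere on this page. The pipeline names and table_id values are hypothetical, and connector-specific source properties (such as a replication slot) are omitted for brevity; whether a given connector requires them is not covered here.

```yaml
pipelines:
  # One-off copy: the pipeline stops once the table has been copied.
  - name: products-initial-load   # hypothetical name
    replication_type: snapshot
    source:
      connection: prod-postgres
      tables: [public.products]
    sink:
      connection: prod-bigquery
      properties:
        table_id: products        # hypothetical destination

  # Live changes only: rows that already exist are not copied.
  - name: orders-live-only        # hypothetical name
    replication_type: cdc
    source:
      connection: prod-postgres
      tables: [public.orders]
    sink:
      connection: prod-bigquery
      properties:
        table_id: orders_changes  # hypothetical destination
```

Omitting replication_type altogether selects cdc_backfill, which combines both behaviours.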
Source fields
| Field | Type | Description |
|---|---|---|
| connection | string | Name of a named connection (recommended) |
| type | string | Connector type (required if no connection) |
| dsn | string | Connection string (overrides named connection on conflict) |
| tables | list | Tables to replicate, in schema.table format |
| files | list | File paths or glob patterns for file-based sources (local, s3, gcs). Mutually exclusive with tables. |
| properties | map | Connector-specific options (see connector docs) |
| include_schemas | list | Replicate only tables in these schemas |
| exclude_schemas | list | Skip all tables in these schemas |
| exclude_tables | list | Skip specific tables (cannot overlap with tables) |
| exclude_columns | map | Per-table column exclusion: {"public.orders": ["pii_field"]} |
A table or schema cannot appear in both an include list and an exclude list — the config will be rejected at apply time with a clear error.
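
The filter fields can be sketched as follows. The schema, table, and column names are hypothetical, and the example assumes that include_schemas may be used on its own, without an explicit tables list, to select every table in the listed schemas. The commented-out block shows the kind of overlap that the rule above describes as rejected.

```yaml
source:
  connection: prod-postgres
  include_schemas: [public, billing]   # hypothetical schema names
  exclude_tables:
    - public.audit_log                 # hypothetical table to skip
  exclude_columns:
    public.customers: [email, phone]   # hypothetical columns dropped at the source

# Rejected at apply time: the same table is both included and excluded.
# source:
#   connection: prod-postgres
#   tables: [public.orders]
#   exclude_tables: [public.orders]
```

Running nanosync apply with --dry-run is a convenient way to surface such conflicts before any change is made.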
Sink fields
| Field | Type | Description |
|---|---|---|
| name | string | Optional human-readable label for logs and metrics |
| connection | string | Name of a named connection (recommended) |
| type | string | Connector type (required if no connection) |
| dsn | string | Connection string (overrides named connection on conflict) |
| properties | map | Connector-specific options (see connector docs) |
| target_mappings | list | Column projection and routing rules (see below). When empty, all events pass through unchanged. |
Target mappings
target_mappings controls which source tables a sink receives and how columns are projected. Each entry’s source glob is matched against the event’s table name in declaration order.
sink:
  connection: prod-bigquery
  target_mappings:
    - source: "public.orders" # glob matched against source table name
      name: bq_orders # destination table name ({table} and {schema} tokens are supported)
      include_by_default: true # pass all columns through unless excluded
      exclude_fields:
        - internal_flag
        - debug_col
      fields:
        - source: amount_cents # source column name
          name: amount # rename at destination (optional)
          expression: "float64(value)/100.0" # transform expression (optional)
          type: float64 # Arrow output type hint (optional)
    - source: "public.*"
      name: "{table}_cdc"
      include_by_default: false # allowlist mode: only emit columns in fields[]
      fields:
        - source: id
        - source: updated_at
| Field | Type | Description |
|---|---|---|
| source | string | Glob pattern matched against schema.table (path.Match syntax). Required. |
| name | string | Destination table, topic, or path prefix. Supports {table} and {schema} tokens. |
| include_by_default | bool | When true, all source columns pass through except those in exclude_fields. When false, only columns listed in fields are emitted. |
| exclude_fields | list | Columns to drop. Only valid when include_by_default: true. |
| fields | list | Per-column projection, rename, and transform rules |
Field mapping
| Field | Type | Description |
|---|---|---|
| source | string | Source column name. Required. |
| name | string | Destination column name. Defaults to source when omitted. |
| expression | string | Transform expression using the built-in vocabulary (e.g. float64(value)/100.0). |
| type | string | Arrow output type declaration when it cannot be inferred from the expression. |
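
Because mappings are matched in declaration order, it can help to list specific tables before broad globs. The sketch below assumes that the first matching entry is the one applied and that {schema} and {table} expand to the unqualified schema and table names; neither detail is spelled out above, so treat the comments as expectations to verify rather than guarantees. The destination names are hypothetical.

```yaml
target_mappings:
  - source: "public.orders"     # listed first, so it is tried first
    name: orders_enriched       # hypothetical destination name
    include_by_default: true
    fields:
      - source: amount_cents
        name: amount
        expression: "float64(value)/100.0"
        type: float64
  - source: "public.*"          # assumed to apply only to tables not matched above
    name: "{schema}_{table}"    # e.g. public.customers -> public_customers (assumed expansion)
    include_by_default: true
```

If the order were reversed, the wildcard entry would be tried first for public.orders as well, so the cents-to-amount conversion would presumably never run.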
Multi-sink fan-out
Send one pipeline’s events to multiple sinks simultaneously. Each sink can have independent target_mappings to filter or project differently.
pipelines:
  - name: orders-fan-out
    source:
      connection: prod-postgres
      tables: [public.orders]
    sinks:
      - connection: prod-bigquery
        target_mappings:
          - source: "public.orders"
            name: orders_raw
            include_by_default: true
      - connection: prod-kafka
        properties:
          topic: orders-events
        target_mappings:
          - source: "public.orders"
            include_by_default: false
            fields:
              - source: id
              - source: status
              - source: updated_at
When sinks is non-empty, the top-level sink field is ignored.
Rate limiting
rate_limit:
  max_events_per_second: 10000 # 0 = unlimited
  max_bytes_per_second: 104857600 # 0 = unlimited (100 MiB/s shown)
Rate limits apply per pipeline and are enforced via backpressure on the source read side.
Alerts
Alert rules reference notification channels defined in the server config’s notifications block.
alerts:
  - event: pipeline_error
    channels: [slack, pagerduty]
  - event: lag_high
    threshold_seconds: 60 # alert when lag exceeds 60 seconds
    for: 2m # sustained for at least 2 minutes before firing
    channels: [slack]
  - event: schema_drift
    channels: [ops-webhook] # named webhook from notifications.webhooks
| Field | Type | Description |
|---|---|---|
| event | string | Trigger: pipeline_error, lag_high, or schema_drift |
| channels | list | Notification channel names: slack, pagerduty, or a named webhook |
| threshold_seconds | number | Lag threshold in seconds. Only used with lag_high. |
| for | string | Duration the threshold must be sustained before firing (e.g. "5m"). Only used with lag_high. Default: "0s" (immediate). |
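
The interaction between threshold_seconds and for can be sketched with two lag rules of different severity. The thresholds are illustrative, and the example assumes that a pipeline may carry more than one lag_high rule, which the table above does not state explicitly.

```yaml
alerts:
  # No "for": the default "0s" applies, so this fires as soon as lag exceeds 300 seconds.
  - event: lag_high
    threshold_seconds: 300
    channels: [pagerduty]
  # With "for": lag must stay above 60 seconds for 5 minutes before Slack is notified.
  - event: lag_high
    threshold_seconds: 60
    for: 5m
    channels: [slack]
```

Both channels must exist in the server config's notifications block for these rules to resolve.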
Batch tuning
Override the default micro-batcher behavior. Omit this block to use built-in defaults.
batch:
  sink_batch_size: 1000 # events per flush (default: 1000)
  event_ch_cap: 256 # per-partition event channel capacity (default: 256)
  sink_ch_cap: 8 # per-partition sink-batch channel capacity (default: 8)
  cdc_max_age: 100ms # max time a partial CDC batch is held before flushing
  max_batch_bytes: 0 # byte ceiling per batch; 0 = use sink's built-in hint
Snapshot tuning
Override the parallel snapshot coordinator defaults. Only applies to cdc_backfill and snapshot replication types.
snapshot:
  concurrency: 4 # parallel partition workers (default: max(4, GOMAXPROCS))
  target_rows_per_partition: 100000 # desired rows per partition
  target_bytes_per_partition: 2147483648 # desired bytes per partition (2 GiB)
  max_partitions_per_table: 512 # cap on partitions per table
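
The batch and snapshot blocks are pipeline-level keys, siblings of source and sink according to the Pipeline fields table. A minimal sketch of their placement, reusing the orders-to-bigquery pipeline from earlier with illustrative values:

```yaml
pipelines:
  - name: orders-to-bigquery
    replication_type: cdc_backfill   # snapshot tuning applies to cdc_backfill and snapshot only
    source:
      connection: prod-postgres
      tables: [public.orders]
    sink:
      connection: prod-bigquery
      properties:
        table_id: orders
    batch:
      sink_batch_size: 1000
      cdc_max_age: 100ms
    snapshot:
      concurrency: 8                 # illustrative value
      max_partitions_per_table: 512
```

Any key left out of either block is assumed to fall back to its built-in default.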
Query pipelines
Scheduled SQL query pipelines require replication_type: query and a query.schedule:
pipelines:
  - name: daily-summary
    replication_type: query
    source:
      connection: prod-postgres
      properties:
        query: "SELECT date_trunc('day', created_at) AS day, SUM(amount) FROM orders GROUP BY 1"
    sink:
      connection: prod-bigquery
      properties:
        table_id: daily_summary
    query:
      schedule: "@daily" # cron expression or @alias (@hourly, @daily, @every 5m)
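
For schedules that the aliases do not cover, a cron expression can be used instead. The sketch below assumes the standard five-field cron format (minute, hour, day of month, month, day of week); the exact dialect that Nanosync accepts is not specified here.

```yaml
query:
  schedule: "0 */6 * * *"   # minute 0 of every sixth hour, in five-field cron syntax (assumed)
  # schedule: "@every 15m"  # interval alias form, as an alternative
```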
WASM transforms
Apply compiled WASM plugins to every batch in the pipeline before it reaches the sink. Plugins are applied in order.
transforms:
  - type: wasm
    path: /etc/nanosync/plugins/redact-pii.wasm
    config:
      fields: "email,phone"
      strategy: hash
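
Since plugins run in list order, a chain of transforms can be sketched as below. The second plugin and its config key are hypothetical, and the example assumes that the keys under config are defined by each plugin rather than by Nanosync.

```yaml
transforms:
  - type: wasm                                    # runs first
    path: /etc/nanosync/plugins/redact-pii.wasm
    config:
      fields: "email,phone"
      strategy: hash
  - type: wasm                                    # runs second, on the redacted batch
    path: /etc/nanosync/plugins/add-region.wasm   # hypothetical plugin
    config:
      region: eu-west-1                           # hypothetical, plugin-defined key
```

Placing the redaction step first means later plugins only see the hashed values.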
File format sinks
When using local, s3, gcs, or iceberg sink types, configure the output format in properties:
sink:
  type: local
  properties:
    base_path: /data/replication
    file_format: parquet # parquet | csv | jsonl | avro
| Format | Extension | Notes |
|---|---|---|
| parquet | .parquet | Default. Columnar, best compression, schema-aware. |
| csv | .csv | Plain text, no schema embedded. |
| jsonl | .jsonl | One JSON object per line. |
| avro | .avro | Schema embedded in each file. |
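
File sinks can be combined with target_mappings, whose name field doubles as a path prefix. The sketch below writes JSON Lines files and groups them by schema and table. It assumes that a slash is allowed in name and that the prefix is resolved relative to base_path; the resulting file layout is not documented here, so verify it against a test run.

```yaml
sink:
  type: local
  properties:
    base_path: /data/replication
    file_format: jsonl
  target_mappings:
    - source: "public.*"
      name: "{schema}/{table}"   # path prefix per table (assumed to sit under base_path)
      include_by_default: true
```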
SQL Server tlog mode
Set cdc_mode: tlog to read directly from the SQL Server transaction log via sys.fn_dblog without requiring CDC setup on the source.
source:
  connection: my-sqlserver
  tables: [dbo.orders]
  properties:
    cdc_mode: tlog # "cdc" (default) | "tlog"
    log_batch_size: "10000"
    poll_interval: "200ms"
    max_xact_memory: "268435456" # 256 MiB cap per transaction
Requirements: the database must use the FULL or BULK_LOGGED recovery model. Only the VIEW DATABASE STATE privilege is needed; no CDC setup is required.
Applying changes
nanosync apply --file pipelines.yaml # upsert all connections and pipelines in the file
nanosync apply --file pipelines.yaml --dry-run # validate locally without making changes
To reload the server config (database, notifications, logging) after a change, send SIGHUP — no restart required:
kill -HUP $(pgrep nanosync)
SIGHUP reloads and validates the server config only. Pipeline definitions are managed exclusively via nanosync apply and the API — they are unaffected by SIGHUP.
Full annotated example
server.yaml — server config
secrets_backend: gcp
database:
  type: postgres
  dsn: "${secret:nanosync/store-dsn}"
notifications:
  slack:
    webhook_url: ${env:SLACK_WEBHOOK_URL}
  pagerduty:
    routing_key: ${env:PAGERDUTY_ROUTING_KEY}
logging:
  level: info
  dir: /var/log/nanosync
pipelines.yaml — resource config
connections:
  - name: prod-postgres
    type: postgres
    dsn: "postgres://replicator:${env:PG_PASSWORD}@db.prod:5432/orders?sslmode=require"
  - name: warehouse
    type: bigquery
    properties:
      project_id: acme-data
      dataset_id: replication
      credentials_file: /etc/nanosync/bq-sa.json
pipelines:
  - name: orders-to-warehouse
    replication_type: cdc_backfill
    source:
      connection: prod-postgres
      tables:
        - public.orders
        - public.order_items
        - public.products
      properties:
        replication_slot: nanosync_orders
        chunk_size: "5000"
        snapshot_workers: "8"
      exclude_columns:
        public.orders: [internal_notes, debug_flag]
    sink:
      connection: warehouse
      properties:
        table_id: orders_cdc
      target_mappings:
        - source: "public.orders"
          name: orders_cdc
          include_by_default: true
          exclude_fields: [internal_notes, debug_flag]
    rate_limit:
      max_events_per_second: 50000
      max_bytes_per_second: 524288000 # 500 MiB/s
    alerts:
      - event: pipeline_error
        channels: [slack, pagerduty]
      - event: lag_high
        threshold_seconds: 30
        for: 1m
        channels: [slack]
Starting the server and applying pipelines
# Start the server (loads server.yaml for database, notifications, logging)
nanosync start server --config server.yaml
# In another terminal: apply the pipeline definitions
nanosync apply --file pipelines.yaml