Monitoring

nanosync monitor

Opens a live terminal dashboard showing all running pipelines:

nanosync monitor
NAME                 SOURCE          DESTINATION   STATUS        LAG     EV/S
orders-pipeline      prod-postgres   bigquery      ● live CDC    14ms    3,840
users-pipeline       prod-postgres   bigquery      ● live CDC    11ms    220
audit-pipeline       prod-postgres   kafka         ● live CDC    8ms     4,100

Keyboard shortcuts:

Key     Action
Enter   Drill into table-level breakdown for the selected pipeline
w       Switch to worker view (snapshot parallelism, per-worker throughput)
q       Quit

To monitor a single pipeline:

nanosync monitor --pipeline orders-pipeline

Pipeline states

Status         Meaning
snapshotting   Initial table read in progress; shows % complete and rows/sec
live CDC       Streaming live changes from the source
paused         Manually paused, or stopped due to schema drift
errored        Unrecoverable error; check logs with nanosync logs --pipeline <name>
idle           No changes seen recently; normal for low-traffic tables

A pipeline in the errored state requires operator attention. A pipeline that is paused due to schema drift resumes after you resolve the mismatch and run nanosync pipeline resume <name>.


Key metrics

Metric             What it tells you
LAG                End-to-end latency from source commit to destination write
EV/S               Events per second throughput
Snapshot progress  Percentage complete + rows/sec during initial backfill

LAG is the most operationally important metric. A rising LAG indicates the pipeline is falling behind — either the source is writing faster than the sink can accept, or the destination is slow. A spike that recovers quickly is normal after restarts; a sustained increase warrants investigation.
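
The difference between a transient spike and a sustained increase can be checked mechanically. The Python sketch below is an illustration only and is not part of Nanosync. It assumes you already collect LAG samples in milliseconds at a fixed interval; the function name, the 1000 ms threshold, and the five-sample window are hypothetical values to tune for your workload.

```python
from typing import Sequence


def classify_lag(samples_ms: Sequence[float],
                 threshold_ms: float = 1000.0,
                 sustained_samples: int = 5) -> str:
    """Classify recent LAG samples (oldest first).

    Returns:
      "ok"        if there are no samples or the latest one is under the threshold
      "sustained" if the last `sustained_samples` samples are all at or above it
      "spike"     if lag is currently high but has not stayed high long enough
    """
    if not samples_ms or samples_ms[-1] < threshold_ms:
        return "ok"
    recent = samples_ms[-sustained_samples:]
    if len(recent) == sustained_samples and all(s >= threshold_ms for s in recent):
        return "sustained"
    return "spike"


print(classify_lag([14, 11, 4200, 12]))               # prints: ok (spike already recovered)
print(classify_lag([14, 1500, 1800]))                 # prints: spike
print(classify_lag([1200, 1500, 1800, 2400, 3100]))   # prints: sustained
```

Only the "sustained" result would normally be worth paging on; a "spike" right after a restart is expected behavior.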


Prometheus metrics

Nanosync exposes a Prometheus-compatible /metrics endpoint at http://localhost:7600/metrics when the server is running.

To print current metric values for a single pipeline from the CLI:

nanosync metrics pipeline orders-pipeline

Key metrics:

Metric                               Description
ns_pipeline_replication_lag_seconds  End-to-end source-to-sink latency (gauge)
ns_cdc_events_total                  Total CDC events processed, labeled by pipeline and table
ns_snapshot_rows_total               Rows written during initial snapshot, labeled by pipeline and table
ns_pipeline_errors_total             Error count per pipeline

All metrics carry a pipeline label. Table-level metrics also carry a table label in schema.table format.
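
If you want to read these metrics from a script rather than from Prometheus, the following is a minimal Python sketch that extracts the lag gauge per pipeline from the text exposition format. The metric and label names come from the table above; the sample payload, the function names, and the simplified regular expressions are assumptions for illustration and do not reproduce actual Nanosync output. A full parser would also need to handle unusual label escaping.

```python
import re
import urllib.request
from typing import Dict

# One sample line of the Prometheus text format, e.g.
#   metric_name{label="value"} 0.014
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def lag_by_pipeline(metrics_text: str,
                    metric: str = "ns_pipeline_replication_lag_seconds") -> Dict[str, float]:
    """Return {pipeline label: lag in seconds} parsed from /metrics text."""
    result: Dict[str, float] = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks, HELP and TYPE lines
            continue
        m = SAMPLE_RE.match(line)
        if m is None or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        if "pipeline" in labels:
            result[labels["pipeline"]] = float(m.group("value"))
    return result


def fetch_metrics(url: str = "http://localhost:7600/metrics") -> str:
    """Fetch the raw metrics text; requires a running server (not called here)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Invented sample payload, for illustration only.
SAMPLE = """\
# TYPE ns_pipeline_replication_lag_seconds gauge
ns_pipeline_replication_lag_seconds{pipeline="orders-pipeline"} 0.014
ns_pipeline_replication_lag_seconds{pipeline="users-pipeline"} 0.011
ns_cdc_events_total{pipeline="orders-pipeline",table="public.orders"} 3840
"""

lags = lag_by_pipeline(SAMPLE)
```

Against a live server, `lag_by_pipeline(fetch_metrics())` would return the same kind of mapping, which can then be compared with whatever threshold you alert on.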


Alerts

Configure lag, error, and schema drift alerts in server.yaml. Channels reference the notifications block defined at the server level.

notifications:
  slack:
    webhook_url: "${env:SLACK_WEBHOOK}"

alerts:
  - pipeline: orders-pipeline
    lag_threshold: "30s"
    on_error: true
    on_schema_drift: true

See Configuration Reference for the full alert schema, including the for field (minimum sustained duration before firing) and multi-channel routing.


Scripting and JSON output

# JSON output for scripting
nanosync list pipelines --output json | jq '.[] | {name, status, lag_ms}'

The JSON output includes all fields shown in nanosync monitor plus internal metadata (checkpoint LSN, last event time, error detail). This makes it suitable for feeding pipeline state into external dashboards or health check scripts.
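
A health check script built on this output might look like the Python sketch below. It relies on several assumptions that are not confirmed here: that the output is a JSON array of objects, that status values match the names in the Pipeline states table, and that lag_ms is numeric. The function name and the 30000 ms limit are hypothetical.

```python
import json
from typing import List

# Assumption: status strings match the names in the "Pipeline states" table.
UNHEALTHY_STATES = {"errored", "paused"}


def unhealthy_pipelines(raw_json: str, max_lag_ms: float = 30_000) -> List[str]:
    """Return one message per pipeline that is errored, paused, or over the lag limit."""
    problems: List[str] = []
    for p in json.loads(raw_json):
        name, status, lag_ms = p.get("name"), p.get("status"), p.get("lag_ms")
        if status in UNHEALTHY_STATES:
            problems.append(f"{name}: status is {status}")
        elif lag_ms is not None and lag_ms > max_lag_ms:
            problems.append(f"{name}: lag {lag_ms} ms exceeds {max_lag_ms} ms")
    return problems


# Invented sample input, shaped like the fields used in the jq example.
SAMPLE = json.dumps([
    {"name": "orders-pipeline", "status": "live CDC", "lag_ms": 14},
    {"name": "users-pipeline", "status": "live CDC", "lag_ms": 45000},
    {"name": "audit-pipeline", "status": "errored", "lag_ms": 8},
])

problems = unhealthy_pipelines(SAMPLE)
```

In practice the input would come from piping nanosync list pipelines --output json into the script and reading standard input; a non-empty result could then be turned into a non-zero exit code.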


For production, scrape /metrics with Prometheus and alert on ns_pipeline_replication_lag_seconds > 60 (lag above 60 seconds). This catches both pipeline errors and destination write slowdowns before they become incidents, because a paused or errored pipeline shows lag climbing continuously even if no error event fires.