Monitoring

`nanosync monitor`

Opens a live terminal dashboard showing all running pipelines:

nanosync monitor

NAME                 SOURCE          DESTINATION   STATUS        LAG     EV/S
orders-pipeline      prod-postgres   bigquery      ● live CDC    14ms    3,840
users-pipeline       prod-postgres   bigquery      ● live CDC    11ms    220
audit-pipeline       prod-postgres   kafka         ● live CDC    8ms     4,100

Keyboard shortcuts:

Key	Action
`Enter`	Drill into table-level breakdown for the selected pipeline
`w`	Switch to worker view (snapshot parallelism, per-worker throughput)
`q`	Quit

To monitor a single pipeline:

nanosync monitor --pipeline orders-pipeline

Pipeline states

Status	Meaning
`snapshotting`	Initial table read in progress — shows % complete and rows/sec
`live CDC`	Streaming live changes from the source
`paused`	Manually paused, or stopped due to schema drift
`errored`	Unrecoverable error — check logs with `nanosync logs --pipeline <name>`
`idle`	No changes seen recently — normal for low-traffic tables

`errored` requires operator attention. A pipeline that is `paused` due to schema drift resumes after you resolve the mismatch and run `nanosync pipeline resume <name>`.

Key metrics

Metric	What it tells you
LAG	End-to-end latency from source commit to destination write
EV/S	Events per second throughput
Snapshot progress	Percentage complete + rows/sec during initial backfill

LAG is the most operationally important metric. A rising LAG means the pipeline is falling behind: either the source is producing changes faster than the sink can accept them, or the destination itself has slowed down. A spike that recovers quickly is normal after a restart; a sustained increase warrants investigation.
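One way to tell a recovering spike from a sustained increase is to look at the trend across the most recent readings. The sketch below is illustrative only: it assumes you already collect LAG readings in milliseconds at a fixed interval, and the `lag_slope` and `classify_lag` helpers, the window size, and the slope cutoff are hypothetical choices, not part of Nanosync.

```python
from typing import Sequence


def lag_slope(samples_ms: Sequence[float]) -> float:
    """Least-squares slope of lag against sample index (ms per sample)."""
    n = len(samples_ms)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(samples_ms) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples_ms))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def classify_lag(samples_ms: Sequence[float], window: int = 5,
                 min_slope: float = 1.0) -> str:
    """Label the trailing window as 'rising', 'recovering', or 'steady'.

    window and min_slope are arbitrary illustrative defaults; tune them
    to your sampling interval and normal lag range.
    """
    slope = lag_slope(samples_ms[-window:])
    if slope >= min_slope:
        return "rising"
    if slope <= -min_slope:
        return "recovering"
    return "steady"
```

A series such as `[14, 15, 900, 400, 120, 40, 15]` (a restart spike that drains) is labeled `recovering`, while `[14, 20, 35, 60, 110, 200]` is labeled `rising` and is the case that warrants investigation.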

Prometheus metrics

Nanosync exposes a Prometheus-compatible `/metrics` endpoint at `http://localhost:7600/metrics` when the server is running.

To print current metric values for a single pipeline from the CLI:

nanosync metrics pipeline orders-pipeline

Key metrics:

Metric	Description
`ns_pipeline_replication_lag_seconds`	End-to-end source-to-sink latency (gauge)
`ns_cdc_events_total`	Total CDC events processed, labeled by pipeline and table
`ns_snapshot_rows_total`	Rows written during initial snapshot, labeled by pipeline and table
`ns_pipeline_errors_total`	Error count per pipeline

All metrics carry a `pipeline` label. Table-level metrics also carry a `table` label in `schema.table` format.
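If you want to read these values from a script rather than from Prometheus, the endpoint's text can be parsed directly. The following is a minimal sketch that assumes the standard Prometheus text exposition format; the sample payload, its values, and the table name `public.orders` are made up for illustration, and `parse_samples` and `lag_by_pipeline` are hypothetical helper names.

```python
import re
from typing import Dict, Iterator, Tuple

# One sample line looks like: metric_name{label="value",...} 1.23
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (metric name, labels, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if not match:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def lag_by_pipeline(text: str) -> Dict[str, float]:
    """Map each pipeline label to its replication lag in seconds."""
    return {
        labels["pipeline"]: value
        for name, labels, value in parse_samples(text)
        if name == "ns_pipeline_replication_lag_seconds" and "pipeline" in labels
    }


# Illustrative payload only; real output will differ.
SAMPLE = """\
# TYPE ns_pipeline_replication_lag_seconds gauge
ns_pipeline_replication_lag_seconds{pipeline="orders-pipeline"} 0.014
ns_pipeline_replication_lag_seconds{pipeline="users-pipeline"} 0.011
# TYPE ns_cdc_events_total counter
ns_cdc_events_total{pipeline="orders-pipeline",table="public.orders"} 123456
"""

print(lag_by_pipeline(SAMPLE))
```

In practice the text would come from fetching `http://localhost:7600/metrics`, for example with `urllib.request.urlopen`, instead of the inline sample.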

Alerts

Configure lag, error, and schema drift alerts in `server.yaml`. Channels reference the `notifications` block defined at the server level.

notifications:
  slack:
    webhook_url: "${env:SLACK_WEBHOOK}"

alerts:
  - pipeline: orders-pipeline
    lag_threshold: "30s"
    on_error: true
    on_schema_drift: true

See Configuration Reference for the full alert schema, including `for` (minimum sustained duration before firing) and multi-channel routing.
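The effect of a minimum sustained duration can be sketched as follows. This illustrates the general idea of a sustained-threshold alert, not Nanosync's actual implementation; the `first_firing_time` function and the `(timestamp, lag)` sample format are assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple


def first_firing_time(samples: Iterable[Tuple[float, float]],
                      threshold_s: float, for_s: float) -> Optional[float]:
    """Return the timestamp at which the alert would first fire, or None.

    samples: (timestamp_seconds, lag_seconds) pairs in time order.
    The alert fires only once lag has stayed above threshold_s for at
    least for_s seconds without dropping back.
    """
    breach_start = None
    for ts, lag in samples:
        if lag > threshold_s:
            if breach_start is None:
                breach_start = ts  # start of the current breach
            if ts - breach_start >= for_s:
                return ts
        else:
            breach_start = None  # lag recovered; reset the timer
    return None
```

With a 30-second threshold and a 60-second duration, a single high reading that recovers on the next sample never fires, while lag that stays above 30 seconds from t=10 onward fires at t=70.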

Scripting and JSON output

# JSON output for scripting
nanosync list pipelines --output json | jq '.[] | {name, status, lag_ms}'

The JSON output includes all fields shown in `nanosync monitor` plus internal metadata (checkpoint LSN, last event time, error detail). This is useful for feeding pipeline state into external dashboards or health check scripts.
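A health check built on this output might look like the sketch below. It assumes the output is a top-level JSON array whose objects carry the `name`, `status`, and `lag_ms` fields used in the `jq` filter above, and that `status` holds the state names from the Pipeline states table; the sample payload, the `unhealthy_pipelines` function, and the 60-second default are hypothetical.

```python
import json
from typing import List

# Illustrative payload only; the real output contains more fields.
SAMPLE_OUTPUT = """
[
  {"name": "orders-pipeline", "status": "live CDC", "lag_ms": 14},
  {"name": "users-pipeline", "status": "paused", "lag_ms": 95000},
  {"name": "audit-pipeline", "status": "errored", "lag_ms": 120000}
]
"""


def unhealthy_pipelines(raw_json: str, max_lag_ms: int = 60_000) -> List[str]:
    """Return one human-readable problem string per unhealthy pipeline."""
    problems = []
    for pipeline in json.loads(raw_json):
        name, status = pipeline["name"], pipeline["status"]
        if status in ("errored", "paused"):
            problems.append(f"{name}: status is {status}")
        elif pipeline["lag_ms"] > max_lag_ms:
            problems.append(f"{name}: lag {pipeline['lag_ms']} ms exceeds {max_lag_ms} ms")
    return problems


for problem in unhealthy_pipelines(SAMPLE_OUTPUT):
    print(problem)
```

To run it against a live server, replace `SAMPLE_OUTPUT` with the captured stdout of `nanosync list pipelines --output json`, for instance via `subprocess.run(..., capture_output=True, text=True)`, and exit non-zero when the returned list is not empty.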

For production, scrape `/metrics` with Prometheus and alert on `ns_pipeline_replication_lag_seconds > 60`. This catches both pipeline errors and destination write slowdowns before they become incidents, because a paused or errored pipeline shows lag climbing continuously even if no error event fires.