Monitoring
nanosync monitor
Opens a live terminal dashboard showing all running pipelines:
```bash
nanosync monitor
```

```text
NAME             SOURCE         DESTINATION  STATUS      LAG   EV/S
orders-pipeline  prod-postgres  bigquery     ● live CDC  14ms  3,840
users-pipeline   prod-postgres  bigquery     ● live CDC  11ms  220
audit-pipeline   prod-postgres  kafka        ● live CDC  8ms   4,100
```
Keyboard shortcuts:
| Key | Action |
|---|---|
| `Enter` | Drill into the table-level breakdown for the selected pipeline |
| `w` | Switch to worker view (snapshot parallelism, per-worker throughput) |
| `q` | Quit |
To monitor a single pipeline:
```bash
nanosync monitor --pipeline orders-pipeline
```
Pipeline states
| Status | Meaning |
|---|---|
| `snapshotting` | Initial table read in progress; shows % complete and rows/sec |
| `live CDC` | Streaming live changes from the source |
| `paused` | Manually paused, or stopped due to schema drift |
| `errored` | Unrecoverable error; check logs with `nanosync logs --pipeline <name>` |
| `idle` | No changes seen recently; normal for low-traffic tables |
An `errored` pipeline requires operator attention. A pipeline that is `paused` due to schema drift resumes once you resolve the mismatch and run `nanosync pipeline resume <name>`.
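The state table can be read as a small decision procedure. The sketch below is a hypothetical helper, not part of nanosync: it maps a status string (as displayed in the STATUS column) to the command the table suggests, and returns `None` for states that need no intervention. The name `next_step` is illustrative only.

```python
from typing import Optional


def next_step(status: str, pipeline: str) -> Optional[str]:
    """Suggest the next nanosync command for a pipeline, or None if none is needed."""
    # Normalize values copied from the dashboard, e.g. "● live CDC".
    state = status.replace("●", "").strip().lower()
    if state == "errored":
        # Unrecoverable error: operator attention required, start with the logs.
        return f"nanosync logs --pipeline {pipeline}"
    if state == "paused":
        # Manual pause or schema drift. For drift, resolve the schema
        # mismatch before running this command.
        return f"nanosync pipeline resume {pipeline}"
    # snapshotting, live CDC and idle are normal operating states.
    return None


if __name__ == "__main__":
    for name, status in [("orders-pipeline", "● live CDC"), ("audit-pipeline", "errored")]:
        print(f"{name}: {next_step(status, name) or 'no action needed'}")
```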
Key metrics
| Metric | What it tells you |
|---|---|
| LAG | End-to-end latency from source commit to destination write |
| EV/S | Events per second throughput |
| Snapshot progress | Percentage complete + rows/sec during initial backfill |
LAG is the most operationally important metric. A rising LAG means the pipeline is falling behind: either the source is producing changes faster than the pipeline can deliver them, or the destination is slow to accept writes. A brief spike that recovers quickly is normal after a restart; a sustained increase warrants investigation.
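To see why a throughput mismatch makes LAG climb steadily rather than jump once, consider a constant-rate model (an assumption for illustration; nanosync does not necessarily compute LAG this way). If the source commits λ events/s and the destination accepts only μ < λ, the backlog grows by (λ - μ) events every second, and the newest event waits roughly backlog / μ seconds. A 1,000 ev/s deficit (5,000 in, 4,000 out) sustained for 60 s leaves 60,000 queued events, about 15 s of LAG, and the figure keeps growing linearly for as long as the mismatch lasts. The function name below is hypothetical.

```python
def estimated_lag_seconds(
    source_rate: float, sink_rate: float, elapsed_s: float, base_lag_s: float = 0.0
) -> float:
    """Back-of-the-envelope LAG under a constant-rate (FIFO) model.

    source_rate: events/s committed at the source
    sink_rate:   events/s the destination accepts
    elapsed_s:   how long the rate mismatch has lasted
    base_lag_s:  steady-state latency when the pipeline keeps up
    """
    if sink_rate <= 0:
        raise ValueError("sink_rate must be positive")
    # Events that have arrived but could not be written yet (never negative).
    backlog = max(0.0, (source_rate - sink_rate) * elapsed_s)
    # The newest event waits for the backlog ahead of it to drain at sink_rate.
    return base_lag_s + backlog / sink_rate


if __name__ == "__main__":
    # 5,000 ev/s in, 4,000 ev/s out: the backlog grows by 1,000 events/s.
    for t in (60, 120, 300):
        print(t, estimated_lag_seconds(5000, 4000, t))  # 15.0, 30.0, 75.0 seconds
```

When the destination catches up (μ > λ), the model's backlog stays at zero, which is the "spike that recovers quickly" case rather than a sustained climb.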
Prometheus metrics
Nanosync exposes a Prometheus-compatible `/metrics` endpoint at `http://localhost:7600/metrics` when the server is running.
To print current metric values for a single pipeline from the CLI:
```bash
nanosync metrics pipeline orders-pipeline
```
Key Prometheus metrics:
| Metric | Description |
|---|---|
| `ns_pipeline_replication_lag_seconds` | End-to-end source-to-sink latency in seconds (gauge) |
| `ns_cdc_events_total` | Total CDC events processed, labeled by pipeline and table |
| `ns_snapshot_rows_total` | Rows written during the initial snapshot, labeled by pipeline and table |
| `ns_pipeline_errors_total` | Error count per pipeline |
All metrics carry a `pipeline` label. Table-level metrics also carry a `table` label in `schema.table` format.
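The endpoint serves Prometheus's plain-text exposition format, so a quick script can read it without a Prometheus server. The following minimal parser is a sketch: it handles only the common `name{labels} value` sample lines (skipping comments, ignoring timestamps, not unescaping label values), and the helper names `parse_samples`, `lag_by_pipeline`, and `fetch_metrics` are hypothetical. It relies only on the metric name, the `pipeline` label, and the default URL given above.

```python
import re
import urllib.request

# One sample line: metric_name{label="value",...} value [timestamp]
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse exposition text into (metric, labels, value) tuples.

    Minimal subset of the format: skips # HELP / # TYPE comments,
    ignores timestamps and does not unescape label values.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = _SAMPLE.match(line)
        if m is None:
            continue
        labels = dict(_LABEL.findall(m.group(2) or ""))
        samples.append((m.group(1), labels, float(m.group(3))))
    return samples


def lag_by_pipeline(text: str) -> dict[str, float]:
    """Map each pipeline label to its replication lag in seconds."""
    return {
        labels.get("pipeline", ""): value
        for name, labels, value in parse_samples(text)
        if name == "ns_pipeline_replication_lag_seconds"
    }


def fetch_metrics(url: str = "http://localhost:7600/metrics") -> str:
    """Fetch the raw exposition text from a running nanosync server."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `lag_by_pipeline(fetch_metrics())` returns a dict keyed by pipeline name. For anything beyond ad-hoc checks, the text parser in the official `prometheus_client` library, or a real Prometheus scrape, is more robust.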
Alerts
Configure lag, error, and schema drift alerts in `server.yaml`. Channels reference the `notifications` block defined at the server level:
```yaml
notifications:
  slack:
    webhook_url: "${env:SLACK_WEBHOOK}"

alerts:
  - pipeline: orders-pipeline
    lag_threshold: "30s"
    on_error: true
    on_schema_drift: true
```
See Configuration Reference for the full alert schema, including `for` (the minimum duration a condition must hold before the alert fires) and multi-channel routing.
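The exact firing semantics of `lag_threshold` and `for` are defined in the Configuration Reference. The sketch below assumes the common interpretation: an alert fires once lag has stayed above `lag_threshold` without interruption for at least `for`, and any dip below the threshold resets the clock. The duration syntax accepted here (`ms`, `s`, `m`, `h`, combinable as in `1m30s`) is also an assumption, and the function names are hypothetical. Because `for` is a Python keyword, the parameter is spelled `for_`.

```python
import re

# Units accepted by this sketch; nanosync's own duration syntax may differ.
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}
_PART = re.compile(r"(\d+)(ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Parse strings such as "30s", "5m" or "1m30s" into seconds."""
    text = text.strip()
    parts = _PART.findall(text)
    # Reject anything the pattern did not fully consume, e.g. "30x".
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def lag_alert_firing(samples, lag_threshold: str, for_: str = "0s") -> bool:
    """Decide whether a lag alert is firing as of the latest sample.

    samples: (unix_seconds, lag_seconds) pairs in ascending time order.
    """
    threshold = parse_duration(lag_threshold)
    hold = parse_duration(for_)
    breach_start = None
    last_ts = None
    for ts, lag in samples:
        last_ts = ts
        if lag > threshold:
            if breach_start is None:
                breach_start = ts  # start of the current breach
        else:
            breach_start = None  # lag recovered: reset the clock
    return breach_start is not None and last_ts - breach_start >= hold


if __name__ == "__main__":
    # Lag sampled every 10 s, crossing 30 s at t=10 and staying above it.
    samples = [(0, 5.0), (10, 40.0), (20, 45.0), (30, 50.0)]
    print(lag_alert_firing(samples, "30s", "20s"))  # True: above threshold for 20 s
    print(lag_alert_firing(samples, "30s", "1m"))   # False: not yet held for 60 s
```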
Scripting and JSON output
```bash
# JSON output for scripting
nanosync list pipelines --output json | jq '.[] | {name, status, lag_ms}'
```
The JSON output includes every field shown in `nanosync monitor` plus internal metadata (checkpoint LSN, last event time, error detail), which makes it useful for feeding pipeline state into external dashboards or health-check scripts.
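A health-check script can consume this output directly. The sketch below is hypothetical: it assumes the JSON is an array of objects carrying at least the `name`, `status`, and `lag_ms` fields used in the `jq` example, and that stopped pipelines report the status strings `errored` and `paused`. It flags those pipelines plus any whose lag exceeds a threshold; `find_problems` and `main` are illustrative names.

```python
import json
import subprocess

# Assumption: these status strings match what the JSON output reports.
UNHEALTHY_STATES = {"errored", "paused"}


def find_problems(pipelines: list[dict], max_lag_ms: float) -> list[str]:
    """Describe every pipeline that is stopped or too far behind."""
    problems = []
    for p in pipelines:
        name = p.get("name", "<unnamed>")
        status = str(p.get("status", "")).lower()
        lag_ms = p.get("lag_ms")
        if status in UNHEALTHY_STATES:
            problems.append(f"{name}: {status}")
        elif lag_ms is not None and lag_ms > max_lag_ms:
            problems.append(f"{name}: lag {lag_ms} ms")
    return problems


def main(max_lag_ms: float = 60_000) -> int:
    """Exit-status style health check: 0 if all pipelines are healthy, else 1."""
    result = subprocess.run(
        ["nanosync", "list", "pipelines", "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    problems = find_problems(json.loads(result.stdout), max_lag_ms)
    for line in problems:
        print(line)
    return 1 if problems else 0
```

A cron job or container health check could call `sys.exit(main())`. The 60,000 ms default mirrors the 60-second lag alert recommended for production.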
For production, scrape `/metrics` with Prometheus and alert on `ns_pipeline_replication_lag_seconds > 60` (lag above 60 seconds). This catches both pipeline errors and destination write slowdowns before they become incidents: a paused or errored pipeline shows lag climbing continuously even if no error event fires.
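Prometheus alerting rules are the usual way to act on this expression. For an ad-hoc check, such as a cron job or a deploy gate, the same query can be run through the Prometheus HTTP API's `/api/v1/query` endpoint. In this sketch the server address `http://localhost:9090` is an assumption (9090 is Prometheus's default port) and the function names are hypothetical; only the metric name and the 60-second threshold come from the text.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of your Prometheus server (9090 is Prometheus's default port).
PROMETHEUS_URL = "http://localhost:9090"
LAG_QUERY = "ns_pipeline_replication_lag_seconds > 60"


def lagging_pipelines(response: dict) -> dict[str, float]:
    """Extract {pipeline: lag_seconds} from an instant-query (vector) response."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error', 'unknown error')}")
    # The comparison filters the vector, so every returned series is over 60 s.
    return {
        series["metric"].get("pipeline", "<unknown>"): float(series["value"][1])
        for series in response["data"]["result"]
    }


def run_query(expr: str = LAG_QUERY) -> dict:
    """Run an instant query against the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": expr})
    url = f"{PROMETHEUS_URL}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

`lagging_pipelines(run_query())` returns an empty dict when every pipeline is within 60 seconds. For paging, an alerting rule on the same expression with a `for:` clause avoids firing on the brief post-restart spikes described earlier.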