How CDC Works

Change Data Capture (CDC) is the mechanism for capturing every INSERT, UPDATE, and DELETE as it commits to the database — in order, in real time — without polling or reading the full table.


Why polling doesn’t work

The obvious alternative to CDC is polling: run SELECT * FROM orders WHERE updated_at > $last_check on a schedule. It seems simple but breaks down at scale:

CDC sidesteps all of these by reading the database’s own change stream — the same stream the database uses internally for replication.


How PostgreSQL implements CDC

PostgreSQL writes every committed transaction to the Write-Ahead Log (WAL) before applying it to data files. The WAL is an append-only, ordered, durable record of every change.

Postgres exposes the WAL to external consumers through logical replication, using a plugin to decode binary WAL entries into a structured change stream. Nanosync uses the pgoutput plugin, which is built into Postgres 10+ and requires no extension install.

The setup involves two objects on the source:

Nanosync connects to Postgres over the replication protocol (a special Postgres wire protocol mode), subscribes to the publication through the slot, and receives a stream of decoded change messages.

Postgres WAL


pgoutput logical decoding plugin
    │  (binary WAL → structured messages)

Replication slot (nanosync_orders)
    │  (holds WAL until acknowledged)

nanosync CDC decoder
    │  (decoded messages → Arrow records)

Sink writer

How SQL Server implements CDC

SQL Server doesn’t expose a streaming replication protocol the way Postgres does. Instead, it writes changes to CDC change tables — one per tracked table, stored in the cdc schema.

A background SQL Agent job called the capture job reads the transaction log continuously and copies change records into these tables. Each row in a change table represents one INSERT, UPDATE, or DELETE, annotated with the LSN and a capture instance identifier.

Nanosync polls the CDC change tables using LSN watermarks: it reads from the last acknowledged LSN, processes the batch, commits to the sink, and advances the watermark. The poll interval is configurable (default: 200ms).

Nanosync also supports tlog mode, which reads sys.fn_dblog directly — bypassing the CDC capture job entirely. This is useful when CDC isn’t enabled or available, but requires more careful tuning on high-volume databases. See SQL Server source setup for details.


What an LSN is

An LSN (Log Sequence Number) is a monotonically increasing pointer into the transaction log. Every committed transaction produces a unique LSN. Nanosync stores the last LSN it successfully committed to the sink.

On restart, nanosync reads the checkpoint LSN from the state store and resumes consuming changes from exactly that point — not from “now”, not from the beginning, but from the precise position where it left off. This is what makes restarts safe and fast: no re-read, no manual recovery, no data loss.

Postgres LSNs look like 0/3A1B2C4D (hexadecimal offset into the WAL file). SQL Server LSNs are binary values typically rendered as 0x0000C3A100000001000a.


The event format nanosync produces

Every event emitted by nanosync has the same envelope regardless of source:

FieldTypeDescription
_ns_opstringINSERT, UPDATE, or DELETE
_ns_tablestringSource table in schema.table format
_ns_lsnstringLSN at commit
_ns_committed_atstringWall-clock commit timestamp (UTC, RFC3339)
_ns_tx_idstringSource transaction identifier
column fieldsanyColumn values from the after-image (or before-image for DELETEs)

UPDATEs include the full after-image. The before-image is included under _ns_before when the source table has REPLICA IDENTITY FULL set (Postgres) or when all columns are captured (SQL Server CDC).

DELETEs include only the primary key columns by default, unless REPLICA IDENTITY FULL is set, in which case the full before-image is included.


What CDC cannot capture

CDC has blind spots. Nanosync handles these explicitly:

EventWhat nanosync does
TRUNCATEDetected separately from the WAL (Postgres 11+) and forwarded as a _ns_op: TRUNCATE event. SQL Server CDC does not capture TRUNCATE — it’s treated as a schema change requiring manual intervention.
DDL changes (ALTER TABLE, etc.)Detected via schema comparison on each batch. Not replayed. Nanosync pauses with a SCHEMA_DRIFT alert on breaking changes.
Changes before CDC was enabledNot capturable. If CDC was enabled after the table was created and already had data, the initial snapshot fills the gap.
Large object (LOB) changesCaptured only when the LOB column is included in the publication (Postgres) or the capture instance (SQL Server CDC).

Sequence value changes (e.g. auto-increment counters) are not emitted as change events. Only row-level DML produces CDC events.