Exactly-Once Delivery

Delivery guarantees describe what happens to events when something fails mid-pipeline — a crash, a network partition, a sink timeout. Nanosync is designed for exactly-once delivery on sinks that support it.

The three delivery guarantees

At-most-once — events are sent once and not retried on failure. An event lost mid-flight is gone. Unacceptable for data replication where every row must land.

At-least-once — events are retried until acknowledged. No events are dropped, but a failure between the sink write and the checkpoint write causes the same events to be written again on restart. Requires the sink to tolerate duplicates.

Exactly-once — events are never dropped and never duplicated, even across process restarts. Requires coordination between the checkpoint and the sink write.

How nanosync achieves exactly-once

Nanosync combines three mechanisms:

1. LSN watermarking

After the sink acknowledges each batch, nanosync writes the highest committed LSN in that batch to the state store. On restart, nanosync reads the stored LSN and tells the source to resume from that position.

This prevents data loss: if nanosync crashes before checkpointing, it restarts from the previous LSN and re-reads any events that were sent to the sink but not yet acknowledged in the checkpoint.
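The watermarking loop can be sketched with a single-row checkpoint table. This is an illustrative sketch, not nanosync's actual schema; the table and function names are assumptions, and SQLite stands in for the state store.

```python
import sqlite3

# Hypothetical state-store layout: one row holding the last committed LSN.
def init_store(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoint "
        "(id INTEGER PRIMARY KEY CHECK (id = 1), lsn TEXT)")

def save_checkpoint(conn, lsn):
    # Called only after the sink has acknowledged the batch.
    conn.execute(
        "INSERT INTO checkpoint (id, lsn) VALUES (1, ?) "
        "ON CONFLICT(id) DO UPDATE SET lsn = excluded.lsn", (lsn,))
    conn.commit()

def resume_lsn(conn):
    # On restart, resume from the stored LSN; None means start from scratch.
    row = conn.execute("SELECT lsn FROM checkpoint WHERE id = 1").fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
init_store(conn)
save_checkpoint(conn, "0/10000")
print(resume_lsn(conn))  # 0/10000
```

Because the checkpoint is a single upsert, a crash either leaves the old LSN or the new one; there is no partially written position to resume from.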

2. Idempotent sink writes

Re-reading from a prior LSN means some events will be sent to the sink a second time. For this to be safe, the sink write must be idempotent on the event’s LSN — writing the same event twice must produce the same result as writing it once.

For BigQuery, nanosync uses the Storage Write API in committed stream mode. Each row includes the source LSN as a deduplication key. BigQuery silently discards duplicate writes within its deduplication window (typically 1 minute). The second write of any row produces no additional row in the table.

For Kafka, nanosync uses an idempotent producer and sets the source LSN as the Kafka message key. The idempotent producer prevents retried sends from creating duplicate records within a partition, and because a re-delivered event carries the same LSN key, consumers can recognize and skip it.

3. Atomic checkpointing

The LSN checkpoint is written to the state store only after the sink confirms the write. There is no window where the checkpoint records an LSN that the sink has not yet acknowledged.

The sequence is always: write to sink → sink acknowledges → write checkpoint. If any step fails, nanosync retries the sink write, not the checkpoint write.
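The ordering can be sketched as a small delivery loop. Everything here is illustrative: the function names, the retry count, and the use of `IOError` as the failure signal are assumptions, not nanosync's API.

```python
# Sketch: write to sink -> sink acknowledges -> write checkpoint.
# On failure, the sink write is retried; the checkpoint is never
# written ahead of an acknowledgment.
def deliver_batch(batch, sink_write, save_checkpoint, max_retries=3):
    for attempt in range(max_retries):
        try:
            sink_write(batch["events"])   # 1. write to sink
            break                         # 2. sink acknowledged
        except IOError:
            if attempt == max_retries - 1:
                raise                     # give up; checkpoint untouched
    save_checkpoint(batch["last_lsn"])    # 3. checkpoint only after ack

checkpoints = []
calls = {"n": 0}

def flaky_sink(events):
    # Fails once, then succeeds -- simulating a transient sink timeout.
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("sink timeout")

deliver_batch({"events": ["row-1", "row-2"], "last_lsn": "0/10000"},
              flaky_sink, checkpoints.append)
print(checkpoints)  # ['0/10000']
```

Note the failure mode this ordering permits: a crash between steps 2 and 3 re-delivers the batch, which is exactly why the sink write must be idempotent.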

A concrete example

Suppose nanosync is streaming rows 1–2000 from Postgres to BigQuery in 1000-row batches.

  1. Batch 1 (rows 1–1000) is written to BigQuery. BigQuery acknowledges. Nanosync saves checkpoint LSN = 0/10000.
  2. Batch 2 (rows 1001–2000) is written to BigQuery. BigQuery acknowledges. Nanosync begins writing checkpoint LSN = 0/20000 — and crashes here, before the checkpoint is saved.
  3. Nanosync restarts. Reads checkpoint: LSN = 0/10000.
  4. Nanosync re-reads rows 1001–2000 from Postgres and sends them to BigQuery again.
  5. BigQuery receives the duplicate write. Because each row carries the same LSN deduplication key as the first write, BigQuery discards the duplicates.
  6. The table contains exactly rows 1–2000, each appearing once.
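The crash-and-replay sequence above can be simulated with a sink that deduplicates on an LSN key. The dict-based "table" is an illustrative stand-in for BigQuery's deduplication behavior, not how BigQuery is implemented.

```python
# Toy sink: keyed on LSN, so a second write of the same event is a no-op.
table = {}  # lsn -> row

def write_batch(rows):
    for lsn, row in rows:
        table[lsn] = row  # duplicate LSN overwrites with identical data

batch1 = [(i, f"row-{i}") for i in range(1, 1001)]      # rows 1-1000
batch2 = [(i, f"row-{i}") for i in range(1001, 2001)]   # rows 1001-2000

write_batch(batch1)   # checkpoint saved: LSN = 0/10000
write_batch(batch2)   # crash before checkpoint LSN = 0/20000 is saved
write_batch(batch2)   # restart: rows 1001-2000 re-read and re-sent
print(len(table))     # 2000 -- each row appears exactly once
```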

Sink-level guarantees

The guarantee depends on what the sink supports:

| Sink | Mechanism | Guarantee |
| --- | --- | --- |
| BigQuery | Storage Write API committed mode + LSN deduplication key | Exactly-once |
| Kafka | Idempotent producer + LSN as message key per partition | Effectively exactly-once per partition |
| stdout | Best-effort — no persistence, no deduplication | At-least-once |
| Local filesystem | Append-only write with LSN encoded in filename | At-least-once |

For BigQuery, exactly-once holds as long as the duplicate write arrives within BigQuery’s deduplication window. Nanosync is designed to retry within seconds of a failure, well inside that window. If nanosync is paused for an extended period and the deduplication window expires before retry, duplicates are theoretically possible — in practice this requires hours of downtime and is not a concern for operational pipelines.

For Kafka, “effectively exactly-once per partition” means: within a single partition, the same LSN key is deduplicated by the idempotent producer. Across partitions or consumers, standard Kafka consumer semantics apply — consumers should handle at-least-once delivery.
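Consumer-side handling of at-least-once delivery can be as simple as deduplicating on the LSN key per partition. This is a sketch of the consumer pattern, not nanosync code; the structures are illustrative.

```python
# Track which (partition, LSN-key) pairs have already been applied,
# so a re-delivered message is skipped rather than applied twice.
seen = set()

def handle(partition, key, value, apply):
    if (partition, key) in seen:
        return                      # duplicate re-delivery: skip
    apply(value)
    seen.add((partition, key))

applied = []
handle(0, "0/10000", "row-1", applied.append)
handle(0, "0/10000", "row-1", applied.append)  # redelivered message
print(applied)  # ['row-1']
```

A production consumer would persist the seen set (or an equivalent high-water mark) alongside its offsets, but the shape of the check is the same.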

Schema for custom sinks

Exactly-once requires the sink to support idempotent writes. If you implement a custom sink using nanosync’s sink interface, make your write operation idempotent on the _ns_lsn field. A write that receives the same LSN a second time must produce the same state as the first write — not an additional row, not an error.

The idiomatic pattern for a custom sink:

  1. Include _ns_lsn in the destination table’s schema (as a column or in metadata).
  2. Use INSERT OR REPLACE / UPSERT / ON CONFLICT DO UPDATE semantics keyed on your primary key, with _ns_lsn as a secondary guard to skip stale re-deliveries.
  3. Acknowledge the write to nanosync only after the destination has confirmed persistence.
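The three steps above can be sketched against SQLite. The destination schema and function name are illustrative assumptions; only the `_ns_lsn` field comes from nanosync's interface.

```python
import sqlite3

# Hypothetical destination table carrying _ns_lsn as a column (step 1).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dest (pk INTEGER PRIMARY KEY, val TEXT, _ns_lsn INTEGER)")

def sink_write(pk, val, lsn):
    # Step 2: upsert keyed on the primary key, with _ns_lsn as a
    # secondary guard that skips stale re-deliveries.
    conn.execute(
        "INSERT INTO dest (pk, val, _ns_lsn) VALUES (?, ?, ?) "
        "ON CONFLICT(pk) DO UPDATE SET "
        "  val = excluded.val, _ns_lsn = excluded._ns_lsn "
        "WHERE excluded._ns_lsn > dest._ns_lsn",
        (pk, val, lsn))
    conn.commit()
    # Step 3: only after commit would the sink acknowledge to nanosync.

sink_write(1, "a", 100)
sink_write(1, "a", 100)  # duplicate delivery: no extra row, no error
sink_write(1, "b", 90)   # stale re-delivery: LSN guard skips the update
print(conn.execute("SELECT * FROM dest").fetchall())  # [(1, 'a', 100)]
```

Writing the same LSN twice leaves the table unchanged, which is exactly the idempotence contract the sink interface asks for.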

Relationship to checkpoints

Exactly-once depends on the checkpoint being durable. Nanosync persists checkpoints to its state store — either the embedded SQLite database (development) or a Postgres database (production). If the state store is lost, nanosync treats the pipeline as new and re-snapshots from the beginning. The snapshot path is idempotent at the sink level for the same reason: rows written during re-snapshot carry the same primary keys and are deduplicated by the sink’s upsert logic.