Exactly-Once Delivery
Delivery guarantees describe what happens to events when something fails mid-pipeline — a crash, a network partition, a sink timeout. Nanosync is designed for exactly-once delivery on sinks that support it.
The three delivery guarantees
At-most-once — events are sent once and not retried on failure. An event lost mid-flight is gone. Unacceptable for data replication where every row must land.
At-least-once — events are retried until acknowledged. No events are dropped, but a failure between the sink write and the checkpoint write causes the same events to be written again on restart. Requires the sink to tolerate duplicates.
Exactly-once — events are never dropped and never duplicated, even across process restarts. Requires coordination between the checkpoint and the sink write.
How nanosync achieves exactly-once
Nanosync combines three mechanisms:
1. LSN watermarking
After the sink acknowledges each batch, nanosync writes the highest committed LSN in that batch to the state store. On restart, nanosync reads the stored LSN and tells the source to resume from that position.
This prevents data loss: if nanosync crashes before checkpointing, it restarts from the previous LSN and re-reads any events that were sent to the sink but not yet acknowledged in the checkpoint.
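The resume logic can be sketched as follows. This is a minimal illustration, not nanosync's actual API: the dict-backed state store and the function names are assumptions (nanosync persists checkpoints durably).

```python
# Sketch of LSN watermarking: checkpoint after the sink acknowledges,
# resume from the stored LSN on restart. The state store is a plain
# dict here; the real state store is durable.

state_store = {}

def save_checkpoint(pipeline_id, lsn):
    # Called only after the sink acknowledged the batch.
    state_store[pipeline_id] = lsn

def resume_position(pipeline_id, start_lsn="0/0"):
    # On restart, re-read from the last acknowledged LSN. Events after it
    # may already have reached the sink and will be re-sent.
    return state_store.get(pipeline_id, start_lsn)

save_checkpoint("pg-to-bq", "0/10000")
print(resume_position("pg-to-bq"))       # 0/10000
print(resume_position("fresh-pipeline")) # 0/0 (new pipeline, no checkpoint)
```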
2. Idempotent sink writes
Re-reading from a prior LSN means some events will be sent to the sink a second time. For this to be safe, the sink write must be idempotent on the event’s LSN — writing the same event twice must produce the same result as writing it once.
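What "idempotent on the event's LSN" means can be shown with an in-memory store keyed on LSN. This is an illustrative sketch, not nanosync's sink interface:

```python
# An LSN-idempotent sink: writing the same event twice yields exactly
# the same state as writing it once.

sink_rows = {}  # lsn -> row

def write_event(lsn, row):
    if lsn in sink_rows:
        return  # duplicate re-delivery after a restart: no-op
    sink_rows[lsn] = row

write_event("0/10010", {"id": 1, "name": "a"})
write_event("0/10010", {"id": 1, "name": "a"})  # re-delivery
print(len(sink_rows))  # 1
```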
For BigQuery, nanosync uses the Storage Write API in committed stream mode. Each row includes the source LSN as a deduplication key. BigQuery silently discards duplicate writes within its deduplication window (typically 1 minute). The second write of any row produces no additional row in the table.
For Kafka, nanosync uses an idempotent producer and sets the source LSN as the Kafka message key. Within a partition, the idempotent producer guarantees that broker-side retries do not append the same record twice, and the LSN key lets consumers (or log compaction) discard any re-delivered events.
3. Atomic checkpointing
The LSN checkpoint is written to the state store only after the sink confirms the write. This ordering removes one failure window and makes the other harmless:
- The checkpoint is never saved while the sink write has failed. That window would drop events; the ordering makes it impossible.
- The sink write can succeed before the checkpoint is saved. That window still exists, but it only causes re-delivery on restart, which the idempotent sink write absorbs.
The sequence is always: write to sink → sink acknowledges → write checkpoint. If any step fails, nanosync retries the sink write, not the checkpoint write.
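The ordering can be sketched as follows. The `Sink` and `Checkpoint` classes and their method names are hypothetical, not nanosync's API:

```python
# Sequence: write to sink -> sink acknowledges -> write checkpoint.
# On failure, the sink write (never the checkpoint write) is retried.

class Sink:
    def __init__(self):
        self.events = []
    def write(self, events):
        self.events.extend(events)  # returning here models the acknowledgement

class Checkpoint:
    def __init__(self):
        self.lsn = None
    def save(self, lsn):
        self.lsn = lsn

def process_batch(events, max_lsn, sink, checkpoint):
    sink.write(events)        # blocks until the sink acknowledges
    checkpoint.save(max_lsn)  # persisted only after acknowledgement

sink, ckpt = Sink(), Checkpoint()
process_batch([{"id": 1}], "0/10000", sink, ckpt)
print(ckpt.lsn)  # 0/10000
```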
A concrete example
Suppose nanosync is streaming rows 1–2000 from Postgres to BigQuery in 1000-row batches.
- Batch 1 (rows 1–1000) is written to BigQuery. BigQuery acknowledges. Nanosync saves checkpoint LSN = `0/10000`.
- Batch 2 (rows 1001–2000) is written to BigQuery. BigQuery acknowledges. Nanosync begins writing checkpoint LSN = `0/20000` — and crashes here, before the checkpoint is saved.
- Nanosync restarts. Reads checkpoint: LSN = `0/10000`.
- Nanosync re-reads rows 1001–2000 from Postgres and sends them to BigQuery again.
- BigQuery receives the duplicate write. Because each row carries the same LSN deduplication key as the first write, BigQuery discards the duplicates.
- The table contains exactly rows 1–2000, each appearing once.
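The scenario can be simulated end to end with an in-memory deduplicating sink standing in for BigQuery. The per-row LSN values are illustrative; only the dedup-on-LSN behavior matters:

```python
# Simulate: write batch 1 and checkpoint; write batch 2 but crash before
# the checkpoint; restart and replay batch 2. The deduplicating sink
# keeps each row exactly once.

sink = {}  # lsn -> row, dedup keyed on LSN (like BigQuery's dedup key)

def write_batch(rows):
    for lsn, row in rows:
        sink.setdefault(lsn, row)  # duplicate LSNs are discarded

batch1 = [(f"0/{9000 + i}", i) for i in range(1, 1001)]      # ends at 0/10000
batch2 = [(f"0/{18000 + i}", i) for i in range(1001, 2001)]  # ends at 0/20000

write_batch(batch1)
checkpoint = "0/10000"   # batch 1 acked, checkpoint saved
write_batch(batch2)      # batch 2 acked... crash before checkpoint is saved

# Restart: the checkpoint still reads 0/10000, so batch 2 is re-sent.
write_batch(batch2)

print(len(sink))  # 2000 — rows 1–2000, each exactly once
```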
Sink-level guarantees
The guarantee depends on what the sink supports:
| Sink | Mechanism | Guarantee |
|---|---|---|
| BigQuery | Storage Write API committed mode + LSN deduplication key | Exactly-once |
| Kafka | Idempotent producer + LSN as message key per partition | Effectively exactly-once per partition |
| stdout | Best-effort — no persistence, no deduplication | At-least-once |
| Local filesystem | Append-only write with LSN encoded in filename | At-least-once |
For BigQuery, exactly-once holds as long as the duplicate write arrives within BigQuery’s deduplication window. Nanosync is designed to retry within seconds of a failure, well inside that window. If nanosync is paused for an extended period and the deduplication window expires before retry, duplicates are theoretically possible — in practice this requires hours of downtime and is not a concern for operational pipelines.
For Kafka, “effectively exactly-once per partition” means: within a single partition, the same LSN key is deduplicated by the idempotent producer. Across partitions or consumers, standard Kafka consumer semantics apply — consumers should handle at-least-once delivery.
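A consumer that wants end-to-end deduplication can keep a seen-set keyed on the LSN message key. This is an illustrative sketch, not part of nanosync:

```python
# Consumer-side dedup on the LSN message key: at-least-once delivery
# from Kafka becomes effectively exactly-once processing.

seen_lsns = set()
processed = []

def handle_message(key, value):
    if key in seen_lsns:
        return  # re-delivered message: skip
    seen_lsns.add(key)
    processed.append(value)

for key, value in [("0/10010", "a"), ("0/10020", "b"), ("0/10010", "a")]:
    handle_message(key, value)

print(processed)  # ['a', 'b']
```

A production consumer would persist the seen-set (or a high-water mark) alongside its own output so the dedup state survives consumer restarts.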
Schema for custom sinks
Exactly-once requires the sink to support idempotent writes. If you implement a custom sink using nanosync’s sink interface, make your write operation idempotent on the `_ns_lsn` field. A write that receives the same LSN a second time must produce the same state as the first write — not an additional row, not an error.
The idiomatic pattern for a custom sink:
- Include `_ns_lsn` in the destination table’s schema (as a column or in metadata).
- Use `INSERT OR REPLACE` / `UPSERT` / `ON CONFLICT DO UPDATE` semantics keyed on your primary key, with `_ns_lsn` as a secondary guard to skip stale re-deliveries.
- Acknowledge the write to nanosync only after the destination has confirmed persistence.
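A sketch of this pattern using SQLite's `ON CONFLICT DO UPDATE`. The table, columns, and zero-padded LSN strings are illustrative assumptions (padding keeps lexicographic comparison in LSN order):

```python
import sqlite3

# Custom-sink upsert: keyed on the primary key, with _ns_lsn as a guard
# so stale re-deliveries (lower LSN) never overwrite newer data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, _ns_lsn TEXT)")

def write(row_id, name, lsn):
    db.execute(
        """INSERT INTO users (id, name, _ns_lsn) VALUES (?, ?, ?)
           ON CONFLICT(id) DO UPDATE
           SET name = excluded.name, _ns_lsn = excluded._ns_lsn
           WHERE excluded._ns_lsn > users._ns_lsn""",
        (row_id, name, lsn),
    )

write(1, "alice", "0/10010")
write(1, "alice", "0/10010")  # exact re-delivery: same state, no extra row
write(1, "old",   "0/09000")  # stale re-delivery: the LSN guard skips it

print(db.execute("SELECT name, _ns_lsn FROM users").fetchall())
# [('alice', '0/10010')]
```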
Relationship to checkpoints
Exactly-once depends on the checkpoint being durable. Nanosync persists checkpoints to its state store — either the embedded SQLite database (development) or a Postgres database (production). If the state store is lost, nanosync treats the pipeline as new and re-snapshots from the beginning. The snapshot path is idempotent at the sink level for the same reason: rows written during re-snapshot carry the same primary keys and are deduplicated by the sink’s upsert logic.