# Replicate SQL Server to BigQuery
End-to-end guide to setting up a live SQL Server → BigQuery pipeline with nanosync. CDC vs tlog mode, IAM setup, config, and production verification.
Nanosync connects to SQL Server via either CDC tables or direct transaction log reads, and streams changes into BigQuery using the Storage Write API with exactly-once semantics. This guide walks through both capture modes end-to-end — from permissions setup through production verification.
## Choose your capture mode
This is the first decision to make before touching any config. The two modes have different requirements and tradeoffs:
| | CDC mode | tlog mode |
|---|---|---|
| How it works | Reads SQL Server CDC change tables | Reads sys.fn_dblog directly |
| Requirement | CDC enabled on DB + tables | VIEW DATABASE STATE permission; FULL or BULK_LOGGED recovery model |
| Best for | Production workloads; lower overhead; recommended default | Restricted environments where CDC isn’t available; high-throughput workloads where you need to minimise impact on the source database |
| SQL Server version | 2016+ | Any |
If you have the permissions to enable CDC, use CDC mode — it’s lower overhead and more reliable. Use tlog mode when:
- You’re on a managed instance where you can’t enable CDC
- Your database has very high change rates and you need to avoid any replication overhead on the source
- You need read-only access with no schema changes on the source
## Prerequisites
Complete these setup steps before continuing:
- SQL Server source setup — enable CDC or configure tlog permissions, create the nanosync login
- BigQuery IAM setup — create the service account and dataset
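If you're administering the source yourself, the underlying steps use standard SQL Server procedures. A sketch via sqlcmd — the server, database, table, and login names are the examples from this guide, and the full walkthrough lives on the SQL Server source setup page:

```shell
# Check the version (CDC mode needs SQL Server 2016+) and whether CDC is already on
sqlcmd -S host -U sa -Q "SELECT SERVERPROPERTY('ProductVersion');"
sqlcmd -S host -U sa -Q "SELECT is_cdc_enabled FROM sys.databases WHERE name = 'mydb';"

# CDC mode: enable CDC on the database, then on each replicated table
sqlcmd -S host -U sa -Q "USE mydb; EXEC sys.sp_cdc_enable_db;"
sqlcmd -S host -U sa -Q "USE mydb; EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo', @source_name = N'orders', @role_name = NULL;"

# tlog mode: a permission grant on the nanosync login is enough
sqlcmd -S host -U sa -Q "USE mydb; GRANT VIEW DATABASE STATE TO nanosync;"
```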
## Create the config file
Create nanosync.yaml in your working directory:
```yaml
connections:
  - name: my-sqlserver
    type: sqlserver
    dsn: "sqlserver://nanosync:${env:SQL_PASSWORD}@host:1433?database=mydb"
  - name: my-bigquery
    type: bigquery
    properties:
      project_id: my-project
      dataset_id: replication
      credentials_file: /path/to/nanosync-bq-key.json

pipelines:
  - name: orders-to-bigquery
    source:
      connection: my-sqlserver
      tables:
        - dbo.orders
        - dbo.order_items
      properties:
        cdc_mode: cdc
    sink:
      connection: my-bigquery
      properties:
        table_id: orders
        primary_keys: id
```

For tlog mode the file is identical except the capture mode: set `cdc_mode: tlog` under the source properties.

Secrets never go in the config file. Use ${env:VAR_NAME} — values are injected from environment variables at startup.
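Conceptually the substitution is plain string interpolation from the process environment at startup. This shell sketch is illustrative only — it is not nanosync's actual implementation:

```shell
export SQL_PASSWORD='my-secret'
template='sqlserver://nanosync:${env:SQL_PASSWORD}@host:1433?database=mydb'

# replace the ${env:NAME} placeholder with the value of NAME from the environment
resolved=$(printf '%s' "$template" | sed 's/${env:SQL_PASSWORD}/'"$SQL_PASSWORD"'/')
echo "$resolved"
# sqlserver://nanosync:my-secret@host:1433?database=mydb
```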
## Test your connections
Before starting the pipeline, verify both connections are reachable:
```shell
export SQL_PASSWORD=my-secret

nanosync test connection my-sqlserver
# ✓ my-sqlserver connected in 18 ms

nanosync test connection my-bigquery
# ✓ my-bigquery connected in 34 ms
```
Fix any connection errors before continuing — don’t waste time debugging a running pipeline.
## Start the pipeline
```shell
nanosync start dev --config nanosync.yaml
```

```
nanosync serving — Ctrl-C to stop
  REST API: http://localhost:7600/v1
  Web UI:   http://localhost:7600/app
```
The pipeline starts automatically with an initial snapshot: nanosync reads all rows from the configured tables in parallel, then switches to streaming live changes:
```
NAME                 SOURCE         TARGET     STATUS          LAG    EV/S
orders-to-bigquery   my-sqlserver   bigquery   ● snapshotting  —      —
                     [████████░░░░] 67%  2,400,000 rows
orders-to-bigquery   my-sqlserver   bigquery   ● live CDC      14ms   3,840
```
## Verify it’s working
```shell
# Check pipeline status
nanosync monitor

# Count rows in BigQuery
bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM `my-project.replication.orders`'

# Compare with source
sqlcmd -S host -U nanosync -Q "SELECT COUNT(*) FROM dbo.orders"
```
Row counts should match once the snapshot completes. After that, inserts and updates on the source appear in BigQuery within the pipeline’s replication lag (visible in `nanosync monitor` under the LAG column).
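The comparison above is easy to script. A sketch — the commented-out sqlcmd and bq invocations reuse this guide's example connection details and are assumptions about your environment:

```shell
# exit-code-friendly count comparison for source vs sink
compare_counts() {
  if [ "$1" -eq "$2" ]; then
    echo "row counts match ($1 rows)"
  else
    echo "MISMATCH: source=$1 sink=$2" >&2
    return 1
  fi
}

# Fetch live counts (requires sqlcmd and bq configured as above):
# src=$(sqlcmd -S host -U nanosync -h -1 -Q "SET NOCOUNT ON; SELECT COUNT(*) FROM dbo.orders")
# dst=$(bq query --use_legacy_sql=false --format=csv 'SELECT COUNT(*) FROM `my-project.replication.orders`' | tail -n 1)
# compare_counts "$src" "$dst"

compare_counts 2400000 2400000
# row counts match (2400000 rows)
```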
## Production considerations
### CDC vs tlog long-term
CDC is the right default for production. tlog mode reads directly from the transaction log and requires no schema changes on the source, but parsing the log adds CPU overhead on busy databases. If you started with tlog and your DBA later enables CDC, you can switch by changing `cdc_mode: cdc` and restarting; nanosync will resync automatically.
### Schema drift
SQL Server CDC handles column adds automatically — nanosync adds the column to BigQuery on the next batch. Column renames and drops require manual intervention — nanosync will stop and alert. See Configuration reference for `schema_drift_mode`.
## What’s next
- SQL Server source setup — full CDC and tlog configuration reference
- BigQuery sink setup — IAM, partitioning, DELETE handling
- Configuration reference — all YAML options
- CLI reference — full command listing