Replicate SQL Server to BigQuery

Nanosync connects to SQL Server via either CDC tables or direct transaction log reads, and streams changes into BigQuery using the Storage Write API with exactly-once semantics. This guide walks through both capture modes end-to-end — from permissions setup through production verification.

Choose your capture mode

This is the first decision to make before touching any config. The two modes have different requirements and tradeoffs:

	CDC mode	tlog mode
How it works	Reads SQL Server CDC change tables	Reads `sys.fn_dblog` directly
Requirement	CDC enabled on DB + tables; `FULL` or `BULK_LOGGED` recovery model	`VIEW DATABASE STATE` permission only
Best for	Production workloads; lower overhead; recommended default	Restricted environments where CDC isn’t available; high-throughput workloads where you need to minimise impact on the source database
SQL Server version	2016+	Any

If you have the permissions to enable CDC, use CDC mode — it’s lower overhead and more reliable. Use tlog mode when:

You’re on a managed instance where you can’t enable CDC
Your database has very high change rates and you need to avoid any replication overhead on the source
You need read-only access with no schema changes on the source

Prerequisites

Complete these setup steps before continuing:

SQL Server source setup — enable CDC or configure tlog permissions, create the nanosync login
BigQuery IAM setup — create the service account and dataset

Create the config files

Nanosync uses two separate YAML files: one for the server and one for pipeline definitions.

Create server.yaml (can be minimal for dev):

# Server config — passed to nanosync start
# Add notifications, database, or logging settings here as needed.

Create pipelines.yaml:

connections:
  - name: my-sqlserver
    type: sqlserver
    dsn: "sqlserver://nanosync:${env:SQL_PASSWORD}@host:1433?database=mydb"

  - name: my-bigquery
    type: bigquery
    properties:
      project_id:       my-project
      dataset_id:       replication
      credentials_file: /path/to/nanosync-bq-key.json

pipelines:
  - name: orders-to-bigquery
    replication_type: cdc_backfill
    source:
      connection: my-sqlserver
      tables:
        - dbo.orders
        - dbo.order_items
      properties:
        cdc_mode: cdc
    sink:
      connection: my-bigquery
      properties:
        table_id: orders

connections:
  - name: my-sqlserver
    type: sqlserver
    dsn: "sqlserver://nanosync:${env:SQL_PASSWORD}@host:1433?database=mydb"

  - name: my-bigquery
    type: bigquery
    properties:
      project_id:       my-project
      dataset_id:       replication
      credentials_file: /path/to/nanosync-bq-key.json

pipelines:
  - name: orders-to-bigquery
    replication_type: cdc_backfill
    source:
      connection: my-sqlserver
      tables:
        - dbo.orders
        - dbo.order_items
      properties:
        cdc_mode: tlog
    sink:
      connection: my-bigquery
      properties:
        table_id: orders

Secrets never go in the config file. Use ${env:VAR_NAME} — values are injected from environment variables at startup.

Test your connections

Before starting the pipeline, verify both connections are reachable:

export SQL_PASSWORD=my-secret

nanosync test connection my-sqlserver
# ✓  my-sqlserver connected in 18 ms

nanosync test connection my-bigquery
# ✓  my-bigquery connected in 34 ms

Fix any connection errors before continuing — don’t waste time debugging a running pipeline.

Start the server and apply pipelines

# Start the server
nanosync start dev --config server.yaml

nanosync serving — Ctrl-C to stop
  REST API: http://localhost:7600/v1
  Web UI:   http://localhost:7600/app

Then apply your pipeline definitions in a separate terminal:

nanosync apply --file pipelines.yaml

The pipeline starts immediately. The initial snapshot begins — nanosync reads all rows from the configured tables in parallel, then switches to streaming live changes:

NAME                 SOURCE         TARGET     STATUS          LAG    EV/S
orders-to-bigquery   my-sqlserver   bigquery   ● snapshotting  —      —
                                               [████████░░░░]  67%  2,400,000 rows

orders-to-bigquery   my-sqlserver   bigquery   ● live CDC      14ms   3,840

Verify it’s working

# Check pipeline status
nanosync monitor

# Count rows in BigQuery
bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM `my-project.replication.orders`'

# Compare with source
sqlcmd -S host -U nanosync -Q "SELECT COUNT(*) FROM dbo.orders"

Row counts should match once the snapshot completes. After that, inserts and updates on the source appear in BigQuery within the pipeline’s replication lag (visible in nanosync monitor under the LAG column).

Production considerations

CDC vs tlog long-term

CDC is the right default for production. tlog mode reads directly from the transaction log and requires no schema changes, but it has higher CPU overhead on busy databases. If you started with tlog and your DBA later enables CDC, you can switch by changing cdc_mode: cdc and restarting — nanosync will resync automatically.

Schema drift

SQL Server CDC handles column adds automatically — nanosync adds the column to BigQuery on the next batch. Column renames and drops require manual intervention — nanosync will stop and alert. See Configuration reference for schema_drift_mode.

What’s next

SQL Server source setup — full CDC and tlog configuration reference
BigQuery sink setup — IAM, partitioning, DELETE handling
Configuration reference — all YAML options
CLI reference — full command listing