Replicate SQL Server to BigQuery

End-to-end guide to setting up a live SQL Server → BigQuery pipeline with nanosync. CDC vs tlog mode, IAM setup, config, and production verification.

Nanosync connects to SQL Server via either CDC tables or direct transaction log reads, and streams changes into BigQuery using the Storage Write API with exactly-once semantics. This guide walks through both capture modes end-to-end — from permissions setup through production verification.

Choose your capture mode

This is the first decision to make before touching any config. The two modes have different requirements and tradeoffs:

CDC mode
  How it works:       reads SQL Server CDC change tables
  Requirement:        CDC enabled on DB + tables; FULL or BULK_LOGGED recovery model
  Best for:           production workloads; lower overhead; recommended default
  SQL Server version: 2016+

tlog mode
  How it works:       reads sys.fn_dblog directly
  Requirement:        VIEW DATABASE STATE permission only
  Best for:           restricted environments where CDC isn't available; high-throughput workloads where you need to minimise impact on the source database
  SQL Server version: any
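
Before choosing, you can check whether CDC is already enabled using standard SQL Server catalog views (these are built-in system views, not nanosync-specific; the database name is from this guide's example):

```sql
-- Is CDC enabled on the database?
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = 'mydb';

-- Which tables are already tracked by CDC?
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id
WHERE t.is_tracked_by_cdc = 1;
```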

If you have the permissions to enable CDC, use CDC mode — it's lower overhead and more reliable. Use tlog mode when:

- CDC can't be enabled on the source (missing permissions, or a policy against it)
- you're running a SQL Server version older than 2016
- you need to avoid the write overhead of CDC change tables on a high-throughput source database

Prerequisites

Complete these setup steps before continuing:

- A SQL Server login for nanosync. For CDC mode, CDC must be enabled on the database and on each replicated table; for tlog mode, the login needs the VIEW DATABASE STATE permission.
- A BigQuery dataset to replicate into, plus a service account with write access to it (roles/bigquery.dataEditor on the dataset is sufficient) and its JSON key file downloaded.
- nanosync installed and able to reach both the SQL Server host and BigQuery.
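
On the SQL Server side, the statements involved are standard system procedures and grants (the schema, table, and login names below are from this guide's example; adjust them for your environment):

```sql
-- CDC mode: enable CDC on the database, then on each replicated table
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'orders',
    @role_name     = NULL;

-- tlog mode: a single grant is enough
GRANT VIEW DATABASE STATE TO nanosync;
```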

Create the config file

Create nanosync.yaml in your working directory:

connections:
  - name: my-sqlserver
    type: sqlserver
    dsn: "sqlserver://nanosync:${env:SQL_PASSWORD}@host:1433?database=mydb"

  - name: my-bigquery
    type: bigquery
    properties:
      project_id:       my-project
      dataset_id:       replication
      credentials_file: /path/to/nanosync-bq-key.json

pipelines:
  - name: orders-to-bigquery
    source:
      connection: my-sqlserver
      tables:
        - dbo.orders
        - dbo.order_items
      properties:
        cdc_mode: cdc
    sink:
      connection: my-bigquery
      properties:
        table_id:     orders
        primary_keys: id
The pipeline above uses CDC mode. For tlog mode, the config is identical except for the capture property on the source:

      properties:
        cdc_mode: tlog

Secrets never go in the config file. Use ${env:VAR_NAME} — values are injected from environment variables at startup.
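
Conceptually, ${env:VAR_NAME} substitution is plain string interpolation from the process environment. The sketch below is hypothetical and illustrative only, not nanosync's actual implementation:

```python
import os
import re

def substitute_env(text: str) -> str:
    """Replace every ${env:NAME} placeholder with os.environ[NAME].

    Raises KeyError if a referenced variable is not set, so a missing
    secret fails fast at startup instead of producing a broken DSN.
    """
    def repl(match: re.Match) -> str:
        return os.environ[match.group(1)]
    return re.sub(r"\$\{env:([A-Za-z_][A-Za-z0-9_]*)\}", repl, text)

os.environ["SQL_PASSWORD"] = "my-secret"
dsn = "sqlserver://nanosync:${env:SQL_PASSWORD}@host:1433?database=mydb"
print(substitute_env(dsn))
# sqlserver://nanosync:my-secret@host:1433?database=mydb
```

Failing fast on an unset variable is the behaviour you want for secrets: a typo in the variable name should stop startup rather than silently produce a bad connection string.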

Test your connections

Before starting the pipeline, verify both connections are reachable:

export SQL_PASSWORD=my-secret

nanosync test connection my-sqlserver
# ✓  my-sqlserver connected in 18 ms

nanosync test connection my-bigquery
# ✓  my-bigquery connected in 34 ms

Fix any connection errors before continuing — don’t waste time debugging a running pipeline.

Start the pipeline

nanosync start dev --config nanosync.yaml
nanosync serving — Ctrl-C to stop
  REST API: http://localhost:7600/v1
  Web UI:   http://localhost:7600/app

The pipeline starts automatically. The initial snapshot begins immediately — nanosync reads all rows from the configured tables in parallel, then switches to streaming live changes:

NAME                 SOURCE         TARGET     STATUS          LAG    EV/S
orders-to-bigquery   my-sqlserver   bigquery   ● snapshotting  —      —
                                               [████████░░░░]  67%  2,400,000 rows

orders-to-bigquery   my-sqlserver   bigquery   ● live CDC      14ms   3,840

Verify it’s working

# Check pipeline status
nanosync monitor

# Count rows in BigQuery
bq query --use_legacy_sql=false 'SELECT COUNT(*) FROM `my-project.replication.orders`'

# Compare with source
sqlcmd -S host -U nanosync -Q "SELECT COUNT(*) FROM dbo.orders"

Row counts should match once the snapshot completes. After that, inserts and updates on the source appear in BigQuery within the pipeline’s replication lag (visible in nanosync monitor under the LAG column).
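
Beyond matching counts, a quick end-to-end smoke test is to write a marker row on the source and watch it arrive on the sink. The column list here is hypothetical — substitute your table's actual columns:

```sql
-- On SQL Server (hypothetical columns):
INSERT INTO dbo.orders (id, status) VALUES (999001, 'nanosync-smoke-test');

-- Then in BigQuery, the row should appear within the pipeline's lag:
-- SELECT * FROM `my-project.replication.orders` WHERE id = 999001;
```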

Production considerations

CDC vs tlog long-term

CDC is the right default for production. tlog mode reads directly from the transaction log and requires no schema changes, but it has higher CPU overhead on busy databases. If you started with tlog and your DBA later enables CDC, you can switch by changing cdc_mode: cdc and restarting — nanosync will resync automatically.

Schema drift

SQL Server CDC handles column adds automatically — nanosync adds the column to BigQuery on the next batch. Column renames and drops require manual intervention — nanosync will stop and alert. See Configuration reference for schema_drift_mode.
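
If you want to pin this behaviour explicitly, schema_drift_mode goes in nanosync.yaml. This fragment is a sketch only: the placement under source properties and the value shown are assumptions, and the accepted values are listed in the Configuration reference:

```yaml
pipelines:
  - name: orders-to-bigquery
    source:
      connection: my-sqlserver
      properties:
        cdc_mode: cdc
        schema_drift_mode: halt   # assumption: stop and alert on renames/drops
```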

What’s next