
Change Data Capture (CDC)

Change Data Capture (CDC) is a method for capturing and tracking data changes in a database. It is central to data synchronization and integration because it lets a pipeline transfer only the changes since the last sync instead of reloading all data. This document introduces the main CDC methods, explains how they work, compares their advantages and disadvantages, and gives usage instructions.

CDC Methods


Database Log API-based CDC is a widely used change capture technique that obtains incremental data changes by reading and parsing the database's transaction logs. The database management system relies on these logs to guarantee data integrity and recoverability, so they record every change operation in detail.
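To make the idea concrete, here is a minimal Python sketch of what a log-based consumer does with the entries it reads. It does not connect to a real database: the `LogEvent` record and its `position`, `op`, `key`, and `after` fields are hypothetical stand-ins for parsed log entries, and `apply_events` is an illustrative helper, not a Tapdata or database API. It shows that each log entry already states which operation happened, so inserts, updates, and deletes can all be replayed in log order, and a saved position lets the consumer resume without reapplying entries.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogEvent:
    position: int            # hypothetical monotonically increasing log offset
    op: str                  # "insert", "update" or "delete"
    key: int                 # primary key of the affected row
    after: Optional[dict]    # row image after the change (None for deletes)


def apply_events(replica: Dict[int, dict], events: List[LogEvent], last_position: int) -> int:
    """Apply events newer than last_position to replica, in log order.

    Returns the position of the last applied event, which the caller
    stores as its checkpoint for the next run.
    """
    for ev in sorted(events, key=lambda e: e.position):
        if ev.position <= last_position:
            continue  # already applied, e.g. the log was re-read after a restart
        if ev.op in ("insert", "update"):
            replica[ev.key] = dict(ev.after)
        elif ev.op == "delete":
            replica.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op}")
        last_position = ev.position
    return last_position


if __name__ == "__main__":
    log = [
        LogEvent(1, "insert", 1, {"name": "Alice"}),
        LogEvent(2, "insert", 2, {"name": "Bob"}),
        LogEvent(3, "update", 1, {"name": "Alicia"}),
        LogEvent(4, "delete", 2, None),
    ]
    replica: Dict[int, dict] = {}
    checkpoint = apply_events(replica, log[:2], 0)       # initial batch
    checkpoint = apply_events(replica, log, checkpoint)  # resume from the checkpoint
    print(checkpoint, replica)  # 4 {1: {'name': 'Alicia'}}
```

Keeping the checkpoint outside the replayed data is what makes the replay safe to repeat: running `apply_events` again with the same checkpoint changes nothing.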

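For example, in MySQL, administrators can enable the binary log (binlog) by adding the following settings to the MySQL configuration file (for example, my.cnf or mysql.cnf, depending on the installation). The binlog then records every data modification with the details needed to capture changes.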

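[mysqld]
server_id        = 223344      # unique ID for this server
log_bin          = mysql-bin   # enable the binary log, using "mysql-bin" as the file-name prefix
expire_logs_days = 7           # keep binlog files for 7 days (deprecated in MySQL 8.0; use binlog_expire_logs_seconds there)
binlog_format    = row         # log row-level changes rather than SQL statements
binlog_row_image = full        # log all columns in both the before and after row images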
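Before restarting MySQL, it can help to confirm that the option file really contains these settings. The sketch below is a hypothetical pre-flight check built on Python's standard `configparser` module; `check_binlog_config` is an illustrative name, not a Tapdata or MySQL tool. It assumes the settings sit under the `[mysqld]` group, as MySQL option files require, and that the file uses no `!include` directives, which `configparser` cannot read. It only inspects file text; the values reported by the running server (for example, via `SHOW VARIABLES LIKE 'binlog_format'`) remain authoritative.

```python
import configparser
from typing import List

# Values expected for row-based binlog CDC, matching the settings above.
REQUIRED_VALUES = {
    "binlog_format": "row",
    "binlog_row_image": "full",
}


def check_binlog_config(text: str) -> List[str]:
    """Return a list of problems found in the [mysqld] section of an option-file string."""
    parser = configparser.ConfigParser(
        allow_no_value=True,             # option files may contain bare flags
        inline_comment_prefixes=("#",),  # MySQL allows '#' comments after a value
        interpolation=None,              # treat '%' literally
        strict=False,                    # tolerate repeated options
    )
    # MySQL treats '-' and '_' in option names as equivalent, so normalize them.
    parser.optionxform = lambda name: name.strip().lower().replace("-", "_")
    parser.read_string(text)

    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]
    section = parser["mysqld"]

    problems = []
    if "log_bin" not in section:
        problems.append("log_bin is not set")
    for name, expected in REQUIRED_VALUES.items():
        actual = section.get(name)
        if actual is None or actual.strip().lower() != expected:
            problems.append(f"{name} should be {expected!r}, found {actual!r}")
    return problems


if __name__ == "__main__":
    sample = "[mysqld]\nlog_bin = mysql-bin\nbinlog_format = ROW\nbinlog-row-image = full\n"
    print(check_binlog_config(sample))  # []
```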

After granting the required permissions and connecting the data source, you can use it as a source in a Tapdata task. By default, the task performs both full and incremental data synchronization.

(Figure: Select Data Sync Type)

CDC Method Comparison

| Criterion | Database Log API | Database Log File | Field Polling | Database Trigger |
|---|---|---|---|---|
| Distinguishes Insert/Update Operations | ✅ | ✅ | ➖ | ✅ |
| Monitors Delete Operations | ✅ | ✅ | ➖ | ✅ |
| Real-time Collection | ✅ | ✅ (Ultra-high performance) | ➖ | ✅ |
| Business Intrusion | 🟢 Low | 🟢 Low | 🔴 High | 🟡 Medium |
| DBA Maintenance Cost | 🟡 Medium | 🔴 High (Requires additional components) | 🟢 Low | 🔴 High (Trigger management is complex) |
| System Overhead Cost | 🟢 Low | 🟢 Low | 🔴 High | 🔴 High |
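The Database Trigger column can be illustrated with a small, self-contained example. This sketch uses SQLite through Python's built-in `sqlite3` module rather than a production database, and the `customers` and `change_log` tables and the helper names are hypothetical. Triggers copy every insert, update, and delete into a change-log table that a sync process can read incrementally, which is why this method distinguishes operations and sees deletes.

```python
import sqlite3
from typing import List, Tuple


def create_change_log(conn: sqlite3.Connection) -> None:
    """Create a hypothetical business table plus triggers that record every change."""
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE change_log (
            seq    INTEGER PRIMARY KEY AUTOINCREMENT,  -- order in which changes happened
            op     TEXT NOT NULL,                      -- INSERT, UPDATE or DELETE
            row_id INTEGER NOT NULL,
            name   TEXT
        );
        CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
            INSERT INTO change_log (op, row_id, name) VALUES ('INSERT', NEW.id, NEW.name);
        END;
        CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
            INSERT INTO change_log (op, row_id, name) VALUES ('UPDATE', NEW.id, NEW.name);
        END;
        CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
            INSERT INTO change_log (op, row_id, name) VALUES ('DELETE', OLD.id, OLD.name);
        END;
        """
    )


def read_changes(conn: sqlite3.Connection, after_seq: int = 0) -> List[Tuple]:
    """Return logged changes with seq greater than after_seq, oldest first."""
    return conn.execute(
        "SELECT seq, op, row_id, name FROM change_log WHERE seq > ? ORDER BY seq",
        (after_seq,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_change_log(conn)
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
    conn.execute("UPDATE customers SET name = 'Alicia' WHERE id = 1")
    conn.execute("DELETE FROM customers WHERE id = 1")
    for change in read_changes(conn):
        print(change)
```

Because the triggers run inside the writing transaction, every business write also pays for an extra insert, and each table needs its own set of triggers to maintain; this is where the overhead and maintenance costs in the table come from.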

FAQs

  • Q: Which data sources does Tapdata support CDC capture for?

    A: See the tables in Supported Data Sources. Any data source that supports incremental data when used as a source can provide CDC information.

  • Q: If my data source supports CDC, how do I choose the CDC collection method?

    A: To maximize compatibility and collection performance, Tapdata supports the following CDC collection methods:

    • Database Log API: The default collection method, supported by most databases. If permission restrictions prevent access to the logs, or for certain SaaS data sources, choose Field Polling instead.
    • Database Log File: Currently supported only for Oracle and Db2 data sources.
    • Field Polling: When configuring a data transformation task in Tapdata, set the incremental synchronization method on the source node.
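Field Polling works differently: it only compares a timestamp (or similar) column against the last checkpoint. The following minimal sketch, again using SQLite through Python's `sqlite3` module, assumes a hypothetical `orders` table whose application code keeps an `updated_at` column current; the table and the `poll_changes` helper are illustrative and are not how Tapdata implements polling. It reproduces the limitations in the comparison table: a changed row comes back without saying whether it was inserted or updated, and a deleted row simply stops appearing.

```python
import sqlite3
from typing import List, Tuple


def create_orders_table(conn: sqlite3.Connection) -> None:
    # Hypothetical table: the application must set updated_at on every write,
    # which is the business intrusion that polling requires.
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " updated_at INTEGER NOT NULL)"
    )


def poll_changes(conn: sqlite3.Connection, checkpoint: int) -> Tuple[List[Tuple], int]:
    """Return rows with updated_at greater than checkpoint, plus the new checkpoint."""
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders"
        " WHERE updated_at > ? ORDER BY updated_at, id",
        (checkpoint,),
    ).fetchall()
    # Advance to the newest timestamp seen; keep the old checkpoint if nothing changed.
    new_checkpoint = rows[-1][2] if rows else checkpoint
    return rows, new_checkpoint


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_orders_table(conn)
    conn.execute("INSERT INTO orders VALUES (1, 'new', 100)")
    conn.execute("INSERT INTO orders VALUES (2, 'new', 101)")
    rows, checkpoint = poll_changes(conn, 0)
    print(rows)  # [(1, 'new', 100), (2, 'new', 101)]
    conn.execute("UPDATE orders SET status = 'paid', updated_at = 102 WHERE id = 1")
    conn.execute("DELETE FROM orders WHERE id = 2")
    rows, checkpoint = poll_changes(conn, checkpoint)
    print(rows)  # [(1, 'paid', 102)]; the delete of id 2 is not reported
```

A strict `>` comparison can also miss a row that becomes visible only after the checkpoint has already moved past its `updated_at` value (for example, a long-running transaction that commits late), which is one reason polling is not treated as real-time collection.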