
Change Data Capture (CDC)

Applicable Editions

  • TapData Cloud: TapData Cloud offers you cloud services that are suitable for scenarios requiring rapid deployment and low initial investment, helping you focus more on business development rather than infrastructure management. Free trial with TapData Cloud.
  • TapData Enterprise: TapData Enterprise can be deployed in your local data center, making it suitable for scenarios with strict requirements on data sensitivity or network isolation. It can serve to build real-time data warehouses, enable real-time data exchange, data migration, and more.
  • TapData Community: TapData Community is an open-source data integration platform that provides basic data synchronization and transformation capabilities. This helps you quickly explore and implement data integration projects. As your project or business grows, you can seamlessly upgrade to TapData Cloud or TapData Enterprise to access more advanced features and service support.

Change Data Capture (CDC) is a method for capturing and tracking data changes in a database. Because only the changes need to be transferred, CDC enables incremental data synchronization and plays a central role in data synchronization and integration. This document gives an overview of the available CDC methods, explaining how each one works, its advantages and disadvantages, and how to use it.

CDC Methods

Database Log API-based CDC is a commonly used technique that captures incremental data changes by reading and parsing the database's transaction logs. The database management system relies on these logs to ensure data integrity and recoverability, so they record every data-modifying operation in detail.
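To make the idea concrete, here is a minimal sketch of what a consumer of log-based CDC does with the parsed changes: it replays row-level events, in log order, onto a target copy. The `ChangeEvent` shape and the `apply_events` helper are hypothetical names chosen for illustration; they are not TapData's API or any database's actual log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """One row-level change as a log reader might emit it (hypothetical shape)."""
    op: str                        # "insert", "update", or "delete"
    key: int                       # primary key of the affected row
    before: Optional[dict] = None  # row image before the change, if any
    after: Optional[dict] = None   # row image after the change, if any


def apply_events(target: dict, events: list) -> dict:
    """Replay change events, in log order, onto a target keyed by primary key."""
    for ev in events:
        if ev.op in ("insert", "update"):
            target[ev.key] = dict(ev.after)
        elif ev.op == "delete":
            target.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op}")
    return target


# A short, made-up sequence of events in the order the log recorded them.
log = [
    ChangeEvent("insert", 1, after={"id": 1, "name": "alice"}),
    ChangeEvent("insert", 2, after={"id": 2, "name": "bob"}),
    ChangeEvent("update", 1, before={"id": 1, "name": "alice"},
                after={"id": 1, "name": "alicia"}),
    ChangeEvent("delete", 2, before={"id": 2, "name": "bob"}),
]

replica = apply_events({}, log)
print(replica)  # {1: {'id': 1, 'name': 'alicia'}}
```

Because the log carries the operation type and the row images, the consumer can tell inserts from updates and can see deletes, which is what distinguishes log-based capture from polling.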

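For example, in MySQL, administrators can enable the binary log (Binlog) by adding settings such as the following to the database configuration file (for example, my.cnf or mysql.cnf, depending on the installation), so that all data modification operations are recorded with full change details.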

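[mysqld]
server_id        = 223344     # unique ID for this server in the replication topology
log_bin          = mysql-bin  # enable the binary log with this base file name
expire_logs_days = 7          # keep binlog files for 7 days (deprecated in MySQL 8.0 in favor of binlog_expire_logs_seconds)
binlog_format    = row        # log row-level changes rather than SQL statements
binlog_row_image = full       # log all columns in the before and after row images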
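Before restarting the server, it can help to check the file for the settings that row-level capture depends on. The following is a rough sketch using Python's standard `configparser`; the helper name `check_cdc_settings` is hypothetical, and the sketch assumes the options sit under a `[mysqld]` section, use underscores rather than hyphens, and that the file has no `!include` directives or inline comments.

```python
import configparser

# Values that row-level change capture relies on, per the settings above.
REQUIRED = {"binlog_format": "row", "binlog_row_image": "full"}


def check_cdc_settings(cnf_text: str) -> list:
    """Return a list of problems found in the [mysqld] section (empty if none)."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]

    section = parser["mysqld"]
    problems = []
    if not section.get("server_id"):
        problems.append("server_id is not set")
    if not section.get("log_bin"):
        problems.append("log_bin is not set")
    for key, expected in REQUIRED.items():
        actual = (section.get(key) or "").lower()
        if actual != expected:
            problems.append(
                f"{key} should be {expected}, found {actual or 'nothing'}"
            )
    return problems


sample = """
[mysqld]
server_id         = 223344
log_bin = mysql-bin
expire_logs_days = 7
binlog_format = row
binlog_row_image = full
"""

print(check_cdc_settings(sample))  # []
```

A check against the running server (for example, inspecting the corresponding system variables) is more reliable than parsing the file, since options can also be set elsewhere; this sketch only covers the file itself.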

After granting the required permissions and connecting the data source, you can select it as the source in a TapData task and use full plus incremental data synchronization, which is the default sync type.

[Figure: Select Data Sync Type]

CDC Method Comparison

| Category | Database Log API | Database Log File | Field Polling | Database Trigger |
| Distinguishes Insert/Update Operations | ✅ | ✅ | ➖ | ✅ |
| Monitors Delete Operations | ✅ | ✅ | ➖ | ✅ |
| Real-time Collection | ✅ | ✅ (Ultra-high performance) | ➖ | ✅ |
| Business Intrusion | 🟢 Low | 🟢 Low | 🔴 High | 🟡 Medium |
| DBA Maintenance Cost | 🟡 Medium | 🔴 High (Requires additional components) | 🟢 Low | 🔴 High (Trigger management is complex) |
| System Overhead Cost | 🟢 Low | 🟢 Low | 🔴 High | 🔴 High |
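The Database Trigger column can be illustrated with a small, self-contained sketch. It uses SQLite from Python's standard library purely for convenience; the table and trigger names (`customers`, `customers_changes`, `customers_ai`, and so on) are made up for the example, and a production setup on another database would differ in syntax and detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

-- Change table that the triggers write to (hypothetical layout).
CREATE TABLE customers_changes (
    seq    INTEGER PRIMARY KEY AUTOINCREMENT,
    op     TEXT NOT NULL,
    row_id INTEGER NOT NULL,
    name   TEXT
);

CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (op, row_id, name)
    VALUES ('insert', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (op, row_id, name)
    VALUES ('update', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (op, row_id, name)
    VALUES ('delete', OLD.id, NULL);
END;
""")

# Ordinary business writes; each one fires a trigger as a side effect.
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
conn.execute("UPDATE customers SET name = 'alicia' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

changes = conn.execute(
    "SELECT op, row_id, name FROM customers_changes ORDER BY seq"
).fetchall()
print(changes)
# [('insert', 1, 'alice'), ('update', 1, 'alicia'), ('delete', 1, None)]
```

The sketch shows both sides of the trade-off in the table: inserts, updates, and deletes are all captured and distinguishable, but every business write now performs a second write to the change table, and each tracked table needs its own set of triggers to maintain.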

FAQs

  • Q: For which data sources does TapData support CDC?

    A: Please refer to the tables in Supported Data Sources. Any data source listed there as supporting incremental data when used as a source can provide CDC data.

  • Q: If my data source supports CDC, how do I choose the CDC collection method?

    A: To maximize compatibility and collection performance, TapData supports the following CDC collection methods:

    • Database Log API: The default collection method, supported by most databases. If permission restrictions prevent log access, or for certain SaaS data sources, choose the Field Polling method instead.
    • Database Log File: Currently supported only for Oracle and Db2 data sources.
    • Field Polling: To use this method, set the incremental synchronization method on the source node when you configure the data transformation task in TapData.
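For readers unfamiliar with Field Polling, the sketch below shows the general technique: repeatedly query for rows whose tracking column is greater than the last checkpoint. It is a generic illustration using SQLite from Python's standard library, not TapData's implementation; the `orders` table, the integer `updated_at` column, and the `poll_changes` helper are all assumptions made for the example.

```python
import sqlite3


def poll_changes(conn, last_seen):
    """Return rows modified after last_seen, plus the new checkpoint value."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    checkpoint = rows[-1][2] if rows else last_seen
    return rows, checkpoint


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
conn.execute("INSERT INTO orders VALUES (1, 'book', 100)")
conn.execute("INSERT INTO orders VALUES (2, 'pen', 101)")

# First poll picks up both existing rows.
batch1, checkpoint = poll_changes(conn, 0)

# One update, one insert, and one delete happen between polls.
conn.execute("UPDATE orders SET name = 'notebook', updated_at = 102 WHERE id = 1")
conn.execute("INSERT INTO orders VALUES (3, 'ink', 103)")
conn.execute("DELETE FROM orders WHERE id = 2")

batch2, checkpoint = poll_changes(conn, checkpoint)
print(batch2)  # [(1, 'notebook', 102), (3, 'ink', 103)]
```

In the second batch the updated row and the newly inserted row look the same, and the deleted row does not appear at all. This matches the comparison table, where Field Polling neither distinguishes inserts from updates nor monitors deletes, and it relies on the application keeping the tracking column up to date.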