Huawei Cloud GaussDB
TapData Cloud offers cloud services suited to scenarios that require rapid deployment and low initial investment, helping you focus on business development rather than infrastructure management. Free trial with TapData Cloud.
TapData Enterprise can be deployed in your local data center, making it suitable for scenarios with strict requirements on data sensitivity or network isolation. It can be used to build real-time data warehouses and to enable real-time data exchange, data migration, and more.
TapData Community is an open-source data integration platform that provides basic data synchronization and transformation capabilities, helping you quickly explore and implement data integration projects. As your project or business grows, you can seamlessly upgrade to TapData Cloud or TapData Enterprise to access more advanced features and service support.
GaussDB is a distributed relational database independently developed by Huawei that supports distributed transactions, cross-AZ deployment, and zero data loss. It scales to over 1000 nodes and PB-level storage, providing enterprises with a stable, reliable, scalable, and high-performance enterprise-grade database service. TapData supports GaussDB as a source or target database, helping you quickly build data flow pipelines. This article describes how to connect a GaussDB data source in the TapData platform.
Supported Versions
Huawei Cloud GaussDB Enterprise Edition 2.8 (Primary/Standby)
If you are using an on-premises deployment of GaussDB, the supported version is GaussDB Standby 8.1.
Incremental Synchronization Instructions
To read incremental data, TapData relies on Huawei Cloud GaussDB's logical decoding function, which extracts the changes committed to the transaction log and parses them into data changes. The limitations are as follows:
- Supported decoded data types are: BIGINT, BIGSERIAL, CHAR(n), DATE, DOUBLE PRECISION, FLOAT, INTEGER, SERIAL, SMALLINT, SMALLSERIAL, TEXT, TIME[WITHOUT TIME ZONE], TIMESTAMP[WITHOUT TIME ZONE], TINYINT, VARCHAR(n).
- The size of a single tuple should not exceed 1 GB. Considering the decoded result may be larger than the inserted data, it is recommended that the size of a single tuple does not exceed 500 MB.
- To parse the UPDATE and DELETE statements of an Astore table, the table must be configured with the REPLICA IDENTITY attribute. If the table does not have a primary key, it must be configured as FULL.
- DDL statements are not supported for decoding. Executing certain DDL statements (such as truncating a regular table or exchanging a partition table) may cause data loss during decoding. Additionally, after executing a DDL statement in a transaction, the DDL statement and subsequent statements will not be decoded.
- Interval partition tables are not supported for replication.
- Global temporary tables are not supported.
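The REPLICA IDENTITY requirement above can be sketched as follows. This is a minimal example that assumes the PostgreSQL-style `ALTER TABLE ... REPLICA IDENTITY` syntax and the `relreplident` column of `pg_class` are available in your GaussDB version; `<schema>` and `<table>` are placeholders.

```sql
-- Table WITH a primary key: the default identity (the primary key columns)
-- is assumed to be enough for decoding UPDATE and DELETE.
ALTER TABLE <schema>.<table> REPLICA IDENTITY DEFAULT;

-- Table WITHOUT a primary key: the identity must be FULL,
-- so the whole old row is written to the log.
ALTER TABLE <schema>.<table> REPLICA IDENTITY FULL;

-- Check the current setting (assumed codes: d = default, f = full,
-- i = index, n = nothing).
SELECT relname, relreplident
FROM pg_class
WHERE relname = '<table>';
```

Note that FULL typically increases log volume, because every UPDATE and DELETE records the complete old row.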
Preparations
- Log in to Huawei Cloud GaussDB, create a database user, and grant it the permissions that match its role.
- As Source Database
- Full Synchronization: DATABASE's CONNECT permission, SCHEMA's USAGE permission, table's SELECT or UPDATE permission (UPDATE permission required for locking tables without primary keys), SEQUENCE's SELECT permission.
- Full + Incremental Synchronization: REPLICATION permission or inheritance of the built-in role gs_role_replication, DATABASE's CONNECT permission, SCHEMA's USAGE permission, table's SELECT or UPDATE permission (UPDATE permission required for locking tables without primary keys), SEQUENCE's SELECT permission.
- As Target Database
- Database Level Permissions: Log in to the postgres database as root or as another database user with the Sysadmin role, and grant the user the CREATE and CONNECT permissions on the database. Authorization example:
GRANT CREATE, CONNECT ON DATABASE <database> TO <user>;
- SCHEMA Level Permissions: Log in to the database as root, as another database user with the Sysadmin role, or as the OWNER of the database, and grant the user the CREATE and USAGE permissions on the schema. Authorization example:
GRANT CREATE, USAGE ON SCHEMA <schema> TO <user>;
- Table Level Permissions: Log in to the database as root, as another database user with the Sysadmin role, or as the OWNER of the database, and grant the user the DML-related permissions on the tables in the schema (the SELECT permission is needed when handling tables without primary keys). Authorization example:
GRANT SELECT, UPDATE, INSERT, DELETE, INDEX, ALTER ON ALL TABLES IN SCHEMA <schema> TO <user>;
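For a user that reads from GaussDB as a source, the permissions listed under As Source Database could be granted along the following lines. This is a sketch that assumes standard GaussDB/openGauss `GRANT` and `ALTER USER` syntax; `<database>`, `<schema>`, and `<user>` are placeholders, and the statements should be adapted to your instance before use.

```sql
-- Full synchronization: connect, resolve objects in the schema, read tables and sequences.
GRANT CONNECT ON DATABASE <database> TO <user>;
GRANT USAGE ON SCHEMA <schema> TO <user>;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema> TO <user>;
GRANT SELECT ON ALL SEQUENCES IN SCHEMA <schema> TO <user>;

-- Only if tables without primary keys must be locked during the read:
-- GRANT UPDATE ON ALL TABLES IN SCHEMA <schema> TO <user>;

-- Full + incremental synchronization: additionally grant the replication privilege,
-- either as a user attribute ...
ALTER USER <user> REPLICATION;
-- ... or by inheriting the built-in role instead:
-- GRANT gs_role_replication TO <user>;
```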
Adjust the pg_hba.conf configuration to allow database access, replacing the IP address and username with your actual values. In the following example, all users connecting from the IP address 10.10.10.10 are allowed to access the database.
# The IP address can also be set to 0.0.0.0/0 to allow all IPs
host all all 10.10.10.10/32 sha256
# Only incremental data synchronization requires the following configuration
host replication all 10.10.10.10/32 sha256
If incremental data synchronization is needed, you also need to adjust the following GUC parameters. For more details, see Reset Parameters.
- wal_level: Set to logical to enable logical replication.
- max_replication_slots: Must be greater than or equal to the number of physical streaming replication slots + backup slots + logical replication slots required on each node. The default is 20. It is recommended to set this value to the number of tasks that use this connection as the source, plus 1.
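After changing the parameters, the effective values can be checked from a SQL session. The sketch below assumes the `SHOW` command and the `pg_replication_slots` view behave as in PostgreSQL-compatible systems; the column list is an assumption and may differ between GaussDB versions.

```sql
-- Should return "logical" once logical replication is enabled.
SHOW wal_level;

-- Upper limit on replication slots per node.
SHOW max_replication_slots;

-- Slots currently allocated; compare the row count with max_replication_slots
-- before adding more tasks that use this connection as a source.
SELECT slot_name, plugin, slot_type, active
FROM pg_replication_slots;
```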
Connect to GaussDB
In the left navigation bar, click Connection Management.
Click Create on the right side of the page.
In the pop-up dialog box, search for and select GaussDB.
Complete the data source configuration according to the following instructions.
Basic Settings
- Name: Enter a unique name with business significance.
- Type: GaussDB can be used as a source or target database.
- Host: Enter the connection address of GaussDB. If connecting via public network, you also need to bind an Elastic IP.
- Port: Enter the GaussDB service port, e.g., 8000.
- Database: The database name. One connection corresponds to one database, so if there are multiple databases, create a separate data connection for each.
- Schema: The schema name. A database contains one or more schemas, and each schema contains tables and other types of objects.
- JDBC Connection Params: Additional connection parameters, default empty.
- User and Password: Enter the username and password used to log in to GaussDB. For specific permission requirements, see Preparations.
- Logical Replicate IP and Logical Replicate Port: Enter the IP address and port of the primary DN (data node). The default port is 8001.
- Log Plugin: Keep the default mppdb_decoding.
- Time Zone: Defaults to the time zone used by the database. You can also specify it manually according to business needs.
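Before filling in the form, it can help to confirm the Database, Schema, User, and Time Zone values from a SQL session opened with the same account. This is a sketch that assumes the PostgreSQL-style session information functions are available in your GaussDB version.

```sql
-- Run while logged in as the user that TapData will use.
SELECT current_database() AS database_name,  -- value for the Database field
       current_schema()   AS schema_name,    -- default schema of this session
       current_user       AS login_user;     -- value for the User field

-- Time zone the database session uses by default.
SHOW timezone;
```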
Advanced Settings
CDC Log Caching: This feature mines the source database's incremental logs once and lets multiple tasks share them, avoiding redundant reads and significantly reducing the load on the source database during incremental synchronization. After enabling this feature, select an external storage to store the incremental logs.
Contain table: The default option is All, which includes all tables. Alternatively, you can select Custom and manually specify the desired tables, separating their names with commas (,).
Exclude tables: Once the switch is enabled, you can specify the tables to be excluded. If there are multiple tables, separate their names with commas (,).
Agent settings: Defaults to Platform automatic allocation. You can also manually specify an agent.
Model load time: If there are fewer than 10,000 models in the data source, their information is updated every hour. If the number of models exceeds 10,000, the refresh takes place daily at the time you specify.
Enable heartbeat table: This switch is supported when the connection type is set as the Source&Target or Source. TapData Cloud will generate a table named tapdata_heartbeat_table in the source database, which is used to monitor the source database connection and task health.
Tip: The heartbeat task is activated after a data replication/development task that references this connection is started. At that point, you can click View heartbeat task to monitor the task.
SSL Settings: Choose whether to enable SSL connections for the data source, which can further enhance data security. After turning on the switch, you need to upload the CA file, client certificate, client key file, and so on. These can be downloaded from the Database Information section of the Basic Information page of the GaussDB instance.
Click Connection Test, and after the test passes, click Save.
Tip: If the connection test fails, follow the prompts on the page to fix it.