SQL Server
TapData Cloud offers you cloud services that are suitable for scenarios requiring rapid deployment and low initial investment, helping you focus more on business development rather than infrastructure management. Free trial with TapData Cloud.TapData Community is an open-source data integration platform that provides basic data synchronization and transformation capabilities. This helps you quickly explore and implement data integration projects. As your project or business grows, you can seamlessly upgrade to TapData Cloud or TapData Enterprise to access more advanced features and service support.TapData Cloud provides comprehensive support for building data pipelines utilizing Microsoft SQL Server as both the source and target database. Microsoft SQL Server is a highly regarded relational database management system developed by Microsoft.
This article serves as a detailed guide, outlining the steps to seamlessly incorporate a SQL Server database into TapData Cloud, enabling efficient data integration and management within your pipelines.
Supported Versionsβ
SQL Server 2005, 2008, 2008 R2, 2012, 2014, 2016, 2017, 2019 and 2022.
This article uses SQL Server 2017, which was deployed on Windows Server 2019 as an example to demonstrate the operation process. If you are deployed on a Linux platform and as a source database, you need to install and enable the SQL Server agent.
SQL Server 2005 as a Source Database Solution
Preparationsβ
Before establishing the connection, it is essential to complete the necessary preparations outlined in the provided article. These preparations may include authorizing an account and performing other relevant steps to ensure a smooth and secure connection.
As a Source Databaseβ
Log in to SQL Server Management Studio or SQL cmd as a sysadmin (for example, sa).
Execute the following commands to enable Change Data Capture (CDC) capabilities.
Enable CDC at the database level. When executing the command, replace database_name with the actual database name.
-- Enable Change Data Capture
USE database_name
GO
EXEC sys.sp_cdc_enable_db
GO
-- Check if Change Data Capture is enabled, a value of 1 in is_cdc_enabled indicates that the feature is enabled
SELECT [name], database_id, is_cdc_enabled
FROM sys.databases
WHERE [name] = N'database_name'
GOEnable CDC at the table level.
USE database_name
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'schema_name',
@source_name = N'table_name',
@role_name = N'role_name',
@supports_net_changes = 1
GO- database_name: The name of the database.
- schema_name: The schema name, e.g., dbo.
- table_name: The name of the table.
- role_name: The role that can access the change data. Set this to NULL if you do not wish to use a role. If a role was specified when enabling incremental replication, ensure the database user has the appropriate role so that TapData Cloud can access the incremental replication tables.
Execute the following format of the command to create a user for the data copy or development task.
-- Create a login user
CREATE LOGIN login_name WITH PASSWORD='passwd', default_database=database_name;
-- Create a database user
CREATE USER login_name FOR LOGIN login_name with default_schema=schema_name;- login_name: Enter user name.
- passwd: Enter user's password.
- database_name: Enter the database name to be logged in.
- schema_name: Enter schema name (such as dbo), which acts as a namespace or container for objects (such as tables, views, procedures, and functions). For more information, see Creating a Database Schema.
The following example creates a user named tapdata, specifying that the logged-in database is demodata and the schema is dbo:
-- Create a login user
CREATE LOGIN tapdata WITH password='Tap@123456', default_database=demodata;
-- Create a database user
CREATE USER tapdata FOR LOGIN tapdata with default_schema=dbo;Grant permissions to the account we just created, or you can customize permissions control based on business needs.
-- Grant permission to read all tables under the specified schema
GRANT SELECT ON SCHEMA::schema_name TO tapdata;
-- Grant permission to read change data capture
GRANT SELECT ON SCHEMA::cdc TO tapdata;- login_name: Enter user name.
- schema_name: Enter schema name (such as dbo), which acts as a namespace or container for objects (such as tables, views, procedures, and functions).
The following example shows that the tapdata user is granted read permission for all tables under the dbo and cdc schemas.
GRANT SELECT ON SCHEMA::dbo TO tapdata;
GRANT SELECT ON SCHEMA::cdc TO tapdata;(Optional) To read incremental data from the secondary database for data synchronization, you need to follow the above steps.
As a Target Databaseβ
Log in to SQL Server Management Studio or SQL cmd as a sysadmin (for example, sa).
Execute the following format of the command to create a user for the data copy or development task.
-- Create a login user
CREATE LOGIN login_name WITH PASSWORD='passwd', default_database=database_name;
-- Create a database user
CREATE USER login_name FOR LOGIN login_name with default_schema=schema_name;- login_name: Enter user name.
- passwd: Enter user's password.
- database_name: Enter the database name to be logged in.
- schema_name: Enter schema name (such as dbo), which acts as a namespace or container for objects (such as tables, views, procedures, and functions). For more information, see Creating a Database Schema.
The following example creates a user named tapdata, specifying that the logged-in database is demodata and the schema is dbo:
-- Create a login user
CREATE LOGIN tapdata WITH password='Tap@123456', default_database=demodata;
-- Create a database user
CREATE USER tapdata FOR LOGIN tapdata with default_schema=dbo;Grant permissions to the account we just created, or you can customize permissions control based on business needs.
-- Grant permission to create table
GRANT ALTER ON SCHEMA::schema_name TO login_name;
GRANT CREATE TABLE TO login_name;
-- Grant permission to read and write all table in specific schema
GRANT DELETE, INSERT, SELECT, UPDATE ON SCHEMA::schema_name TO login_name;- login_name: Enter user name.
- schema_name: Enter schema name (such as dbo), which acts as a namespace or container for objects (such as tables, views, procedures, and functions).
The following example shows that the tapdata user has been granted permission to create tables in the dbo schema and read/write data to all tables:
GRANT ALTER ON SCHEMA::dbo TO tapdata
GRANT CREATE TABLE TO tapdata
GRANT DELETE, INSERT, SELECT, UPDATE ON SCHEMA::dbo TO tapdata;
Enabling SSL Connection (Optional)β
To further enhance the security of the data connection, you can choose to enable SSL (Secure Sockets Layer) encryption for SQL Server databases. This provides encryption at the transport layer for network connections, enhancing the security of communication data while ensuring data integrity. The specific steps are as follows:
After completing the configuration, be sure to securely store the certificate-related files, as they will be used later when configuring connections.
Connect to SQL Serverβ
In the left navigation panel, click Connections.
On the right side of the page, click Create connection.
In the pop-up dialog, select SQL Server.
On the page that you are redirected to, follow the instructions below to fill in the connection information for SQL Server.
Connection Information Settings
- Connection name: Fill in a unique name that has business significance.
- Connection type: Supports SQL Server as a source or target database.
- DB address: The database connection address.
- Port: The service port of database.
- DB name: Database name, a connection corresponding to a database, if there are multiple databases, you need to create multiple connections.
- User: The database username.
- Password: The database password.
- Schema: Schema name.
Advanced settings
- Connection Parameter String: Additional connection parameters, default empty.
- Timezone: Defaults to the time zone used by the database, which you can also manually specify according to your business needs.
- Use SSL/TLS: Select whether to enable SSL connections for the data source to further enhance data security. After enabling this feature, you will also need to upload the CA certificate, certificate password, and server hostname. The relevant files can be obtained from Enabling SSL Connection.
- CDC Log Caching: Mining the source database's incremental logs, this feature allows multiple tasks to share incremental logs from the source database, avoiding redundant reads and thus significantly reducing the load on the source database during incremental synchronization. Upon enabling this feature, an external storage should be selected to store the incremental log.
- Contain table: The default option is All, which includes all tables. Alternatively, you can select Custom and manually specify the desired tables by separating their names with commas (,).
- Exclude tables: Once the switch is enabled, you have the option to specify tables to be excluded. You can do this by listing the table names separated by commas (,) in case there are multiple tables to be excluded.
- Agent settings: Defaults to Platform automatic allocation, you can also manually specify an agent.
- Model load time: If there are less than 10,000 models in the data source, their information will be updated every hour. But if the number of models exceeds 10,000, the refresh will take place daily at the time you have specified.
Click Connection Test, and when passed, click Save.
tipIf the connection test fails, follow the prompts on the page to fix it.
See alsoβ
This section describes the issues you may encounter when using the Change Data Capture (CDC) feature. For more information, please refer to the Microsoft documentation.
Clean Change Data Capture Log
SQL Server does not automatically clean up incremental data logs and requires the following settings to open the cleanup task.
-- The unit of retention is in minutes, and the cleanup cycle is set to 2 days here.
USE AdventureWorks2012;
GO
EXECUTE sys.sp_cdc_change_job
@job_type = N'cleanup',
@retention = 2880;
GOIf you perform a DDL operation (such as adding fields) on the fields of the incremental synchronization table, you need to perform the following operation to restart the CDC, otherwise, the data cannot be synchronized or an error is reported.
-- Disable the CDC for table
go
EXEC sys.sp_cdc_disable_table
@source_schema = N'schema_name',
@source_name = N'table_name',
@capture_instance = N'Schema_Table'
go
// The capture_instance is usually concatenated in the format of schema_table.
// You can use the following command to query the actual value.
exec sys.sp_cdc_help_change_data_capture
@source_schema = N'schema_name',
@source_name = N'table_name';
-- Enable the CDC for table
use database_name
go
EXEC sys.sp_cdc_enable_table
@source_schema = N'schemas',
@source_name = N'table_name',
@role_name = N'role_name'
go
Turn on Global Change Data Capture
-- Replace TAPDATA with your database name
-- Replace INSURANCE with your schema name
USE TAPDATA
GO
EXEC sys.sp_cdc_enable_db
GO
declare @table_name varchar(100)
declare @database_name varchar(100)
declare @schema_name varchar(100)
set @database_name = 'TAPDATA'
set @schema_name = 'INSURANCE'
declare my_cursor cursor for SELECT TABLE_NAME
FROM TAPDATA.INFORMATION_SCHEMA.TABLES
where TABLE_CATALOG = @database_name
and TABLE_SCHEMA = @schema_name;
open my_cursor
fetch next from my_cursor into @table_name
while @@FETCH_STATUS = 0
begin
begin try
exec sys.sp_cdc_enable_table
@source_schema = @schema_name,
@source_name = @table_name,
@role_name = NULL
end try
begin catch
print('[ERROR] ' + @table_name)
end catch
fetch next from my_cursor into @table_name
end
close my_cursor
deallocate my_cursorTurn off Global Change Data Capture
-- Replace TAPDATA with your database name
-- Replace INSURANCE with your schema name
USE TAPDATA
GO
declare @table_name varchar(100)
declare @database_name varchar(100)
declare @schema_name varchar(100)
set @database_name = 'TAPDATA'
set @schema_name = 'INSURANCE'
declare my_cursor cursor for SELECT TABLE_NAME
FROM TAPDATA.INFORMATION_SCHEMA.TABLES
where TABLE_CATALOG = @database_name
and TABLE_SCHEMA = @schema_name;
open my_cursor
fetch next from my_cursor into @table_name
while @@FETCH_STATUS = 0
begin
begin try
EXEC sys.sp_cdc_disable_table
@source_schema = @schema_name,
@source_name = @table_name,
@capture_instance = 'all';
end try
begin catch
print ('[ERROR] ' + @table_name)
end catch
fetch next from my_cursor into @table_name
end
close my_cursor
deallocate my_cursor
EXEC sys.sp_cdc_disable_db
GO