Performance Testing
TapData Enterprise can be deployed in your local data center, making it suitable for scenarios with strict requirements on data sensitivity or network isolation. It can be used to build real-time data warehouses and to enable real-time data exchange, data migration, and more. This document details the methods and steps for conducting performance tests on TapData: how to assess its data processing capabilities, response times, and system stability under various conditions. The results help you understand TapData's performance under different loads, allowing for better resource planning and configuration optimization.
Testing Environment
- Operating System: CentOS 7 64-bit
- CPU: 12 cores
- Memory: 96 GB
Testing Methods
To ensure a comprehensive and accurate evaluation of TapData's performance, we will use the following testing methods:
1. Synchronization Task Creation: Create data synchronization tasks according to the planned test types, including full and incremental synchronization, to fully assess synchronization performance.
2. Simulate High-Intensity Data Writes: Continuously execute high-intensity data write operations through automation scripts on the source database, ensuring the data flowing into the source database meets the test requirements.
3. Monitor Data Synchronization at the Target End: Observe and record data changes and write speed at the target database to confirm that the target can handle the synchronized data effectively and accurately.
4. Data Consistency Verification: Ensure data consistency between the source and target databases under high load, including verifying data integrity and synchronization accuracy, so that no data is lost or synchronized incorrectly (see the sketch after this list).
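A minimal sketch of such a consistency spot check is shown below, assuming a MySQL source and a MongoDB target; the hosts, credentials, table, and collection names are placeholders. A count comparison like this is only a first pass; field-level comparison of sampled records can follow.

```python
# Hypothetical consistency spot check: compare row counts between a MySQL
# source table and a MongoDB target collection once the incremental window
# has been fully applied. All connection details and names are placeholders.
import pymysql
from pymongo import MongoClient

source = pymysql.connect(host="source-mysql", user="test", password="***", database="test")
with source.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM customer")
    source_count = cur.fetchone()[0]
source.close()

target = MongoClient("mongodb://target-mongo:27017")
target_count = target["test"]["customer"].count_documents({})

print(f"source={source_count}, target={target_count}, match={source_count == target_count}")
```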
These steps will allow us to comprehensively evaluate TapData's data processing capabilities and stability under high load conditions. Additionally, this method will help identify potential performance bottlenecks, providing a basis for subsequent optimizations.
Test Metrics Explanation
Metric | Description |
---|---|
Sync RPS | Records Per Second: the number of records processed per second, reflecting data transfer speed and throughput. |
99th Latency | 99% of operations complete within this value, measured in milliseconds. For example, a value of 100 ms means 99% of requests are processed within 100 ms. |
95th Latency | 95% of operations complete within this value, measured in milliseconds. |
Average Latency | The mean latency across all operations, measured in milliseconds. |
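For reference, the sketch below shows one way these metrics can be derived from per-record latencies and the wall-clock duration of the incremental run; the sample values are illustrative only.

```python
# Illustrative computation of the metrics above from per-record latencies
# (in milliseconds) and the total wall-clock time of the run.
def summarize(latencies_ms, total_records, total_seconds):
    ordered = sorted(latencies_ms)
    percentile = lambda p: ordered[min(len(ordered) - 1, int(len(ordered) * p))]
    return {
        "Sync RPS": total_records / total_seconds,
        "99th Latency (ms)": percentile(0.99),
        "95th Latency (ms)": percentile(0.95),
        "Average Latency (ms)": sum(ordered) / len(ordered),
    }

# Five sample latencies over a one-second window.
print(summarize([120, 340, 95, 800, 210], total_records=5, total_seconds=1))
```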
Test Results
To ensure the accuracy and applicability of performance test results, we will conduct tests across various database architectures and in both homogeneous and heterogeneous data environments. This diverse testing approach helps comprehensively assess TapData's performance across different scenarios and connections.
Source Database: MongoDB
Environment Description
- MongoDB Instance: A sharded cluster with 3 nodes. The test collection contains 12 fields and has no indexes other than `_id`. An automation script performs insert operations at a rate of 11,000 rows per second, with multi-threaded incremental writing enabled (4 threads).
- ElasticSearch Instance: A single-node instance running in Docker, with initial and maximum heap memory set to 4 GB.
- Redis Instance: A single-node instance running in Docker, with 12 cores and 48 GB of memory.
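The write load described above can be produced by a small script; the sketch below is only illustrative, assuming pymongo, a placeholder connection string, and a simplified document shape (the actual test collection has 12 fields).

```python
# Hypothetical load generator: 4 threads inserting batches into MongoDB at
# roughly 11,000 rows per second in total. URI, database, collection, and
# document fields are placeholders, not the exact test setup.
import time
import threading
from pymongo import MongoClient

THREADS, BATCH, RATE_PER_THREAD = 4, 500, 2750  # 4 x 2,750 ≈ 11,000 rows/s

def writer(thread_id: int) -> None:
    coll = MongoClient("mongodb://source-mongo:27017")["test"]["customer"]
    seq = 0
    while True:
        start = time.time()
        docs = [{"thread": thread_id, "seq": seq + n, "city": "Oppenheim",
                 "age": 18, "created": time.time()} for n in range(BATCH)]
        coll.insert_many(docs, ordered=False)
        seq += BATCH
        # Sleep just long enough to hold the per-thread rate near the target.
        time.sleep(max(0.0, BATCH / RATE_PER_THREAD - (time.time() - start)))

for t in range(THREADS):
    threading.Thread(target=writer, args=(t,), daemon=True).start()
time.sleep(3600)  # keep the main thread alive for the test window
```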
Test Results
Results of real-time synchronization from a MongoDB source to different target databases are as follows:
Target Database | Incremental Rows Written | Total Increment Time | Average Sync RPS | 99th Latency | 95th Latency | Average Latency |
---|---|---|---|---|---|---|
MongoDB | 5.1 million | 438 seconds | 11,664 | 2,000 ms | 1,900 ms | 1,163 ms |
ElasticSearch | 3.1 million | 261 seconds | 11,608 | 1,700 ms | 1,500 ms | 980 ms |
Redis | 6.7 million | 577 seconds | 11,592 | 1,600 ms | 1,300 ms | 781 ms |
Source Database: MySQL
Environment Description
MySQL Instance: A machine with 12 cores and 48 GB of memory; the table has no indexes other than the `CUSTOMER_ID` primary key. The table structure is as follows:

```sql
CREATE TABLE customer (
    CUSTOMER_ID   BIGINT PRIMARY KEY,
    CITY          VARCHAR(100),
    AGE           INT,
    FIRST_NAME    VARCHAR(150),
    LAST_NAME     VARCHAR(150),
    DATE_OF_BIRTH DATETIME,
    JOB           VARCHAR(150),
    EMAIL         VARCHAR(100),
    ZIP           VARCHAR(10),
    PHONE         VARCHAR(40)
);
```

Data is continuously inserted via a stored procedure to simulate incremental data generation:
```sql
DELIMITER //
CREATE PROCEDURE CUSTOMER_INSERT_DATA(IN DATA_COUNT INT, IN BATCH_SIZE INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE MAX_ID INT;
    -- Continue numbering from the current maximum CUSTOMER_ID.
    SELECT MAX(CUSTOMER_ID) INTO MAX_ID FROM test.customer;
    IF MAX_ID IS NULL THEN
        SET MAX_ID = 0;
    END IF;
    SET autocommit = 0;
    WHILE i <= DATA_COUNT DO
        INSERT INTO customer (CUSTOMER_ID, CITY, AGE, FIRST_NAME, LAST_NAME, DATE_OF_BIRTH, JOB, EMAIL, ZIP, PHONE)
        VALUES (MAX_ID + i, 'Oppenheim', 18, 'Lambert', 'Sauer', NOW(), 'Vertreter / Vertreterin', 'rushland2003@live.com', '13480', '+49-4611-41132053');
        -- Commit every BATCH_SIZE rows to keep transactions small.
        IF (MOD(i, BATCH_SIZE) = 0) THEN
            COMMIT;
        END IF;
        SET i = i + 1;
    END WHILE;
    COMMIT;
END //
DELIMITER ;
```

Kafka Instance: Running in Docker, with 48 GB of memory allocated.
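One way to keep this workload running is to call the procedure from a small client loop; the sketch below is illustrative, assuming PyMySQL and placeholder connection details, with the per-call row count and batch size chosen arbitrarily.

```python
# Hypothetical driver loop: call the stored procedure about once per second
# so incremental data keeps flowing. Host, credentials, and call volume are
# placeholders.
import time
import pymysql

conn = pymysql.connect(host="source-mysql", user="test", password="***",
                       database="test", autocommit=True)
with conn.cursor() as cur:
    while True:
        start = time.time()
        # Insert 10,000 rows per call, committing every 1,000 rows.
        cur.callproc("CUSTOMER_INSERT_DATA", (10000, 1000))
        # Pace the loop to roughly one call per second.
        time.sleep(max(0.0, 1.0 - (time.time() - start)))
```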
Test Results
Target Database | Incremental Rows Written | Total Increment Time | Average Sync RPS | 99th Latency | 95th Latency | Average Latency |
---|---|---|---|---|---|---|
MySQL | 5 million | 281 seconds | 17,795 | 1,400 ms | 1,200 ms | 712 ms |
Kafka | 2 million | 184 seconds | 10,892 | 1,100 ms | 1,100 ms | 690 ms |
Source Database: PostgreSQL
Environment Description
In this test, the PostgreSQL instance is deployed in Docker with 48 GB of available memory, and the table has no indexes other than the `CUSTOMER_ID` primary key. Incremental log information is obtained using wal2json, and the table structure is as follows:
```sql
CREATE TABLE customer (
    CUSTOMER_ID   BIGINT PRIMARY KEY,
    CITY          VARCHAR(100),
    AGE           INT,
    FIRST_NAME    VARCHAR(150),
    LAST_NAME     VARCHAR(150),
    DATE_OF_BIRTH TIMESTAMP,
    JOB           VARCHAR(150),
    EMAIL         VARCHAR(100),
    ZIP           VARCHAR(10),
    PHONE         VARCHAR(40)
);
```
Data will be continuously inserted using a custom PostgreSQL function to simulate incremental data generation, inserting approximately 10,000 rows per second:
```sql
CREATE OR REPLACE FUNCTION CUSTOMER_INSERT_DATA(DATA_COUNT INT, BATCH_SIZE INT) RETURNS VOID AS $$
DECLARE
    i INT := 1;
    MAX_ID INT;
BEGIN
    -- Continue numbering from the current maximum CUSTOMER_ID.
    SELECT MAX(CUSTOMER_ID) INTO MAX_ID FROM customer;
    IF MAX_ID IS NULL THEN
        MAX_ID := 0;
    END IF;
    -- BATCH_SIZE mirrors the MySQL procedure's signature; the whole call
    -- runs in a single transaction, so no intermediate commits are issued.
    WHILE i <= DATA_COUNT LOOP
        INSERT INTO customer (CUSTOMER_ID, CITY, AGE, FIRST_NAME, LAST_NAME, DATE_OF_BIRTH, JOB, EMAIL, ZIP, PHONE)
        VALUES (MAX_ID + i, 'Oppenheim', 18, 'Lambert', 'Sauer', CURRENT_TIMESTAMP, 'Vertreter / Vertreterin', 'rushland2003@live.com', '13480', '+49-4611-41132053');
        i := i + 1;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```
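As with the MySQL test, a small client loop can pace the function calls; the sketch below is illustrative, assuming psycopg2 and placeholder connection settings, calling the function roughly once per second with 10,000 rows per call.

```python
# Hypothetical pacing loop for the PostgreSQL function above: about one call
# per second with 10,000 rows per call (~10,000 rows/s). Connection settings
# are placeholders.
import time
import psycopg2

conn = psycopg2.connect(host="source-postgres", dbname="test",
                        user="test", password="***")
conn.autocommit = True
with conn.cursor() as cur:
    while True:
        start = time.time()
        cur.execute("SELECT CUSTOMER_INSERT_DATA(10000, 1000);")
        time.sleep(max(0.0, 1.0 - (time.time() - start)))
```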
Test Results
Target Database | Incremental Rows Written | Total Increment Time | Average Sync RPS | 99th Latency | 95th Latency | Average Latency |
---|---|---|---|---|---|---|
PostgreSQL | 1.45 million | 121 seconds | 11,946 | 1,600 ms | 1,500 ms | 1,060 ms |
Source Database: Redis
Environment Description
In this test, the Redis instance is deployed as a 3-node cluster, with each node having 4 cores and 16 GB of memory. An automation script performs insert operations at a rate of 30,000 rows per second. The key structure is shown below:

```
9000) "8273618"
9001) "7690169"
9002) "1854874"
```
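The source writes can again be generated with a short script; the sketch below is illustrative, assuming redis-py with cluster support, a placeholder key name, and a value format matching the list elements shown above.

```python
# Hypothetical load generator: push roughly 30,000 string elements per second
# onto a Redis list using pipelined RPUSH. Cluster address, key name, and
# value format are placeholders.
import time
import random
from redis.cluster import RedisCluster

client = RedisCluster(host="source-redis", port=6379)
KEY, BATCH, TARGET_RPS = "customer_ids", 1000, 30000

while True:
    start = time.time()
    pipe = client.pipeline()
    for _ in range(BATCH):
        pipe.rpush(KEY, str(random.randint(1_000_000, 9_999_999)))
    pipe.execute()
    # Pace the batches so the overall rate stays near the target.
    time.sleep(max(0.0, BATCH / TARGET_RPS - (time.time() - start)))
```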
Test Results
Target Database | Incremental Rows Written | Total Increment Time | Average Sync RPS | 99th Latency | 95th Latency | Average Latency |
---|---|---|---|---|---|---|
Redis | 29.4 million | 1806 seconds | 16,287 | 800 ms | 200 ms | 140 ms |