Doris

Applicable Editions

  • TapData Cloud: TapData Cloud offers you cloud services that are suitable for scenarios requiring rapid deployment and low initial investment, helping you focus more on business development rather than infrastructure management. Free trial with TapData Cloud.
  • TapData Enterprise: TapData Enterprise can be deployed in your local data center, making it suitable for scenarios with strict requirements on data sensitivity or network isolation. It can serve to build real-time data warehouses, enable real-time data exchange, data migration, and more.
  • TapData Community: TapData Community is an open-source data integration platform that provides basic data synchronization and transformation capabilities. This helps you quickly explore and implement data integration projects. As your project or business grows, you can seamlessly upgrade to TapData Cloud or TapData Enterprise to access more advanced features and service support.

Apache Doris is a new-generation open-source real-time data warehouse based on MPP architecture, designed for ease of use and high performance in big data analytics. TapData Cloud supports Doris as a source or target database, so you can build data pipelines that move data quickly in big data analytics scenarios.

This article describes how to connect a Doris data source on the TapData Cloud platform.

Supported Versions

Doris 1.x, 2.x

Maturity Stage

Beta Data Source
The Beta Data Source is currently in public preview and has undergone thorough testing, including basic test cases and integration test cases. However, it has not yet completed the TapData certification test process.

Precautions

If you want to use Doris as the source database and synchronize incremental data changes, you need to create a data transformation task and set Incremental Synchronization Method to Polling.
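As a rough illustration of what polling means here: each polling cycle queries the source table again for rows changed since the last recorded position, which assumes the table has a column whose value increases with every change. The sketch below uses a hypothetical table named orders with a hypothetical updated_at column; the query TapData actually issues may differ.

```sql
-- Hypothetical table and column names, for illustration only.
-- Each polling cycle fetches the rows modified after the last position seen.
SELECT *
FROM orders
WHERE updated_at > '2024-01-01 00:00:00'  -- position recorded by the previous cycle
ORDER BY updated_at;
```

Because a query like this only sees rows that still exist, rows deleted from the source are generally not detected by a polling approach.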

Preparations

  1. Log in to the Doris database and run the following command to create an account.

    CREATE USER 'username'@'host' IDENTIFIED BY 'password';
    • username: The user name of the account.
    • password: The password of the account.
    • host: The host from which the account is allowed to connect. A percent sign (%) allows all hosts.

    Example: Create an account named tapdata.

    CREATE USER 'tapdata'@'%' IDENTIFIED BY 'Tap@123456';
  2. Grant permissions to the account you just created. We recommend setting more granular permission controls based on your business needs.

-- Replace catalog_name, database_name, and username as described in the tip below
GRANT SELECT_PRIV ON catalog_name.database_name.* TO 'username'@'%';
tip

Replace catalog_name, database_name, and username in the command above.

  • catalog_name: The name of the data catalog. The default name is internal. You can view the existing data catalogs with the SHOW CATALOGS command. For more information, see Multi-Catalog.
  • database_name: The name of the database to grant access to.
  • username: The user name of the account.
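For example, assuming the default internal catalog and a hypothetical database named demo_db, the grant for the tapdata account created earlier might look like the following. SHOW GRANTS is a convenient way to confirm the result.

```sql
-- List the available catalogs; the default one is named internal.
SHOW CATALOGS;

-- demo_db is a hypothetical database name; replace it with your own.
GRANT SELECT_PRIV ON internal.demo_db.* TO 'tapdata'@'%';

-- Confirm the privileges now held by the account.
SHOW GRANTS FOR 'tapdata'@'%';
```

SELECT_PRIV covers reading only. If the same account will also write to Doris as a target, it likely needs write-related privileges such as LOAD_PRIV as well; the exact set is not stated here, so treat it as an assumption to verify against the Doris privilege documentation.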

Connect to Doris

  1. Log in to TapData Platform.

  2. In the left navigation panel, click Connections.

  3. On the right side of the page, click Create.

  4. In the pop-up dialog, search for and select Doris.

  5. On the page you are redirected to, follow the instructions below to fill in the connection information for Doris.

    • Basic Settings
      • Name: Fill in a unique name that has business significance.
      • Type: Doris is supported as a source or target database.
      • DB Address: The connection address of Doris.
      • Port: The query service port for Doris. The default port is 9030.
      • Enable HTTPS: Select whether to enable the certificate-free HTTPS connection feature.
      • HTTP/HTTPS Address: The HTTP address of the FE service, including the address and port (e.g., http://192.168.1.18:8040). The default port is 8030.
      • DB Name: The database name. Each connection corresponds to one database, so if there are multiple databases, you need to create one connection for each.
      • User, Password: The database username and password.
    • Advanced Setting
      • Doris Catalog: The catalog of Doris, whose hierarchy is above the database. If you use the default catalog, you can leave it empty. For more information, see Multi-Catalog.
      • Other Connection String Parameters: Additional connection parameters, empty by default.
      • Timezone: Defaults to the time zone used by the database. You can also specify it manually according to your business needs.
      • Agent Settings: Defaults to Platform automatic allocation. You can also manually specify an Agent.
      • Model Load Time: If the data source contains fewer than 10,000 models, their information is updated every hour. If it contains more than 10,000 models, the refresh takes place daily at the time you specify.
      • Enable Heartbeat Table: This switch is available when the connection type is Source&Target or Source. TapData Cloud creates a table named tapdata_heartbeat_table in the source database, which is used to monitor the source database connection and task health.
  6. Click Test Connection, and after the test passes, click Save.

    tip

    If the connection test fails, follow the prompts on the page to fix it.
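If the test keeps failing, it can help to check the same settings directly from a MySQL-compatible client connected to the DB Address on the query port (9030 by default). The statements below are a sketch of such a check, not part of the TapData workflow; demo_db is a hypothetical database name, and SHOW FRONTENDS typically requires an administrator account.

```sql
-- Confirm which account the session is authenticated as.
SELECT CURRENT_USER();

-- Confirm that the database entered in DB Name is visible to this account.
SHOW DATABASES;
USE demo_db;       -- hypothetical database name
SHOW TABLES;

-- Check the database time zone that the Timezone setting defaults to.
SHOW VARIABLES LIKE 'time_zone';

-- With an administrator account: list the FE nodes and their query and HTTP ports.
SHOW FRONTENDS;
```

If the account can log in but the database or its tables are missing from the output, the grant from the Preparations section is the first thing to recheck.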