Hudi

Applicable Editions

  • TapData Cloud: Offers cloud services suited to scenarios that require rapid deployment and low initial investment, so you can focus on business development rather than infrastructure management. A free trial is available with TapData Cloud.
  • TapData Enterprise: Can be deployed in your local data center, making it suitable for scenarios with strict requirements on data sensitivity or network isolation. It can be used to build real-time data warehouses, enable real-time data exchange, perform data migration, and more.
  • TapData Community: An open-source data integration platform that provides basic data synchronization and transformation capabilities, helping you quickly explore and implement data integration projects. As your project or business grows, you can upgrade to TapData Cloud or TapData Enterprise for more advanced features and service support.

Apache Hudi is a storage format for data lakes that provides the ability to update, delete, and consume change data on top of the Hadoop file system. TapData supports using Hudi as a target database to build data transfer pipelines.

Environment Requirements

The machine running the compute engine must have the Hadoop environment variables configured, and its Hadoop version must match the version installed on your server. To check whether the machine meets these requirements, run the hadoop version command.
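
The check described above can be scripted. The following is a minimal sketch, not part of TapData: it assumes that HADOOP_HOME is the environment variable in question (this page does not name the exact variables) and that the hadoop launcher is expected on PATH. The function names check_hadoop_env and hadoop_version are hypothetical.

```python
import os
import shutil
import subprocess


def check_hadoop_env(env=None):
    """Return a list of problems found in the Hadoop-related environment."""
    env = os.environ if env is None else env
    problems = []

    # Assumption: HADOOP_HOME is the variable that needs to be configured.
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
    elif not os.path.isdir(hadoop_home):
        problems.append(f"HADOOP_HOME does not point to a directory: {hadoop_home}")

    # The hadoop launcher must be discoverable on PATH to run `hadoop version`.
    if shutil.which("hadoop", path=env.get("PATH", "")) is None:
        problems.append("hadoop executable not found on PATH")

    return problems


def hadoop_version():
    """Run `hadoop version` and return its first output line, or None on failure."""
    try:
        result = subprocess.run(
            ["hadoop", "version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    for problem in check_hadoop_env():
        print("WARNING:", problem)
    print("Detected:", hadoop_version())
```

Compare the reported version with the one installed on the server by hand; the sketch only reports what is found locally.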

Supported Version

Hudi 0.11.0

Parameter Descriptions

  • Cluster Address: Format should be ip:port.
  • Database: Name of the database.
  • Kerberos Authentication
    • Keytab File: Upload the user.keytab file.
    • Configuration File: Upload the krb5.conf file.
    • Hive Principal Configuration: spark2x/hadoop.hadoop.com@HADOOP.COM (corresponds to the value of principal).
  • Account and Password: Fill in the database username and password.
  • Server-side Hadoop Configuration File: core-site.xml, usually located in the etc/hadoop directory of the Hadoop installation on the server.
  • Server-side HDFS Configuration File: hdfs-site.xml, usually located in the etc/hadoop directory of the Hadoop installation on the server.
  • Server-side Hive Configuration File: hive-site.xml, usually located in the configuration file directory of the Hive installation on the server.
  • Connection Parameters: sasl.qop=auth-conf;auth=KERBEROS
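
To illustrate how the parameter formats above fit together, here is a minimal sketch that validates an ip:port cluster address, parses the semicolon-separated connection parameters, and combines them with the principal into a Hive-style JDBC URL. This page does not state how TapData assembles the connection internally, so the URL layout is an assumption based on the common jdbc:hive2 convention. All function names, the address 192.0.2.10:10000, and the database name default are hypothetical.

```python
import re

# Host (anything without whitespace or colon) followed by a numeric port.
ADDRESS_RE = re.compile(r"^(?P<host>[^\s:]+):(?P<port>\d{1,5})$")


def parse_cluster_address(address):
    """Split an ip:port string into (host, port); raise ValueError if malformed."""
    match = ADDRESS_RE.match(address.strip())
    if not match:
        raise ValueError(f"expected ip:port, got {address!r}")
    port = int(match.group("port"))
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return match.group("host"), port


def parse_connection_params(params):
    """Parse 'k1=v1;k2=v2' into a dict, keeping the original order."""
    result = {}
    for item in filter(None, (p.strip() for p in params.split(";"))):
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        result[key.strip()] = value.strip()
    return result


def build_hive_jdbc_url(address, database, principal, params):
    """Assemble a jdbc:hive2 URL (assumed layout, not confirmed for TapData)."""
    host, port = parse_cluster_address(address)
    session = {"principal": principal, **parse_connection_params(params)}
    suffix = "".join(f";{key}={value}" for key, value in session.items())
    return f"jdbc:hive2://{host}:{port}/{database}{suffix}"


if __name__ == "__main__":
    print(
        build_hive_jdbc_url(
            "192.0.2.10:10000",  # hypothetical cluster address
            "default",  # hypothetical database name
            "spark2x/hadoop.hadoop.com@HADOOP.COM",
            "sasl.qop=auth-conf;auth=KERBEROS",
        )
    )
```

Validating the address and parameter string up front surfaces formatting mistakes before a connection test is attempted.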