Skip to main content

XML

Applicable EditionsTapData CloudTapData Cloud offers you cloud services that are suitable for scenarios requiring rapid deployment and low initial investment, helping you focus more on business development rather than infrastructure management. Free trial with TapData Cloud.TapData EnterpriseTapData Enterprise can be deployed in your local data center, making it suitable for scenarios with strict requirements on data sensitivity or network isolation. It can serve to build real-time data warehouses, enable real-time data exchange, data migration, and more.TapData CommunityTapData Community is an open-source data integration platform that provides basic data synchronization and transformation capabilities. This helps you quickly explore and implement data integration projects. As your project or business grows, you can seamlessly upgrade to TapData Cloud or TapData Enterprise to access more advanced features and service support.

Extensible Markup Language (XML) lets you define and store data in a shareable manner. XML supports information exchange between computer systems such as websites, databases, and third-party applications.

TapData supports reading XML files stored on local, FTP, SFTP, SMB, or S3FS to meet a variety of data flow needs. This article describes how to connect to XML data sources on TapData Cloud.

Precautions

  • A connection only refers to a model (table). If there are multiple models, you need to create connections for each one separately.
  • XML data sources are only available for data transformation tasks.
  • By default, TapData checks for changes (such as adding or modifying) to the XML file every minute. TapData synchronizes all files to accurately update the condition field when changes are detected. However, synchronous deletion of files or data is not supported.
  • Data types that can be used in XML files include ARRAY, BOOLEAN, DATETIME, INTEGER, NUMBER, OBJECT, STRING, and TEXT.

Preparations

Before establishing the connection, it is essential to follow the necessary preparations outlined in the article. These preparations may include authorizing an account and performing other relevant steps to ensure a smooth and secure connection. This article describes the steps to prepare for creating an XML data source connection.

Select the option to read based on the location of the XML file.

Stored on Local/FTP/SFTP/SMB

When you store files on TapData Agent's device, it can be used as a data source, and the preparations are as follows:

  1. Log in to TapData Agent's device.
  2. Create a folder first (e.g., /root/files for the root user) for easier file management.
  3. Save the files to the folder that you created.

When configuring the data source later, all you need to do is specify the file path.

Stored on Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. With Amazon S3, you can store and retrieve any size of data from anywhere on the web at any time. When your files are stored on Amazon S3, you need to obtain the user's access key, S3 bucket name, file path, and other necessary information.

  1. Log in to Amazon IAM Console.

  2. Create a user and grant them access.

    1. In the left navigation bar, select Access management > Users.

    2. On the right side of the page, click Add users.

    3. Fill in the username and click Next.

      The user name can have up to 64 characters. Valid characters: A-Z, a-z, 0-9, and \+ =,. @ _-

    4. In the Permissions options area, select Attach policies directly, then search for and select the AmazonS3ReadOnlyAccess policy.

      Grant AmazonS3ReadOnlyAccess

    5. Click Next, and then click Create user.

  3. Create an access key for the user.

    1. On the user list page, find and click the user you just created.

    2. Click the Security credentials tab, and then click Create access key in the Access keys area.

      Create Access Key

    3. Select Third-party servicesand click Next.

    4. Fill in the description tab and click Create access key.

    5. Access key and secret key can be viewed or downloaded on the page.

      Obtain Access Key

      tip

      To ensure the security of your account, please keep your access key secure. This is the only time that the secret access key can be viewed or downloaded. You cannot recover it later. However, you can create a new access key any time.

  4. Retrieve the bucket's region code.

    1. Log in to Amazon S3 Console.

    2. On the bucket list page, find the target bucket and view its region code.

      Get Region Code

Stored on OSS

Object Storage Service (OSS) is a secure, cost-effective, and high-durability cloud storage service provided by Alibaba Cloud. When your files are stored on Alibaba Cloud OSS, you need to obtain the user's access key, bucket name, file path, and other necessary information.

  1. Log in to Alibaba Cloud RAM Console.

  2. Create a user and obtain an AccessKey.

    1. In the left navigation bar, select Identities > Users.

    2. Click Create User.

    3. On the redirected page, fill in the name, display name, select the OpenAPI Access, and click OK.

      Create User

    4. After the user creation is complete, click Download CSV file that contains the AccessKey information.

      tip

      To ensure the security of your account, please keep your access key secure.

  3. Grant permission to the user.

    1. On the user list page, find and click the user you just created.

    2. Click the Permissions tab, and then click Grant Permission.

    3. In the panel on the right, select the authorization scope, then search for and select the AliyunOSSReadOnlyAccess policy.

      Grant OSS Read Permission

    4. Click OK, and then click Complete.

  4. Obtains the OSS access domain name (Endpoint).

    1. Log in to Alibaba Cloud OSS Console.

    2. Locate and click on the target bucket.

    3. Click on the Overview tab on the left, then scroll down to the Port area and find the Endpoint for Access Over Internet.

      Get Endpoint

Connect to XML

  1. Log in to TapData Platform.

  2. In the left navigation panel, click Connections.

  3. On the right side of the page, click Create.

  4. In the pop-up dialog, select XML.

  5. On the page that you are redirected to, follow the instructions below to fill in the connection information for XML.

    Connect to XML

    tip

    For more about how to obtain the information needed for each protocol, see Preparations.

    • Connection name: Fill in a unique name that has business significance.
    • Connection type: Currently only supported as a Source.
    • File Protocol: Select the following protocol based on the location of the XML file. This article uses S3FS (Amazon S3 bucket) as an illustration of the operation process.
      • Local: The file is stored on the device that deployed the TapData Agent. After selecting this item, you also need to fill in the file path.
      • FTP (File Transfer Protocol): The file is stored on the FTP server. After selecting this item, you also need to fill in the address, port, username, password, file path, and other information of the FTP server.
      • SFTP (Secure File Transfer Protocol): The file is stored on the SFTP server. After selecting this item, you also need to fill in the address, port, user name, password, file path, and other information of the SFTP server.
      • SMB (Server Message Block Protocol): The file is stored on the SMB server and is compatible with 1.x, 2.x, 3.x. After selecting this item, you also need to fill in the address, username, password, file path, and other information of the SMB server.
      • S3FS (File System According to S3 Protocol): The file is stored on the Amazon S3 bucket. After selecting this item, you need to fill in the information such as Accesskey, Secretkey, Endpoint (fixed at s3.amazonaws.com), Bucket, and file path.
      • OSS (Object Storage Service): The file is stored on the Alibaba Cloud Object Storage. After selecting this item, you need to fill in the Accesskey, Secretkey, Endpoint, Bucket, and file path.
    • Agent settings: Defaults to Platform automatic allocation, you can also manually specify an agent.
    • Model load time: When the number of models in the data source is less than 10,000, the model information is refreshed every hour; if the model data exceeds 10,000, the model information is refreshed every day at the time you specify.
  6. Click Test Connection, and when passed, click Save.

    tip

    If the connection test fails, follow the prompts on the page to fix it.