Skip to main content

Excel

Applicable EditionsTapData CloudTapData Cloud offers you cloud services that are suitable for scenarios requiring rapid deployment and low initial investment, helping you focus more on business development rather than infrastructure management. Free trial with TapData Cloud.TapData EnterpriseTapData Enterprise can be deployed in your local data center, making it suitable for scenarios with strict requirements on data sensitivity or network isolation. It can serve to build real-time data warehouses, enable real-time data exchange, data migration, and more.TapData CommunityTapData Community is an open-source data integration platform that provides basic data synchronization and transformation capabilities. This helps you quickly explore and implement data integration projects. As your project or business grows, you can seamlessly upgrade to TapData Cloud or TapData Enterprise to access more advanced features and service support.

Excel is a wide range of data statistics and data analysis software. TapData supports reading Excel files stored on local, FTP, SFTP, SMB, or S3FS to meet a variety of data flow needs.

This article describes how to connect to Excel data source on TapData Cloud.

Supported File Formats​

Supports both XLS and XLSX formats and forms that include merged cells and formulas.

Precautions​

  • A connection only refers to a model (table). If there are multiple models, you need to create connections for each one separately.
  • Excel data sources are only available for data transformation tasks.
  • To avoid excessive memory usage on machines, it is recommended that Excel files do not exceed 512 MB.
  • By default, TapData checks for changes (such as adding or modifying) to the XML file every minute. TapData synchronizes all files to accurately update the condition field when changes are detected. However, synchronous deletion of files or data is not supported.
  • Data types that can be used in Excel files include BOOLEAN, DATE, DOUBLE, STRING, and TEXT.

Preparations​

Before establishing the connection, it is essential to follow the necessary preparations outlined in the article. These preparations may include authorizing an account and performing other relevant steps to ensure a smooth and secure connection. This article describes the steps to prepare for creating an Excel data source connection.

Select the option to read based on the location of the Excel file.

Stored on Local/FTP/SFTP/SMB​

When you store files on TapData Agent's device, it can be used as a data source, and the preparations are as follows:

  1. Log in to TapData Agent's device.
  2. Create a folder first (e.g., /root/files for the root user) for easier file management.
  3. Save the files to the folder that you created.

When configuring the data source later, all you need to do is specify the file path.

Stored on Amazon S3​

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. With Amazon S3, you can store and retrieve any size of data from anywhere on the web at any time. When your files are stored on Amazon S3, you need to obtain the user's access key, S3 bucket name, file path, and other necessary information.

  1. Log in to Amazon IAM Console.

  2. Create a user and grant them access.

    1. In the left navigation bar, select Access management > Users.

    2. On the right side of the page, click Add users.

    3. Fill in the username and click Next.

      The user name can have up to 64 characters. Valid characters: A-Z, a-z, 0-9, and \+ =,. @ _-

    4. In the Permissions options area, select Attach policies directly, then search for and select the AmazonS3ReadOnlyAccess policy.

      Grant AmazonS3ReadOnlyAccess

    5. Click Next, and then click Create user.

  3. Create an access key for the user.

    1. On the user list page, find and click the user you just created.

    2. Click the Security credentials tab, and then click Create access key in the Access keys area.

      Create Access Key

    3. Select Third-party servicesand click Next.

    4. Fill in the description tab and click Create access key.

    5. Access key and secret key can be viewed or downloaded on the page.

      Obtain Access Key

      tip

      To ensure the security of your account, please keep your access key secure. This is the only time that the secret access key can be viewed or downloaded. You cannot recover it later. However, you can create a new access key any time.

  4. Retrieve the bucket's region code.

    1. Log in to Amazon S3 Console.

    2. On the bucket list page, find the target bucket and view its region code.

      Get Region Code

Stored on OSS​

Object Storage Service (OSS) is a secure, cost-effective, and high-durability cloud storage service provided by Alibaba Cloud. When your files are stored on Alibaba Cloud OSS, you need to obtain the user's access key, bucket name, file path, and other necessary information.

  1. Log in to Alibaba Cloud RAM Console.

  2. Create a user and obtain an AccessKey.

    1. In the left navigation bar, select Identities > Users.

    2. Click Create User.

    3. On the redirected page, fill in the name, display name, select the OpenAPI Access, and click OK.

      Create User

    4. After the user creation is complete, click Download CSV file that contains the AccessKey information.

      tip

      To ensure the security of your account, please keep your access key secure.

  3. Grant permission to the user.

    1. On the user list page, find and click the user you just created.

    2. Click the Permissions tab, and then click Grant Permission.

    3. In the panel on the right, select the authorization scope, then search for and select the AliyunOSSReadOnlyAccess policy.

      Grant OSS Read Permission

    4. Click OK, and then click Complete.

  4. Obtains the OSS access domain name (Endpoint).

    1. Log in to Alibaba Cloud OSS Console.

    2. Locate and click on the target bucket.

    3. Click on the Overview tab on the left, then scroll down to the Port area and find the Endpoint for Access Over Internet.

      Get Endpoint

Connect to Excel​

  1. Log in to TapData Platform.

  2. In the left navigation panel, click Connections.

  3. On the right side of the page, click Create.

  4. In the pop-up dialog, select Excel.

  5. On the page that you are redirected to, follow the instructions below to fill in the connection information for Excel.

    Connect to Excel

    tip

    For more about how to obtain the information needed for each file protocol, see Preparations.

    • Connection name: Fill in a unique name that has business significance.
    • Connection type: Currently only supported as a Source.
    • File Protocol: Select the following protocol based on the location of the Excel files. This article uses S3FS (Amazon S3 bucket) as an illustration of the operation process.
      • Local: The file is stored on the device that deployed the TapData Agent. After selecting this item, you also need to fill in the file path.
      • FTP (File Transfer Protocol): The file is stored on the FTP server. After selecting this item, you also need to fill in the address, port, username, password, file path, and other information of the FTP server.
      • SFTP (Secure File Transfer Protocol): The file is stored on the SFTP server. After selecting this item, you also need to fill in the address, port, user name, password, file path, and other information of the SFTP server.
      • SMB (Server Message Block Protocol): The file is stored on the SMB server and is compatible with 1.x, 2.x, 3.x. After selecting this item, you also need to fill in the address, username, password, file path, and other information of the SMB server.
      • S3FS (file system according to S3 protocol): The file is stored on the Amazon S3 bucket. After selecting this item, you need to fill in the information such as Accesskey, Secretkey, Endpoint (fixed at s3.amazonaws.com), Bucket, and file path.
      • OSS (Object Storage Service): The file is stored on the Alibaba Cloud Object Storage. After selecting this item, you need to fill in the Accesskey, Secretkey, Endpoint, Bucket, and file path.
    • Agent settings: Defaults to Platform automatic allocation, you can also manually specify an agent.
    • Model load time: When the number of models in the data source is less than 10,000, the model information is refreshed every hour; if the model data exceeds 10,000, the model information is refreshed every day at the time you specify.
  6. Click Test Connection, and when passed, click Save.

    tip

    If the connection test fails, follow the prompts on the page to fix it.