Data Loading Overview

SynxDB loads external data by mapping it to external tables through loading tools. It then reads from or writes to these external tables to complete the external data loading process.

Data loading process

The general process of loading data into SynxDB is as follows:

  1. Evaluate the data loading scenario (such as data source location, data types, and data volume) and choose the appropriate loading tool.

  2. Configure and activate the loading tool.

  3. Create an external table, specifying the loading tool protocol, data source address, and data format in the CREATE EXTERNAL TABLE statement. For details, see Load External Data Using Foreign Table.

  4. Once the external table is created, query the data directly with the SELECT statement, or load the data into a regular table with an INSERT INTO ... SELECT statement.
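The steps above can be sketched as follows. This is a minimal example assuming a gpfdist server is already running; the table names, host, port, file name, and columns are illustrative, not part of any fixed schema:

```sql
-- Step 3: map a CSV file served by gpfdist to an external table.
CREATE EXTERNAL TABLE ext_sales (
    id     int,
    amount numeric,
    ts     timestamp
)
LOCATION ('gpfdist://etl-host:8081/sales.csv')
FORMAT 'CSV' (HEADER);

-- Step 4: query the external data directly ...
SELECT count(*) FROM ext_sales;

-- ... or load it into a regular table.
CREATE TABLE sales (LIKE ext_sales);
INSERT INTO sales SELECT * FROM ext_sales;
```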

Loading tools and scenarios

SynxDB provides multiple data loading solutions; choose the one that fits your data source and loading scenario.

| Loading method/tool | Data source | Data format | Parallel loading |
| --- | --- | --- | --- |
| Load Data Using COPY | Local file system: coordinator node (for single files) or Segment nodes (for multiple files) | TXT, CSV, binary | No |
| Load Data Using the file:// Protocol | Local file system (local Segment nodes, only accessible by superusers) | TXT, CSV | Yes |
| gpfdist | Local host files or files accessible over the intranet | TXT, CSV, any delimited text format supported by the FORMAT clause, XML and JSON (converted to text via YAML configuration) | Yes |
| Bulk loading using gpload (with gpfdist as the underlying component) | Local host files or files accessible over the intranet | TXT, CSV, any delimited text format supported by the FORMAT clause, XML and JSON (converted to text via YAML configuration) | Yes |
| External web tables | Data fetched from web services or any source accessible via command line | TXT, CSV | Yes |
| Kafka FDW and Kafka Connector | Kafka | CSV, JSON | Yes |
| Data Lake Connector | Public cloud object storage (for example, Amazon S3, QingCloud, Alibaba Cloud, Huawei Cloud, Tencent Cloud); Hadoop storage | CSV, TEXT, ORC, PARQUET | Yes |
| Hive Connector with datalake_fdw | Hive data warehouse | CSV, TEXT, ORC, PARQUET, Iceberg, Hudi | Yes |
| PXF | HDFS, Hive, HBase, relational databases via JDBC | Depends on the connector; common formats include Text, CSV, JSON, Avro, Parquet, ORC | Yes |
| mysql_fdw | MySQL | MySQL table data | Yes |
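As a minimal sketch of the simplest method above, COPY reads a file on the coordinator host in a single, non-parallel stream. The table name and file path are hypothetical:

```sql
-- COPY runs through the coordinator, so the load is serial.
-- The target table must already exist with matching columns.
COPY sales FROM '/data/staging/sales.csv' WITH (FORMAT csv, HEADER true);
```

For large files, the parallel methods in the table (file://, gpfdist, gpload) are generally preferable, since they let Segment nodes ingest data concurrently instead of funneling everything through the coordinator.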