Data Loading Overview

SynxDB loads external data by mapping it to external tables through loading tools. It then reads from or writes to these external tables to complete the external data loading process.

Data loading process

The general process of loading data into SynxDB is as follows:

  1. Evaluate the data loading scenario (such as data source location, data types, and data volume) and choose the appropriate loading tool.

  2. Configure and activate the loading tool.

  3. Create an external table, specifying the loading tool protocol, data source address, and data format in the CREATE EXTERNAL TABLE statement. For details, see Load External Data Using Foreign Table.

  4. Once the external table is created, query the data directly with the SELECT statement, or load the data into a regular table with an INSERT INTO ... SELECT statement.
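The steps above can be sketched as follows. This is a minimal example assuming a gpfdist server is already running; the table names, host, port, file name, and columns are illustrative, not part of any fixed schema:

```sql
-- Step 3: map a CSV file served by gpfdist to an external table.
CREATE EXTERNAL TABLE ext_sales (
    id     int,
    amount numeric,
    ts     timestamp
)
LOCATION ('gpfdist://etl-host:8081/sales.csv')
FORMAT 'CSV' (HEADER);

-- Step 4: query the external data directly ...
SELECT count(*) FROM ext_sales;

-- ... or load it into a regular table.
CREATE TABLE sales (LIKE ext_sales);
INSERT INTO sales SELECT * FROM ext_sales;
```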

Loading tools and scenarios

SynxDB provides multiple data loading solutions; choose the one that fits your data source and loading scenario.

| Loading method/tool | Data source | Data format | Parallel loading |
| --- | --- | --- | --- |
| Load Data Using COPY | Local file system: coordinator node (for single files) or Segment nodes (for multiple files) | TXT, CSV, binary | No |
| Load Data Using the file:// Protocol | Local file system (local Segment nodes, only accessible by superusers) | TXT, CSV | Yes |
| gpfdist | Local host files or files accessible over the intranet | TXT, CSV, any delimited text format supported by the FORMAT clause, XML and JSON (converted to text via YAML configuration) | Yes |
| Bulk loading using gpload (with gpfdist as the underlying component) | Local host files or files accessible over the intranet | TXT, CSV, any delimited text format supported by the FORMAT clause, XML and JSON (converted to text via YAML configuration) | Yes |
| External web tables | Data fetched from web services or any source accessible via command line | TXT, CSV | Yes |
| Kafka FDW and Kafka Connector | Kafka | CSV, JSON | Yes |
| Data Lake Connector | Public cloud object storage (for example, Amazon S3, QingCloud, Alibaba Cloud, Huawei Cloud, Tencent Cloud); Hadoop storage | CSV, TEXT, ORC, PARQUET | Yes |
| Hive Connector with datalake_fdw | Hive data warehouse | CSV, TEXT, ORC, PARQUET, Iceberg, Hudi | Yes |
| PXF | HDFS, Hive, HBase, relational databases via JDBC | Depends on the connector; common formats include Text, CSV, JSON, Avro, Parquet, ORC | Yes |
| mysql_fdw | MySQL | MySQL table data | Yes |
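As a minimal sketch of the simplest method above, COPY reads a file on the coordinator host in a single, non-parallel stream. The table name and file path are hypothetical:

```sql
-- COPY runs through the coordinator, so the load is serial.
-- The target table must already exist with matching columns.
COPY sales FROM '/data/staging/sales.csv' WITH (FORMAT csv, HEADER true);
```

For large files, the parallel methods in the table (file://, gpfdist, gpload) are generally preferable, since they let Segment nodes ingest data concurrently instead of funneling everything through the coordinator.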