Execute Queries in Parallel

This document describes the use cases, methods, limitations, and frequently asked questions for executing queries in parallel in SynxDB. SynxDB introduces the parallel query feature, where the number of computing nodes (including the SeqScan operator) is dynamically adjusted based on the data volume during query execution. Parallel query aims to improve query performance by utilizing multiple CPU cores to process a single query.

Use cases

Deploy a small number of segments on a single physical machine and improve performance by dynamically adjusting the degree of parallelism instead of deploying a large number of segments.
Enabling operator parallelism provides a performance advantage when the host CPU and disk load are not high.

How to use

SynxDB supports parallel queries on AO/AOCO tables and Heap tables.

Query heap tables in parallel

Before enabling the parallel query feature, you need to disable the GPORCA optimizer.
```
SET enable_parallel = ON;
SET optimizer = OFF;
```

Set the maximum degree of parallelism.

  -- The setting should consider the number of CPU cores and segments.
  SET max_parallel_workers_per_gather = 4;

Query example:

.. code:: sql

CREATE TABLE t1 (c1 int,c2 int, c3 int, c4 box) distributed by (c1);
INSERT INTO t1 SELECT x, 2*x, 3*x, box('6,6,6,6') FROM generate_series(1,1000000) AS x;
SELECT count(*) from t1;

Query AO/AOCO tables in parallel

Before enabling the parallel query feature, you need to disable the GPORCA optimizer.
```
SET enable_parallel = ON;
SET optimizer = OFF;
```

Set the maximum degree of parallelism.

-- The setting should consider the number of CPU cores and segments.
SET max_parallel_workers_per_gather = 4;

-- Set how many files to insert data into within one transaction. A larger value might decrease performance, especially for AOCO tables.
SET gp_appendonly_insert_files = 8;

-- Switch to the next file for data insertion every 100,000 rows. This can be adjusted to distribute data as evenly as possible across multiple files.
SET gp_appendonly_insert_files_tuples_range = 100000;

Insert and query.

CREATE TABLE ao(x INT) USING ao_row;
INSERT INTO ao SELECT i FROM generate_series(1, 1000000) i;
-- It is recommended to update statistics first.
ANALYZE ao;
SELECT COUNT(*) FROM ao;

Parameter descriptions

Parameter	Description	Default	Required	Example
`enable_parallel`	Whether to enable the parallel feature.	`OFF`	Yes	`SET enable_parallel = ON;`
`optimizer`	Whether to enable the GPORCA optimizer.	`ON`	Yes	`SET optimizer = OFF;`
`gp_appendonly_insert_files`	The number of files to insert data into within one transaction.	`4`	No	`SET gp_appendonly_insert_files = 8;`
`gp_appendonly_insert_files_tuples_range`	The number of rows after which to switch to the next file for data insertion.	`100000`	No	`SET gp_appendonly_insert_files_tuples_range = 100000;`

Frequently asked questions

Currently, parallel execution is supported for queries containing the following operators. SynxDB does not currently support queries with other operators.
```
sequence scan
index scan
index only scan
bitmap heap scan
append
hash join
nestloop join
merge join
```
Parallel query does not always improve query performance. An excessively high degree of parallelism can cause an overload, leading to a decrease in performance.
Enabling parallelism multiplies memory consumption, which might lead to “out of memory” errors.