Create AO/AOCO Tables in Parallel and Refresh Materialized Views
SynxDB supports using the CREATE TABLE AS
statement to create Append-Optimized (AO) tables and Append-Optimized Column Oriented (AOCO) tables in parallel, and also supports parallel refreshing of materialized views based on such tables, thereby accelerating table creation and materialized view refreshing.
To use this parallel functionality, you need to set the system parameter enable_parallel
to ON
first.
Usage examples
Create AO/AOCO tables in parallel
Create the base table
t_p2
, usingWITH
in the table creation statement to specify the number of parallel workers.CREATE TABLE t_p2(c1 INT, c2 INT) WITH (parallel_workers=2) DISTRIBUTED BY (c1);
Insert data into the table and collect statistics on table
t_p2
.INSERT INTO t_p2 SELECT i, i+1 FROM generate_series(1, 10000000) i; ANALYZE t_p2;
Enable parallel processing and disable the GPORCA optimizer.
SET enable_parallel = ON; SET optimizer = OFF;
Use
CREATE TABLE AS
to create a table based ont_p2
. Parallel operators appear in the execution plan, indicating that the table is created in parallel.Create an AO table:
EXPLAIN(COSTS OFF) CREATE TABLE ctas_ao USING ao_row AS SELECT SUM(a.c2) AS c2, AVG(b.c1) AS c1 FROM t_p2 a JOIN t_p2 b ON a.c1 = b.c1 DISTRIBUTED BY (c2); QUERY PLAN ----------------------------------------------------------------- Redistribute Motion 1:3 (slice1; segments: 1) Hash Key: (sum(a.c2)) -> Finalize Aggregate -> Gather Motion 6:1 (slice2; segments: 6) -> Partial Aggregate -> Parallel Hash Join Hash Cond: (a.c1 = b.c1) -> Parallel Seq Scan on t_p2 a -> Parallel Hash -> Parallel Seq Scan on t_p2 b Optimizer: Postgres query optimizer (11 rows)
Create an AOCO table:
EXPLAIN(COSTS OFF) CREATE TABLE ctas_aoco USING ao_column AS SELECT SUM(a.c2) AS c2, AVG(b.c1) AS c1 FROM t_p2 a JOIN t_p2 b ON a.c1 = b.c1 DISTRIBUTED BY (c2); QUERY PLAN ----------------------------------------------------------------- Redistribute Motion 1:3 (slice1; segments: 1) Hash Key: (sum(a.c2)) -> Finalize Aggregate -> Gather Motion 6:1 (slice2; segments: 6) -> Partial Aggregate -> Parallel Hash Join Hash Cond: (a.c1 = b.c1) -> Parallel Seq Scan on t_p2 a -> Parallel Hash -> Parallel Seq Scan on t_p2 b Optimizer: Postgres query optimizer (11 rows)
Refresh materialized views in parallel
Create the base table
t_p
, usingWITH
in the table creation statement to specify the number of parallel workers.CREATE TABLE t_p(c1 INT, c2 INT) WITH (parallel_workers=8) DISTRIBUTED BY (c1);
Insert data into the table and collect statistics on table
t_p
.INSERT INTO t_p SELECT i, i+1 FROM generate_series(1, 10000000) i; ANALYZE t_p;
Create a materialized view
matv
that uses row storage (AO Row) or column storage (AO Column) based on the parameterao_row
. The materialized view content is an aggregated query result derived from tablet_p
.CREATE MATERIALIZED VIEW matv USING ao_row AS SELECT SUM(a.c2) AS c2, AVG(b.c1) AS c1 FROM t_p a JOIN t_p b ON a.c1 = b.c1 WITH NO DATA DISTRIBUTED BY (c2);
CREATE MATERIALIZED VIEW matv USING ao_column AS SELECT SUM(a.c2) AS c2, AVG(b.c1) AS c1 FROM t_p a JOIN t_p b ON a.c1 = b.c1 WITH NO DATA DISTRIBUTED BY (c2);
Enable parallel processing and disable the GPORCA optimizer.
SET enable_parallel = ON; SET optimizer = OFF;
Refresh the materialized view.
REFRESH MATERIALIZED VIEW matv;
You can use tools such as recording refresh duration to compare the time difference for refreshing materialized views between disabled and enabled parallel modes. You will see the acceleration achieved by parallel materialized view refresh.
Performance improvement demonstration
Parallel enabled |
Refresh AO table materialized view time |
Refresh AOCO table materialized view time |
CTAS Create AO table time |
CTAS Create AOCO table time |
---|---|---|---|---|
Non-parallel |
6.18 ms |
5.91 ms |
6.56 ms |
6.06 ms |
Parallel=4 |
2.83 ms |
2.81 ms |
2.37 ms |
2.48 ms |