Use Automatic Materialized Views for Query Optimization
SynxDB supports automatically using materialized views to process some or all queries (called “Answer Query Using Materialized Views”, or “AQUMV”) during the query planning phase. This feature is suitable for queries on large tables and can greatly reduce query processing time. AQUMV uses incremental materialized views (IMVs) because IMVs usually keep the latest data when related tables have write operations.
Usage scenarios
Aggregation queries on large data sets: For queries that need to aggregate results from millions of records, AQUMV can significantly reduce query time.
Frequently updated large tables: In an environment where data is frequently updated, using IMV can ensure that the query results are real-time and accurate.
Complex calculations: For queries with complex calculations (such as square root and absolute value calculations), AQUMV can speed up the queries by pre-calculating these values in materialized views.
Implementation
AQUMV implements query optimization by equivalently transforming the query tree.
SynxDB automatically uses materialized views only when the table query meets the following conditions:
Materialized views must contain all the rows required by the query expression.
If the materialized view contains more rows than the query, you might need to add additional filters.
All output expressions must be able to be calculated from the output of the view.
The output expression can fully or partially match the target list of the materialized view.
When there are multiple valid materialized view candidates, or the cost of querying from the materialized view is higher than querying directly from the original table (for example, the original table has indexes), you can let the planner make the best choice based on cost estimates.
Comparison with dynamic tables
AQUMV and dynamic tables have these differences:
Feature |
Dynamic table |
AQUMV |
---|---|---|
Purpose |
A special table that automatically refreshes, processes data pipelines, and simplifies ETL. |
Uses materialized views to improve query efficiency and automatically rewrite queries. |
Base and structure |
Can be based on ordinary tables, external tables, materialized views. |
Based on materialized views, usually targeting a single table. |
Query rewrite |
Not supported |
Supports single-table rewrite |
Data refresh mechanism |
Users can define automatic refresh interval through SQL |
Requires manual refresh of materialized view data |
Restrictions
Only
SELECT
queries for a single relationship are supported, which is applicable to materialized view queries and the original queries.You need to explicitly enable the support for external tables through the
aqumv_allow_foreign_table
parameter.The support for aggregate queries have the following limitations: - Aggregates with
GROUP BY
are not supported. - TheHAVING
clause must be calculable from the target list of the materialized view.Currently, unsupported features include: subqueries, sorting of the original query (
ORDER BY
), joins (JOIN
), sublinks (SUBLINK
), grouping (GROUP BY
), window functions, common table expressions (CTE), distinct on (DISTINCT ON
), refreshing materialized views (REFRESH MATERIALIZED VIEW
), andCREATE AS
statements.
Usage examples
To enable AQUMV, you need to create a materialized view, turn off the GPORCA optimizer, and set the value of the system parameter enable_answer_query_using_materialized_views
to ON
. The following example compares the results of the same complex query without AQUMV and with AQUMV.
Turn off the GPORCA planner to use the Postgres-based planner.
SET optimizer TO off;
Create a table
aqumv_t1
.CREATE TABLE aqumv_t1(c1 INT, c2 INT, c3 INT) DISTRIBUTED BY (c1);
Insert data into the table and collect statistics from the table.
INSERT INTO aqumv_t1 SELECT i, i+1, i+2 FROM generate_series(1, 1000000) i; ANALYZE aqumv_t1;
Execute a query without enabling AQUMV. The query takes 7384.329 ms.
SELECT SQRT(ABS(ABS(c2) - c1 - 1) + ABS(c2)) FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40 AND SQRT(ABS(c2)) > 5.8; sqrt ------------------- 66.08276253029821 96.24499799839839 85.91607978309961 66.16441400296897 66.32455532033675 95.8309518948453 (7 rows) Time: 7384.329 ms (00:07.384)
From the following query plan, you can see that the optimizer scans the table (
Seq Scan on aqumv_t1
).EXPLAIN(COSTS OFF) SELECT SQRT(ABS(ABS(c2) - c1 - 1) + ABS(c2)) FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40 AND SQRT(ABS(c2)) > 5.8; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) -> Seq Scan on aqumv_t1 Filter: ((c1 > 30) AND (c1 < 40) AND (sqrt((abs(c2))::double precision) > '5.8'::double precision)) Optimizer: Postgres query optimizer (4 rows)
Create a materialized view
mvt1
based onaqumv_t1
and collect statistics on the view.CREATE INCREMENTAL MATERIALIZED VIEW mvt1 AS SELECT c1 AS mc1, c2 AS mc2, ABS(c2) AS mc3, ABS(ABS(c2) - c1 - 1) AS mc4 FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40; ANALYZE mvt1;
Enable the AQUMV-related configuration parameter.
SET enable_answer_query_using_materialized_views = ON;
Now AQUMV is enabled. Execute the same query again, which takes only 45.701 ms.
SELECT SQRT(ABS(ABS(c2) - c1 - 1) + ABS(c2)) FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40 AND SQRT(ABS(c2)) > 5.8; sqrt ------------------- 66.08276253029821 96.24499799839839 85.8309518948453 5.916079783099616 66.16441400296897 66.32455532033675 (7 rows) Time: 45.701 ms
From the following query plan, you can see that the optimizer does not scan the
aqumv_t1
table, but scans the materialized viewmvt1
(Seq Scan on public.mvt1
) instead.EXPLAIN(VERBOSE, COSTS OFF) SELECT SQRT(ABS(ABS(c2) - c1 - 1) + ABS(c2)) FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40 AND SQRT(ABS(c2)) > 5.8; QUERY PLAN -------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) Output: (sqrt(((mc4 + mc3))::double precision)) -> Seq Scan on public.mvt1 Output: sqrt(((mc4 + mc3))::double precision) Filter: (sqrt((mvt1.mc3)::double precision) > '5.8'::double precision) Settings: enable_answer_query_using_materialized_views = 'on', optimizer = 'off' Optimizer: Postgres query optimizer (7 rows)
In the above example, the query takes 7384.329 ms without AQUMV and without using the materialized views. In contrast, the same query takes only 45.701 ms with AQUMV enabled and using the materialized view. This means that the materialized view pre-calculates and stores relevant calculation result, so that the view only contains rows that meet the specific condition (
c1 > 30 AND c1 < 40
).Therefore, the above table query
SELECT SQRT(ABS(ABS(c2) - c1 - 1) + ABS(c2)) FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40 AND SQRT(ABS(c2)) > 5.8;
is equivalent to querying from the materialized viewSELECT SQRT(ABS(ABS(c2) - c1 - 1) + ABS(c2)) FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40 AND SQRT(ABS(c2)) > 5.8;
.
When the same query is executed, the data can be obtained directly from the materialized view, rather than from the original table. In this way, AQUMV can significantly improve query performance, especially when dealing with large data volumes and complex calculations.
Use materialized views to query external tables
SynxDB supports querying external tables using materialized views. In general, querying external tables faces challenges like data synchronization latency and network bottlenecks. Materialized views can cache query results to avoid accessing external data sources every time, which improves query speed.
Here is an example:
Create an external table and its corresponding materialized view.
CREATE READABLE EXTERNAL TABLE aqumv_ext_r(id int) LOCATION ('demoprot://aqumvtextfile.txt') FORMAT 'text'; CREATE MATERIALIZED VIEW aqumv_ext_mv AS SELECT * FROM aqumv_ext_r;
In the above external table creation statement,
demoprot://aqumvtextfile.txt
is an example file path and example protocol.Enable the AQUMV support in a transaction for the external table:
BEGIN; SET optimizer = OFF; SET aqumv_allow_foreign_table = ON; SET enable_answer_query_using_materialized_views = ON;
Check the query plan to see that the following query uses the materialized view instead of directly scanning the external table:
EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM aqumv_ext_r; QUERY PLAN ------------------------------------------------------------------ Gather Motion 3:1 (slice1; segments: 3) Output: id -> Seq Scan on aqumv.aqumv_ext_mv Output: id Optimizer: Postgres query optimizer
Create an index on the materialized view to further optimize queries:
CREATE INDEX ON aqumv_ext_mv(id);
When the conditional query is executed, the query plan shows the use of the index:
EXPLAIN (COSTS OFF, VERBOSE) SELECT * FROM aqumv_ext_r WHERE id = 5; QUERY PLAN ---------------------------------------------------------------------------- Gather Motion 1:1 (slice1; segments: 1) Output: id -> Index Only Scan using aqumv_ext_mv_id_idx on aqumv.aqumv_ext_mv Output: id Index Cond: (aqumv_ext_mv.id = 5) Optimizer: Postgres query optimizer
Commit the transaction:
COMMIT;
When this feature is enabled, queries on external tables automatically uses materialized views, including leveraging indexes on materialized views to accelerate queries.
Note
Because external table data might change outside the database and the planner cannot detect these changes, AQUMV is disabled for external tables by default.
It is recommended to enable
aqumv_allow_foreign_table
only when you are certain that external table data has not changed.For some external tables (like Iceberg and Hudi), access speed can be 10 times slower or more than internal tables. Using materialized views in these cases can greatly improve query performance.
Compared to importing external table data into internal tables, using materialized views avoids data loading latency and time overhead.
Aggregate query support
SynxDB supports optimizing queries that contain aggregate functions using materialized views. This feature can greatly improve query performance when dealing with aggregate operations on large datasets.
Aggregate query support
Create a table
t
.CREATE TABLE t(c1 INT, c2 INT, c3 INT) DISTRIBUTED BY (c1);
Insert data into the table and collect statistics.
INSERT INTO t SELECT i, i+1, i+2 FROM generate_series(1, 1000000) i; ANALYZE t;
Create a materialized view with aggregate functions.
CREATE MATERIALIZED VIEW mv AS SELECT sum(c1) AS mc1, count(c2) AS mc2, avg(c3) AS mc3, count(*) AS mc4 FROM t WHERE c1 > 90;
Query transformation example.
Original query:
SELECT count(*), sum(c1), count(c2), avg(c3), abs(count(*) - 21) FROM t WHERE c1 > 90;
AQUMV will automatically rewrite it as:
SELECT mc4, mc1, mc2, mc3, abs((mc4 - 21)) FROM mv;
Seq Scan on mv
in the query plan shows that the optimizer directly uses the materialized view:EXPLAIN (VERBOSE, COSTS OFF) SELECT count(*), sum(c1), count(c2), avg(c3), abs(count(*) - 21) FROM t WHERE c1 > 90; QUERY PLAN ------------------------------------------------------------------------ Gather Motion 3:1 (slice1; segments: 3) Output: mc4, mc1, mc2, mc3, (abs((mc4 - 21))) -> Seq Scan on mv Output: mc4, mc1, mc2, mc3, abs((mc4 - 21)) Settings: enable_answer_query_using_materialized_views = 'on', optimizer = 'off' Optimizer: Postgres query optimizer (6 rows)
HAVING clause processing
AQUMV supports queries with HAVING
clauses. If the conditions in the HAVING
clause can be calculated from the target list of the materialized view, these conditions will be converted to WHERE
clauses.
Example:
Create a table and materialized view.
CREATE TABLE t(c1 int, c2 int, c3 int, c4 int); INSERT INTO t SELECT i, i+1, i+2, i+3 FROM generate_series(1, 1000000) i; ANALYZE t; CREATE MATERIALIZED VIEW mv AS SELECT sum(c1) AS mc1, count(c2) AS mc2, avg(c3) AS mc3, count(*) AS mc4 FROM t WHERE c1 > 90;
Query transformation with
HAVING
clause.Original query:
SELECT count(*), sum(c1) FROM t WHERE c1 > 90 HAVING abs(count(*) - 21) > 0 AND 2 > 1 AND avg(c3) > 97;
AQUMV will automatically rewrite it as:
SELECT mc4, mc1 FROM mv WHERE mc3 > 97 AND abs(mc4 - 21) > 0;
The query plan shows that
HAVING
conditions are converted to filter conditions in theWHERE
clause:EXPLAIN (VERBOSE, COSTS OFF) SELECT count(*), sum(c1) FROM t WHERE c1 > 90 HAVING abs(count(*) - 21) > 0 AND 2 > 1 AND avg(c3) > 97; QUERY PLAN ------------------------------------------------------------------------ Gather Motion 3:1 (slice1; segments: 3) Output: mc4, mc1 -> Seq Scan on aqumv.mv Output: mc4, mc1 Filter: ((mv.mc3 > '97'::numeric) AND (abs((mv.mc4 - 21)) > 0)) Optimizer: Postgres query optimizer (7 rows)
Note:
The condition
2 > 1
in theHAVING
clause is optimized out during planning.abs(count(*) - 21) > 0
is converted toabs(mc4 - 21) > 0
and used as a filter condition.
LIMIT and ORDER BY Support
Because only aggregate queries without GROUP BY
are currently supported, aggregate results will have at most one row, therefore:
ORDER BY
clauses do not affect the results in this case.LIMIT
andOFFSET
clauses with constant values can be directly applied to materialized view queries.Example:
-- Original query SELECT count(*), sum(c1) FROM t WHERE c1 > 90 LIMIT 2; -- Rewritten as SELECT mc4, mc1 FROM mv LIMIT 2;
Notes for aggregate query support
Currently, only aggregate queries without
GROUP BY
clauses are supported.Conditions in
HAVING
clauses must be calculable from the target list of the materialized view.For aggregate queries, the result will be either one row or zero rows.