Automatically Use Materialized Views for Query Optimization
SynxDB Elastic supports automatically using materialized views during the query planning phase to compute part or all of a query (a feature known as AQUMV). This is particularly useful for queries on large tables, significantly improving query processing time. AQUMV utilizes incremental materialized views (IMVs) because they generally stay up-to-date when the underlying tables have write operations.
Scenarios
Large-scale aggregate queries: For queries that need to aggregate millions of records, AQUMV can significantly reduce query time.
Frequently updated large tables: In environments with frequent data updates, using IMVs ensures the timeliness and accuracy of query results.
Complex calculation scenarios: For queries involving complex calculations (like the square root and absolute value calculations in the example below), AQUMV can speed up queries by pre-calculating these values in a materialized view.
High-frequency data write scenarios: When the base table has very frequent data writes, synchronously updating the materialized view might greatly impact write performance. In this case, you can use asynchronous incremental materialized views to run refresh tasks in the background, reducing the impact on foreground business operations.
Aggregate queries on large partitioned tables: Partitioned tables are often used to manage large amounts of data. Building materialized views on large partitioned tables allows complex aggregate queries or join operations across multiple partitions to be pre-computed. The AQUMV feature can automatically leverage these materialized views, improving query analysis performance on large partitioned tables and avoiding the significant overhead of full table scans.
Example
To enable the AQUMV feature, you first need to create a materialized view and set the system parameter enable_answer_query_using_materialized_views
to ON
. The following is a comparison of executing the same complex query with and without AQUMV.
Disable the GPORCA optimizer and use the Postgres-based planner.
SET optimizer TO off;
Create the table
aqumv_t1
.CREATE TABLE aqumv_t1(c1 INT, c2 INT, c3 INT);
Insert data into the table and collect statistics on it.
INSERT INTO aqumv_t1 SELECT i, i+1, i+2 FROM generate_series(1, 100000000) i; ANALYZE aqumv_t1;
Execute the query without AQUMV enabled. It takes 7384.329 ms.
SELECT SQRT(ABS(ABS(c2) - c1 - 1) + ABS(c2)) FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40 AND SQRT(ABS(c2)) > 5.8; sqrt -------------------66.0827625302982196.2449979983983985.9160797830996166.1644140029689766.3245553203367595.830951894845301 (7 rows) Time: 7384.329 ms (00:07.384)
The query plan is as follows, showing that the optimizer performs a scan on the table (
Seq Scan on aqumv_t1
).EXPLAIN(COSTS OFF) SELECT SQRT(ABS(ABS(c2) - c1 - 1) + ABS(c2)) FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40 AND SQRT(ABS(c2)) > 5.8; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) -> Seq Scan on aqumv_t1 Filter: ((c1 > 30) AND (c1 < 40) AND (sqrt((abs(c2))::double precision) > '5.8'::double pre cision)) Optimizer: Postgres query optimizer (4 rows)
Create a materialized view
mvt1
based onaqumv_t1
and collect statistics on the view.CREATE INCREMENTAL MATERIALIZED VIEW mvt1 AS SELECT c1 AS mc1, c2 AS mc2, ABS(c2) AS mc3, ABS(ABS(c2) - c1 - 1) AS mc4 FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40; ANALYZE mvt1;
In the example above, the materialized view
mvt1
is updated synchronously. If data is inserted into theaqumv_t1
table very frequently, you can change it to asynchronous refresh mode to improve write performance.Just add the
REFRESH DEFERRED
andSCHEDULE
clauses during creation:-- Changes the materialized view to asynchronous refresh with a 10-second interval. CREATE INCREMENTAL MATERIALIZED VIEW mvt1 REFRESH DEFERRED SCHEDULE '10 seconds' AS SELECT c1 AS mc1, c2 AS mc2, ABS(c2) AS mc3, ABS(ABS(c2) - c1 - 1) AS mc4 FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40;
This way, the data in the materialized view will be periodically updated by a background task, and the query acceleration feature (AQUMV) will still automatically use this up-to-date materialized view.
Enable the AQUMV-related configuration parameter:
SET enable_answer_query_using_materialized_views = ON;
Now that AQUMV is enabled, execute the same query again. It takes 45.701 ms.
SELECT SQRT(ABS(ABS(c2) - c1 - 1) + ABS(c2)) FROM aqumv_t1 WHERE c1 > 30 AND c1 < 40 AND SQRT(ABS(c2)) > 5.8; sqrt -------------------66.0827625302982196.2449979983983985.8309518948453015.9160797830996166.1644140029689766.324555320336759 (7 rows) Time: 45.701 ms
The query plan is as follows, showing that the optimizer scans the materialized view
mvt1
(Seq Scan on public.mvt1
) instead of the tableaqumv_t1
.explain(verbose, costs off)select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8; QUERY PLAN -------------------------------------------------------------------------------- Gather Motion 3:1 (slice1; segments: 3) Output: (sqrt(((mc4 + mc3))::double precision)) -> Seq Scan on public.mvt1 Output: sqrt(((mc4 + mc3))::double precision) Filter: (sqrt((mvt1.mc3)::double precision) > '5.8'::double precision) Settings: enable_answer_query_using_materialized_views = 'on', optimizer = 'off' Optimizer: Postgres query optimizer (7 rows)
In the example above, the query took 7384.329 ms without using a materialized view. After enabling AQUMV, the same query using the materialized view took only 45.701 ms. This demonstrates that the materialized view greatly improves performance by pre-calculating and storing the relevant computation results, containing only the rows that satisfy the specific condition (
c1 > 30 and c1 < 40
).Therefore, the table query
select sqrt(abs(abs(c2) - c1 - 1) + abs(c2)) from aqumv_t1 where c1 > 30 and c1 < 40 and sqrt(abs(c2)) > 5.8;
is actually equivalent to the query on the materialized viewselect sqrt(mc4 + mc3) from mvt1 where sqrt(mc3) > 5.8;
.
This way, when executing the same query, data can be fetched directly from the materialized view instead of the original table. This allows AQUMV to greatly improve query performance, especially when dealing with large data volumes and complex calculations.
How it works
AQUMV achieves query optimization by performing an equivalent transformation of the query tree.
SynxDB Elastic automatically uses a materialized view for a table query only if the following conditions are met:
The materialized view must contain all the rows required by the query expression.
If the materialized view contains more rows than the query, additional filtering conditions might need to be added.
All output expressions must be computable from the view’s output.
The output expressions can fully or partially match the target list of the materialized view.
When there are multiple valid materialized view candidates, or when the cost of querying from the materialized view is higher than querying directly from the original table, the planner can decide the best choice based on cost estimation.
Limitations
Only
SELECT
queries on a single relation are supported, applicable to both the materialized view query and the original query.Currently, the following features are not supported: aggregations (AGG), subqueries, sorting in the original query (ORDER BY), joins (JOIN), sublinks (SUBLINK), grouping (GROUP BY), window functions, Common Table Expressions (CTE), deduplication (DISTINCT ON),
REFRESH MATERIALIZED VIEW
, andCREATE AS
statements.Incremental materialized views are currently optimized primarily for data append (
INSERT
) scenarios. IfDELETE
,UPDATE
, orTRUNCATE
operations occur on its base table, the materialized view will perform a costly full refresh instead of an incremental update.