Update Statistics
Accurate statistics are crucial for good query performance. Using the ANALYZE
statement to update statistics allows the query optimizer to generate optimal query plans. When SynxDB Elastic analyzes a table, the relevant data information is stored in the system catalog tables. If this stored information becomes outdated, the query optimizer may generate inefficient query plans.
Check whether statistics are updated
To check whether a table’s statistics are up-to-date, you can use the pg_stat_all_tables
system view. The last_analyze
column in this view shows the last time the table was manually analyzed, while the last_autoanalyze
column shows the last time it was auto-analyzed. The timestamps in both columns are updated when an ANALYZE
statement is executed.
For example, to check whether the statistics for the test_analyze
table are updated, you can execute the following query:
SELECT schemaname, relname, last_analyze, last_autoanalyze
FROM pg_stat_all_tables
WHERE relname = 'test_analyze';
Selectively generate statistics
Executing ANALYZE
without any arguments updates the statistics for all tables in the database. This is a very time-consuming process and is not recommended. It is advisable to selectively run ANALYZE
on tables when data has changed, or to use the analyzedb
utility.
Running ANALYZE
on a large table can take a long time. If you cannot run ANALYZE
on all columns of a large table, you can generate statistics for specific columns only using ANALYZE table(column, ...)
. Make sure to include columns that are used in joins, WHERE
clauses, SORT
clauses, GROUP BY
clauses, or HAVING
clauses.
For partitioned tables, you can run ANALYZE
only on the partitions that have changed, for example, when a new partition is added. Note that for a partitioned table, you can run ANALYZE
on the root partition table or on a leaf partition (the file that actually stores the data and statistics). In SynxDB Elastic, running ANALYZE
on a single partition of a partitioned table also updates the statistics of the root table, which means collecting statistics on one partition can affect the optimizer statistics for the entire partitioned table. You can use the pg_partition_tree()
function to find the names of the leaf partitions.
SELECT * FROM pg_partition_tree( 'parent_table' );
Improve statistics quality
There is a trade-off between the time it takes to generate statistics and the quality or accuracy of those statistics. You need to find a balance.
To analyze large tables in a reasonable amount of time, ANALYZE
takes a random sample of the table contents rather than checking every row. To increase the number of sample values for all columns on a table, you can adjust the default_statistics_target
configuration parameter. The target value for this parameter ranges from 1
to 10000
, with a default value of 100
.
By default, the default_statistics_target
parameter applies to all columns and specifies how many values are stored in the list of most common values. A larger target value can improve the quality of the query planner’s estimates, especially for columns with irregular data patterns.
You can set default_statistics_target
at the session level using the SET default_statistics_target
statement. To set the default value for this parameter, you need to set it in the postgresql.conf
configuration file and then reload the file.
When to run ANALYZE
Run ANALYZE
in the following situations:
After loading data
After performing
INSERT
,UPDATE
, andDELETE
operations that significantly change the underlying data
ANALYZE
only requires a read lock on the table, so it can be run in parallel with other database operations. However, for performance reasons, it is not recommended to run ANALYZE
at the same time as loading, INSERT
, UPDATE
, and DELETE
operations.
Note
SynxDB Elastic has optimized the behavior of running ANALYZE
on partitioned tables. When statistics are explicitly collected on a leaf partition (e.g., ANALYZE sales_1_prt_p2023
), the system no longer automatically updates the statistics of the root table or other partitions. Only when ANALYZE
is explicitly run on the root table (e.g., ANALYZE sales
) will the statistics for the entire table, including all sub-partitions, be updated.
This behavioral improvement provides greater control over statistics management and avoids unnecessary statistical refreshes. In practice, it is recommended to selectively analyze specific partitions or the entire table based on data changes.
Configure automatic statistics collection
The gp_autostats_mode
configuration parameter, in conjunction with gp_autostats_on_change_threshold
, determines when to trigger an automatic analysis operation. When automatic statistics collection is triggered, the optimizer adds an ANALYZE
step to the query.
By default, the value of gp_autostats_mode
is none
. If this parameter is set to on_no_stats
, statistics collection will be triggered for CREATE TABLE AS SELECT
, INSERT
, or COPY
operations on tables that have no existing statistics, when performed by the table owner.
When gp_autostats_mode
is set to on_change
, statistics are collected only when the number of affected rows exceeds the threshold set by gp_autostats_on_change_threshold
. The default value for this threshold is 2147483647
. Automatic statistics collection is triggered when the number of affected rows in CREATE TABLE AS SELECT
, UPDATE
, DELETE
, INSERT
, and COPY
operations, performed by the table owner, exceeds this threshold.
In addition, if the gp_autostats_allow_nonowner
server configuration parameter is set to true
, SynxDB Elastic will perform automatic statistics collection on a table in the following case:
When
gp_autostats_mode
is set toon_no_stats
, and a non-owner user is the first to perform anINSERT
orCOPY
operation on the table.
Setting gp_autostats_mode
to none
disables automatic statistics collection.
For partitioned tables, inserting data into the top-level parent table does not trigger automatic statistics collection. However, if data is inserted directly into a leaf table of the partitioned table (where the data is actually stored), automatic statistics collection will be triggered.