Update Statistics
Accurate statistics are essential for good query performance. By running the ANALYZE
command, you update table statistics, enabling the query optimizer to generate optimal execution plans. When SynxDB analyzes a table, it stores relevant statistics in system catalog tables. If these statistics become outdated, the optimizer might produce inefficient plans.
Check whether statistics are up to date
To check whether a table’s statistics are current, use the pg_stat_all_tables
system view. The last_analyze
column shows the last time the table was manually analyzed, while last_autoanalyze
shows the last automatic analyze time. Running an ANALYZE
command updates both timestamps.
For example, to check the statistics status of the test_analyze
table, run:
SELECT schemaname, relname, last_analyze, last_autoanalyze
FROM pg_stat_all_tables
WHERE relname = 'test_analyze';
Generate statistics selectively
Running ANALYZE
without parameters updates statistics for all tables in the database, which can be very time-consuming and is generally not recommended. It’s better to selectively analyze tables that have changed or to use the analyzedb
utility.
For large tables, ANALYZE
can take a long time. If you cannot analyze all columns, you can generate statistics for specific columns using ANALYZE table(column, ...)
. Be sure to include columns used in joins, WHERE
clauses, SORT
, GROUP BY
, or HAVING
clauses.
For partitioned tables, it is sufficient to analyze only the partitions that have changed—for example, after adding a new partition. Note that ANALYZE
can be run on either the root partitioned table or on individual leaf partitions (which store actual data and statistics). In SynxDB, analyzing a single partition also updates the root table’s statistics, meaning it might influence the optimizer’s plan for the entire partitioned table. You can use the pg_partition_tree()
function to list the names of all leaf partitions:
SELECT * FROM pg_partition_tree('parent_table');
Improve statistics quality
There is a trade-off between the time spent generating statistics and the quality or accuracy of those statistics. You need to find a balance that fits your workload.
To analyze large tables within a reasonable time, ANALYZE
performs random sampling instead of scanning every row. You can increase the number of sample values collected for all columns in a table by adjusting the default_statistics_target
configuration parameter. The valid range for this parameter is 1
to 10000
, and the default is 100
.
By default, default_statistics_target
applies to all columns and determines how many values are stored in the most-common-values list. A higher value might improve the optimizer’s estimation accuracy, especially for columns with skewed data distributions.
You can set default_statistics_target
at the session level using the SET
command. To apply it globally, set it in the postgresql.conf
file and reload the configuration.
When to run ANALYZE
It is recommended to run ANALYZE
in the following situations:
After loading data
After executing a
CREATE INDEX
commandAfter
INSERT
,UPDATE
, orDELETE
operations that significantly change the data
ANALYZE
only requires a read lock on the table, so it can run in parallel with other database operations. However, for performance reasons, it is not recommended to run ANALYZE
at the same time as INSERT
, UPDATE
, DELETE
, CREATE INDEX
, or data loading operations.
Note
SynxDB improves the behavior of ANALYZE
on partitioned tables. When you explicitly run statistics collection on a leaf partition (for example, ANALYZE sales_1_prt_p2023
), the system no longer updates statistics for the root or other partitions. Only when ANALYZE
is run on the root table (for example, ANALYZE sales
), will statistics for the entire table, including all child partitions, be refreshed.
This change gives you finer control over statistics maintenance and avoids unnecessary updates. In practice, it is recommended to selectively analyze specific partitions or the entire table based on data change patterns.
Configure automatic statistics collection
The configuration parameters gp_autostats_mode
and gp_autostats_on_change_threshold
determine when automatic statistics collection is triggered. When triggered, the optimizer will include an ANALYZE
step during query execution.
By default, gp_autostats_mode
is set to none
. If you set it to on_no_stats
, statistics will be automatically collected when the table owner performs CREATE TABLE AS SELECT
, INSERT
, or COPY
operations on a table that currently has no statistics.
If gp_autostats_mode
is set to on_change
, statistics will only be collected when the number of affected rows exceeds the threshold defined by gp_autostats_on_change_threshold
. The default threshold is 2147483647
. In this mode, when the table owner performs CREATE TABLE AS SELECT
, UPDATE
, DELETE
, INSERT
, or COPY
, and the affected row count exceeds the threshold, automatic statistics collection will be triggered.
In addition, if the server parameter gp_autostats_allow_nonowner
is set to true
, SynxDB will collect statistics even when a non-owner user is the first to perform an INSERT
or COPY
operation, provided that:
gp_autostats_mode
is set toon_no_stats
.
Setting gp_autostats_mode
to none
disables automatic statistics collection entirely.
For partitioned tables, inserting data into the top-level parent table does not trigger automatic statistics collection. However, inserting directly into a leaf partition (which physically stores the data) does trigger automatic statistics collection.