Optimize a PARTITION - SELECT query up to 60x faster
This post walks through my experience optimizing a PARTITION - SELECT query, and how I made it up to 60x faster.
Original Query and the use case
Our app is a simple Excel data version control system. The data is organized by project; keys and data are stored in separate tables called dbKey and dbData.
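A minimal sketch of what these two tables could look like; apart from id, key_id, and timestamp, the column names and types here are assumptions for illustration:

```sql
-- Sketch of the schema; columns other than id, key_id, and
-- timestamp are assumptions.
CREATE TABLE "dbKey" (
    id         bigserial PRIMARY KEY,
    project_id bigint NOT NULL,  -- the data is organized by project
    name       text   NOT NULL
);

CREATE TABLE "dbData" (
    key_id    bigint      NOT NULL REFERENCES "dbKey" (id),
    timestamp timestamptz NOT NULL,  -- insertion time of this row
    value     text                   -- assumed payload column
);
```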
There is also a sheet_version table that stores the version and timestamp information.
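Again a sketch; the exact columns of sheet_version are assumptions:

```sql
-- Maps a version number to the timestamp cutoff used against "dbData".
CREATE TABLE sheet_version (
    project_id bigint      NOT NULL,
    version    integer     NOT NULL,
    timestamp  timestamptz NOT NULL
);
```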
Every time we need to get a specific version of the data (let's say: version 2), we access the sheet_version table first and get sheet_version.timestamp to construct the PARTITION - SELECT query.
To get the actual data, we need these steps:
- Partition the data table dbData by key_id,
- Rank each partition by timestamp (DESC) and take the rank=1 rows from dbData,
- Join dbKey and dbData back together.
Here’s the query:
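(The sketch below uses the assumed schema from earlier; :version_ts stands for the sheet_version.timestamp we fetched.)

```sql
-- Rank rows per key by timestamp and keep only the newest (rank=1),
-- then join back to "dbKey". :version_ts is the version's cutoff.
SELECT "dbKey".name, finalDBData.value
FROM (
    SELECT key_id, timestamp, value,
           RANK() OVER (PARTITION BY key_id
                        ORDER BY timestamp DESC) AS rank
    FROM "dbData"
    WHERE timestamp <= :version_ts
) AS finalDBData
JOIN "dbKey" ON finalDBData.key_id = "dbKey".id
WHERE finalDBData.rank = 1;
```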
Here’s the db<>fiddle where you can play with this query. In the fiddle, version 2 has 10 keys and each key has 50 data rows; if we change the data under only 1 key, we only have to re-insert the rows under that modified key, i.e., insert just 50 rows.
Of course, this design has some limitations, but in this post, let’s focus on the PARTITION - SELECT query optimization.

Identifying the root cause
Useless index and time-consuming Sequential scan
This query is slow because it has to:
- Scan the whole dbData table,
- Partition it by key_id and rank each partition by timestamp,
- Join the result with the dbKey table on rank=1 and finalDBData.key_id = dbKey.id.
The planner tends to walk over every row in the data table to get the rank=1 rows, because the rank=1 (key_id, timestamp) pairs can be anywhere in the whole table.
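Note that an index alone doesn't help here. Even assuming dbData already has an index like the sketch below (the name and exact definition are assumptions), the window function still has to visit every row to compute ranks, so the planner falls back to a sequential scan:

```sql
-- Assumed existing index; unused by the window-function query,
-- because ranking requires reading all rows per key anyway.
CREATE INDEX dbdata_key_id_timestamp_idx
    ON "dbData" (key_id, timestamp DESC);
```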
This query is very slow. We currently have about 30,000 keys in the key table, each project has about 2,000 keys, and there are almost 100 million data rows in the data table; it usually takes at least 60 seconds to get a particular version of the data.
You can view the full plan of this query on explain.dalibo.com.
Approach 1: Materialized View
We can use a materialized view to cache the result set of a particular version of the data, but whoever needs the data first still suffers through the slow query.
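For instance, a hypothetical per-version mat_view (the cutoff literal here is made up; materialized views cannot take parameters, so each version gets its own view):

```sql
-- Hypothetical cache of version 2; creating it still runs the slow
-- query once, so the first reader pays the full cost.
CREATE MATERIALIZED VIEW mat_view_v2 AS
SELECT "dbKey".name, finalDBData.value
FROM (
    SELECT key_id, value,
           RANK() OVER (PARTITION BY key_id
                        ORDER BY timestamp DESC) AS rank
    FROM "dbData"
    WHERE timestamp <= TIMESTAMPTZ '2020-01-01 00:00:00+00'
) AS finalDBData
JOIN "dbKey" ON finalDBData.key_id = "dbKey".id
WHERE finalDBData.rank = 1;
```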
Improvement: Index-Only Scan
But this query can still be better. Index-only scans, a feature introduced in PostgreSQL 9.2, allow us to get data from the index itself, without touching the actual table data.
The documentation states that there are two fundamental restrictions on when this method can be used:
- The index type must support index-only scans. B-tree indexes always do. GiST and SP-GiST indexes support index-only scans for some operator classes but not others. Other index types have no support. The underlying requirement is that the index must physically store, or else be able to reconstruct, the original data value for each index entry. As a counterexample, GIN indexes cannot support index-only scans because each index entry typically holds only part of the original data value.
- The query must reference only columns stored in the index. For example, given an index on columns x and y of a table that also has a column z, these queries could use index-only scans:
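For instance, with the documentation's table tab (columns x, y, z) and an index on (x, y):

```sql
-- Index covers only x and y: CREATE INDEX tab_x_y ON tab (x, y);
SELECT x, y FROM tab WHERE x = 'key';          -- index-only scan possible
SELECT x FROM tab WHERE x = 'key' AND y < 42;  -- index-only scan possible
SELECT x, z FROM tab WHERE x = 'key';          -- z is not indexed: heap access
```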
If these two fundamental requirements are met, then all the data values required by the query are available from the index, so an index-only scan is physically possible. But there is an additional requirement for any table scan in PostgreSQL: it must verify that each retrieved row be “visible” to the query’s MVCC snapshot, as discussed in Chapter 13. Visibility information is not stored in index entries, only in heap entries; so at first glance it would seem that every row retrieval would require a heap access anyway. And this is indeed the case, if the table row has been modified recently. However, for seldom-changing data there is a way around this problem. PostgreSQL tracks, for each page in a table’s heap, whether all rows stored in that page are old enough to be visible to all current and future transactions. This information is stored in a bit in the table’s visibility map. An index-only scan, after finding a candidate index entry, checks the visibility map bit for the corresponding heap page. If it’s set, the row is known visible and so the data can be returned with no further work. If it’s not set, the heap entry must be visited to find out whether it’s visible, so no performance advantage is gained over a standard index scan. Even in the successful case, this approach trades visibility map accesses for heap accesses; but since the visibility map is four orders of magnitude smaller than the heap it describes, far less physical I/O is needed to access it. In most situations the visibility map remains cached in memory all the time.
Let’s verify that all these restrictions are satisfied:
- The first is satisfied because we are using a B-tree index.
- The second is satisfied by modifying our SQL query, as shown below.
- The visibility requirement means all rows in a page must be marked all-visible in the visibility map; it is also satisfied because our data is append-only.
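One practical note on the last point: the visibility map bits are set by vacuum, so it is worth making sure the table has been vacuumed (autovacuum normally takes care of this) before relying on index-only scans:

```sql
-- Refresh planner statistics and the visibility map so index-only
-- scans can skip heap visits on this append-only table.
VACUUM (ANALYZE) "dbData";
```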
We can first build the map between key_id and the rank=1 timestamp:
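A sketch of the map query; DISTINCT ON is one way to keep exactly the newest row per key, and the SELECT list touches only the indexed columns:

```sql
-- key_id -> rank=1 timestamp; with the (key_id, timestamp DESC)
-- index, this can be served by an Index Only Scan.
SELECT DISTINCT ON (key_id) key_id, timestamp
FROM "dbData"
WHERE timestamp <= :version_ts  -- assumed version cutoff
ORDER BY key_id, timestamp DESC;
```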
The result will be a set of (key_id, timestamp) pairs, one row per key.
And then we get the actual data from dbData with each specific (key_id, timestamp) pair:
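A sketch of this second step, combined with the map as a CTE (:version_ts as before):

```sql
WITH map AS (
    -- Step 1: Index Only Scan builds the key_id -> timestamp map.
    SELECT DISTINCT ON (key_id) key_id, timestamp
    FROM "dbData"
    WHERE timestamp <= :version_ts
    ORDER BY key_id, timestamp DESC
)
-- Step 2: ordinary Index Scans fetch each (key_id, timestamp) row.
SELECT "dbKey".name, d.value
FROM map
JOIN "dbData" d ON d.key_id = map.key_id
               AND d.timestamp = map.timestamp
JOIN "dbKey" ON "dbKey".id = map.key_id;
```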
The reason we build the map first is that every column in map’s SELECT list is stored in the index, which satisfies requirement 2 in the documentation; and later, when we query dbData, we still get an Index Scan.
One detail in the example: key_id in map should be unique, or there would be duplicated keys with the same timestamp, so I added DISTINCT(key_id) to the map query.

Final choice: I want them all!
We decided to use this optimized query to build the materialized views, and to maintain a materialized view (we call it mat_view for short) management system that organizes the creation and deletion of these mat_views.
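A sketch of what the management system issues per version; the view name and cutoff literal are hypothetical:

```sql
-- Build a mat_view for one version from the optimized query; the
-- cutoff literal comes from sheet_version.timestamp.
CREATE MATERIALIZED VIEW mat_view_proj1_v2 AS
WITH map AS (
    SELECT DISTINCT ON (key_id) key_id, timestamp
    FROM "dbData"
    WHERE timestamp <= TIMESTAMPTZ '2020-01-01 00:00:00+00'
    ORDER BY key_id, timestamp DESC
)
SELECT "dbKey".name, d.value
FROM map
JOIN "dbData" d ON d.key_id = map.key_id
               AND d.timestamp = map.timestamp
JOIN "dbKey" ON "dbKey".id = map.key_id;

-- Drop it once that version is no longer needed.
DROP MATERIALIZED VIEW IF EXISTS mat_view_proj1_v2;
```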