Skip to main content

API Design: Use type state pattern to avoid ambiguous option flags

· One min read

For example, ZADD is a command that add member with score to sorted set, and it can accept NX or XX as option.

ZADD key [NX | XX] [GT | LT] [CH] [INCR] score member [score member
...]
  • XX: Only update elements that already exist. Don't add new elements.
  • NX: Only add new elements. Don't update already existing elements.

NX and XX can only choose one. In go-redis, this structure is used to hold arguments, but this requires extra comments and checks to tell users that NX and XX are mutually exclusive.

type ZAddArgs struct {
NX bool
XX bool
LT bool
GT bool
Ch bool
Members []Z
}

Ref: go-redis ZAddArgs

But in redis/rueidis, it provides a command builder where the type system directly prevents you from setting both NX and XX at the same time.

client.B().Zadd().Key("1").Nx().Gt().Ch().Incr().ScoreMember().ScoreMember(1, "1").ScoreMember(1, "1").Build()

Build Nested JSON in PostgreSQL

· 2 min read

Original Stackoverflow thread:

https://stackoverflow.com/questions/42222968/create-nested-json-from-sql-query-postgres-9-4/42226253#42226253

Suppose we have this tables:

person car wheel And the relation between is:

person:car = 1:N car:wheel = 1:N We need to build some nested JSON Object with SQL Query to get the summary about details of each car this person has, what would you do ?

The Goal

{
"persons": [
{
"person_name": "Johny",
"cars": [
{
"carid": 1,
"type": "Toyota",
"comment": "nice car",
"wheels": [
{
"which": "front",
"serial number": 11
},
{
"which": "back",
"serial number": 12
}
]
},
{
"carid": 2,
"type": "Fiat",
"comment": "nice car",
"wheels": [
{
"which": "front",
"serial number": 21
},
{
"which": "back",
"serial number": 22
}
]
}
]
},
{
"person_name": "Freddy",
"cars": [
{
"carid": 3,
"type": "Opel",
"comment": "nice car",
"wheels": [
{
"which": "front",
"serial number": 3
}
]
}
]
}
]
}

Approach 1 - Left Join

select
json_build_object(
'persons', json_agg(
json_build_object(
'person_name', p.name,
'cars', cars
)
)
) persons
from person p
left join (
select
personid,
json_agg(
json_build_object(
'carid', c.id,
'type', c.type,
'comment', 'nice car', -- this is constant
'wheels', wheels
)
) cars
from
car c
left join (
select
carid,
json_agg(
json_build_object(
'which', w.whichone,
'serial number', w.serialnumber
)
) wheels
from wheel w
group by 1
) w on c.id = w.carid
group by personid
) c on p.id = c.personid;

Approach 2 - Put sub-query in SELECT-List with json_build_object and json_agg

This is the SQL query based on Nico Van Belle's answer, but I replaced row_to_json with json_buid_object.

select json_build_object(
'persons', (
SELECT json_agg(
json_build_object(
'person_id',id,
'cars', (
SELECT json_agg(
json_build_object(
'car_id', car.id,
'wheels', (
SELECT json_agg(
json_build_object(
'wheel_id', wheel.id,
'whichone', wheel.whichone,
'serialnumber', wheel.serialnumber,
'car_id', wheel.carid
)
)
FROM wheel WHERE wheel.carid = car.id
)
)
) FROM car WHERE id = person.id
)
)
) FROM person
)
);

You can view the result ojnline with db<>fiddle

Why Cost is so high ?

  • Each Sub-node has to be executed N times, where N is number of person

Query Plan

Summary

I think putting sub-query in SELECT-List is elegant, but it's costly.

https://medium.com/@e850506/note-more-nested-json-5f3c1e4a87e

Use sync.Pool to reduce memory consumption

· 5 min read

Our service is like a excel document datastore. and we use xorm as ORM framework, Everytime we need to get data from DB, we call session.Find(&[]Author{}) with the slice of table beans, but this have a problem,

  • Memory allocation is very high

So every time lots of clients try to download excel file, the memory consumption is too high, and downloadling excel file takes too long to complete.

Find the root cause with pprof

I wrote a benchmark and by leveraging GO's pprof profiling tool, we can easily check out the flamegraph using some tool like pyroscope.

Here's the result we got:

CPU

Structure-Binding-cpu

Memory Allocation

Structure-Binding-mem

We can see that under the frame of (*Session).rows2Beans, except the function underneath xorm framework that we can't touch, (*Session).slice2Bean took a lot of CPU time and had lot of memory allocation.

The problem of Structure Binding

After took a look at the code in noCacheFind, I found that if we use bean (a structure with information about db schema definition) to hold the result set, xorm will call session.rows2Beans to convert rows into tableBean.

In sesson.rows2Beans(), it will:

  • convert rows to slices ([]any) by calling session.row2Slice()
  • convert []any to []bean by calling session.slice2Bean()

And this tooks a lot of time.

But I also found that if we use [][]string to hold the result set, after getting xorm.Rows (underlying data structure is database/sql.Rows), noCacheFind() will call rows.Scan for each row, so simple ! This is the chance we can make session.Find() much faster.

Step 1: Use [][]string to hold the data

Based on the assumption, we can use [][]string to reduce the cost of structure binding, you can see the benchmark below unifyContainerNoPool .

Step 2: Use sync.Pool to reduce memory allocation

But it still need huge amount of memory allocation for every []string and every [][]string, let's see how we can reduce this cost.

The solution I came out is very simple, if memory allocation is time-consuming, why don't we reuse the data structure in memory ? In this case, we're using [][]string

var unifyContainerRowPool = sync.Pool{
New: func() interface{} {
conRow := make([]string, DefaultContainerColLen)
conRow = resetUnifyContainerRow(conRow)
return conRow[:0]
},
}

var unifyContainerPool = sync.Pool{
New: func() interface{} {
// fmt.Println("New is called for unifyContainerPool")
con := make([][]string, 0, DefaultContainerRowLen)
return con
},
}

Experiment:

To demonstrate the improvement of our code, I design a simple benchmark,

There are three ways we can get data from database.

  • Use []Author to hold the data (Structure Binding)
  • Use [][]string to hold the data (Unify Container without sync.Pool)
  • Use [][]string to hold the data, and use sync.Pool to reuse [][]string (Unify Container with sync.Pool)

For row number between 1000 and 8000 to demonstrate the benefit of sync.Pool, we use runtime.NumCPU() worker to perform runtime.NumCPU()*4 jobs, every job gets all rows from the author table

$ make BENCHTIME=1s
go test -benchmem -benchtime=1s \
-bench=. \
| tee data/result_all.txt
goos: darwin
goarch: arm64
pkg: github.com/unknowntpo/playground-2022/go/xorm/unifyContainer
BenchmarkContainer/StructureBinding-1000-8 13 78949647 ns/op 91926655 B/op 3146081 allocs/op
BenchmarkContainer/UnifyContainerWithPool-1000-8 31 39028380 ns/op 31799634 B/op 1882362 allocs/op
BenchmarkContainer/UnifyContainerNoPool-1000-8 22 48651809 ns/op 48547759 B/op 2407600 allocs/op
BenchmarkContainer/StructureBinding-2000-8 8 137213729 ns/op 189730109 B/op 6284178 allocs/op
BenchmarkContainer/UnifyContainerWithPool-2000-8 15 72343683 ns/op 63592857 B/op 3759864 allocs/op
BenchmarkContainer/UnifyContainerNoPool-2000-8 12 87559920 ns/op 97780912 B/op 4807668 allocs/op
BenchmarkContainer/StructureBinding-3000-8 6 199308167 ns/op 281507561 B/op 9422225 allocs/op
BenchmarkContainer/UnifyContainerWithPool-3000-8 10 105695333 ns/op 97377107 B/op 5654077 allocs/op
BenchmarkContainer/UnifyContainerNoPool-3000-8 8 128159927 ns/op 146226483 B/op 7207695 allocs/op
BenchmarkContainer/StructureBinding-4000-8 4 256713490 ns/op 379839898 B/op12560279 allocs/op
BenchmarkContainer/UnifyContainerWithPool-4000-8 8 140550521 ns/op 129773817 B/op 7537186 allocs/op
BenchmarkContainer/UnifyContainerNoPool-4000-8 7 165150417 ns/op 195457696 B/op 9607724 allocs/op
BenchmarkContainer/StructureBinding-5000-8 4 323341906 ns/op 486299350 B/op15698332 allocs/op
BenchmarkContainer/UnifyContainerWithPool-5000-8 7 162782482 ns/op 163561488 B/op 9417513 allocs/op
BenchmarkContainer/UnifyContainerNoPool-5000-8 5 200822450 ns/op 245477224 B/op12007762 allocs/op
BenchmarkContainer/StructureBinding-6000-8 6 195379785 ns/op 195452422 B/op11307507 allocs/op
BenchmarkContainer/UnifyContainerWithPool-6000-8 6 195379785 ns/op 195452422 B/op11307507 allocs/op
BenchmarkContainer/UnifyContainerNoPool-6000-8 4 258140198 ns/op 296804806 B/op14407787 allocs/op
BenchmarkContainer/StructureBinding-7000-8 3 512568570 ns/op 720955394 B/op21974306 allocs/op
BenchmarkContainer/UnifyContainerWithPool-7000-8 4 251422083 ns/op 224965602 B/op13170581 allocs/op
BenchmarkContainer/UnifyContainerNoPool-7000-8 4 288070792 ns/op 349445756 B/op16807820 allocs/op
BenchmarkContainer/StructureBinding-8000-8 2 531542583 ns/op 792064800 B/op25112484 allocs/op
BenchmarkContainer/UnifyContainerWithPool-8000-8 4 271685614 ns/op 260817526 B/op15089126 allocs/op
BenchmarkContainer/UnifyContainerNoPool-8000-8 4 338913490 ns/op 395270596 B/op19207827 allocs/op
PASS
ok github.com/unknowntpo/playground-2022/go/xorm/unifyContainer 46.676s

The result shows that the number of allocation per operation is quite different,

The Structure Binding Method needs the largest number of allocations, and the speed is way slower that other two methods. When row number goes high, performance get worse very quickly.

The Method of using [][]string with sync.Pool on the other hand, needs smallest number of memory allocation, and compare to the one without sync.Pool, and because memory allocation takes significant amount of time, it's still faster.

Here's the plot:

perf

I put my code at the repo, please go check it out!

Optimize a PARTITION - SELECT query up to 60x faster

· 9 min read

This post demonstrates my experience of optimizing a PARTITION - SELECT query, and how I made it up to 60x faster.

Original Query and the use case

Our App is a simple excel data version control system, the data is organized by project, key and data is stored in seperated table called dbKey and dbData.

create table dbKey (
id serial ,
project_id int,
-- keys goes here
-- NOTE: key can be 1...N fields, and we use string.Join(fields, sep)
-- to handle it has the key string in backend service
name text
);
create table dbData (
id serial ,
key_id int ,
timestamp int

--- data stores at here
);

and there's also a sheet_version table that stores the version, timestamp information.

create table sheet_version (
id serial ,
version integer,
timestamp int
);

Every time we need to get specific version of data (let's say: version 2), we access sheet_version table first, and get the sheet_version.timestamp to construct the PARTITION - SELECT query.

To get the actual data, we need to do these steps:

  1. Partition the data table dbData by key_id,
  2. Rank it by timestamp (DESC), get the rank=1 datas from dbData
  3. Join dbKey and dbData back togetter.

Here's the query:

SELECT
dbKey.*, finalDBData.*
FROM
dbKey,
(
SELECT
*,
rank() OVER (PARTITION BY key_id ORDER BY TIMESTAMP DESC) AS rank
FROM
dbData where "timestamp" <= 101) finalDBData
where
dbKey.project_id = 10
and rank =1
and finalDBData.key_id = dbKey.id;

Here's the db<>fiddle you can play with this query.

info

We choose this design because it can save a lot of space to store every version of data. If version 2 has 10 keys, each key has 50 data, and if we change data under only 1 key, we only have to re-insert all data under this modified key. and only need to insert 50 data. Of course, this design has some limitations, but in this post, let's focus on the PARTITION - SELECT query optimization.

Identifying the root cause

SELECT
dbKey.*, finalDBData.*
FROM
dbKey,
(
SELECT
*,
rank() OVER (PARTITION BY key_id ORDER BY TIMESTAMP DESC) AS rank
FROM
dbData where "timestamp" <= 101) finalDBData
where rank =1
and finalDBData.key_id = dbKey.id;

Useless index and time-consuming Sequential scan

This query is slow because it has to:

  1. Scan the whole dbData table
  2. partition it by key_id, and rank the timestamp.
  3. Join it with dbKey table with rank=1 and finalDBData.key_id = dbKey.id

Planner tends to range over every row in data table to get rank=1 data because the rank=1 key_id - timestamp can be anywhere in the whole table.

This query it's so slow, we current have about 30000 keys in key table, each project has about 2000 keys, and almost 100 milion data rows in data table, it usually takes at least 60 second to get the particular version of data.

Here's the plan of this query:

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=1125351.65..1289874.58 rows=5621 width=57) (actual time=9082.308..9468.256 rows=11020 loops=1)
Output: dbkey.id, dbkey.name, finaldbdata.id, finaldbdata.key_id, finaldbdata."timestamp", finaldbdata.rank
Hash Cond: (finaldbdata.key_id = dbkey.id)
Buffers: shared hit=358 read=545756, temp read=3000 written=3018
-> Subquery Scan on finaldbdata (cost=1125043.98..1289482.62 rows=5614 width=20) (actual time=9077.986..9459.255 rows=11000 loops=1)
Output: finaldbdata.id, finaldbdata.key_id, finaldbdata."timestamp", finaldbdata.rank
Filter: (finaldbdata.rank = 1)
Rows Removed by Filter: 1100200
Buffers: shared hit=274 read=545756, temp read=3000 written=3018
-> WindowAgg (cost=1125043.98..1275448.81 rows=1122705 width=20) (actual time=9077.985..9432.015 rows=1111200 loops=1)
Output: dbdata.id, dbdata.key_id, dbdata."timestamp", rank() OVER (?)
Buffers: shared hit=274 read=545756, temp read=3000 written=3018
-> Gather Merge (cost=1125043.98..1255801.47 rows=1122705 width=12) (actual time=9077.972..9174.199 rows=1111200 loops=1)
Output: dbdata.key_id, dbdata."timestamp", dbdata.id
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=274 read=545756, temp read=3000 written=3018
-> Sort (cost=1124043.95..1125213.44 rows=467794 width=12) (actual time=9060.365..9078.656 rows=370400 loops=3)
Output: dbdata.key_id, dbdata."timestamp", dbdata.id
Sort Key: dbdata.key_id, dbdata."timestamp" DESC
Sort Method: external merge Disk: 8304kB
Buffers: shared hit=274 read=545756, temp read=3000 written=3018
Worker 0: actual time=9048.365..9066.503 rows=354371 loops=1
Sort Method: external merge Disk: 7656kB
Buffers: shared hit=105 read=175482, temp read=957 written=963
Worker 1: actual time=9060.662..9079.499 rows=372284 loops=1
Sort Method: external merge Disk: 8040kB
Buffers: shared hit=105 read=180922, temp read=1005 written=1011
-> Parallel Seq Scan on public.dbdata (cost=0.00..1071990.75 rows=467794 width=12) (actual time=5.360..8698.716 rows=370400 loops=3)
Output: dbdata.key_id, dbdata."timestamp", dbdata.id
Filter: (dbdata."timestamp" <= 101)
Rows Removed by Filter: 33296333
Buffers: shared hit=192 read=545756
Worker 0: actual time=4.511..8532.085 rows=354371 loops=1
Buffers: shared hit=64 read=175482
Worker 1: actual time=3.410..8640.241 rows=372284 loops=1
Buffers: shared hit=64 read=180922
-> Hash (cost=183.41..183.41 rows=9941 width=37) (actual time=4.312..4.313 rows=10010 loops=1)
Output: dbkey.id, dbkey.name
Buckets: 16384 Batches: 1 Memory Usage: 803kB
Buffers: shared hit=84
-> Seq Scan on public.dbkey (cost=0.00..183.41 rows=9941 width=37) (actual time=0.007..1.395 rows=10010 loops=1)

And you can also view it on explain.dalibo.com

Approach 1: Materialized View

We can use materialized view to cache the result set of particalar version of data, but the first one who needs to get data still suffers from the slow query.

Improvement: Index-Only Scan

But this query still can be better. There's a new feature introduced in PostgreSQL 9.2, which allow us to get data from index itself, without touching the actual table data.

The documentation stats that There are two fundamental restrictions on when this method can be used:

  1. The index type must support index-only scans. B-tree indexes always do. GiST and SP-GiST indexes support index-only scans for some operator classes but not others. Other index types have no support. The underlying requirement is that the index must physically store, or else be able to reconstruct, the original data value for each index entry. As a counterexample, GIN indexes cannot support index-only scans because each index entry typically holds only part of the original data value.
  2. The query must reference only columns stored in the index. For example, given an index on columns x and y of a table that also has a column z, these queries could use index-only scans:
  3. If these two fundamental requirements are met, then all the data values required by the query are available from the index, so an index-only scan is physically possible. But there is an additional requirement for any table scan in PostgreSQL: it must verify that each retrieved row be "visible" to the query's MVCC snapshot, as discussed in Chapter 13. Visibility information is not stored in index entries, only in heap entries; so at first glance it would seem that every row retrieval would require a heap access anyway. And this is indeed the case, if the table row has been modified recently. However, for seldom-changing data there is a way around this problem. PostgreSQL tracks, for each page in a table's heap, whether all rows stored in that page are old enough to be visible to all current and future transactions. This information is stored in a bit in the table's visibility map. An index-only scan, after finding a candidate index entry, checks the visibility map bit for the corresponding heap page. If it's set, the row is known visible and so the data can be returned with no further work. If it's not set, the heap entry must be visited to find out whether it's visible, so no performance advantage is gained over a standard index scan. Even in the successful case, this approach trades visibility map accesses for heap accesses; but since the visibility map is four orders of magnitude smaller than the heap it describes, far less physical I/O is needed to access it. In most situations the visibility map remains cached in memory all the time.

Let's verify if all these restrictions is satisfied:

info
  1. It's staicfied because we are using B-tree index.
  2. It's satisfied by modifying our SQL query,
  3. It means all data in that page must be visible in visibility map, and it's also satisfied because the data is append-only.

We can build the map between key_id and the rank=1 timestamp first,

WITH map AS (
SELECT
DISTINCT(key_id),
timestamp
FROM (
SELECT
key_id,
timestamp,
rank() OVER (PARTITION BY key_id ORDER BY TIMESTAMP DESC) AS rank
FROM
dbData
-- filtering stuff depends on business logic
where "timestamp" <= 10000 and key_id < 100
) sub WHERE rank = 1)
SELECT * FROM map;

Result will be like:

 key_id | timestamp
--------+-----------
1 | 10000
2 | 300
3 | 6000
4 | 90303

And then, get actual data from dbData with specific key_id and timestamp pair.

WITH map AS (
SELECT
DISTINCT(key_id),
timestamp
FROM (
SELECT
key_id,
timestamp,
rank() OVER (PARTITION BY key_id ORDER BY TIMESTAMP DESC) AS rank
FROM
dbData
-- filtering stuff depends on business logic
where "timestamp" <= 10000 and key_id < 100
) sub WHERE rank = 1)
SELECT
dbKey.*, dbData.*
FROM
dbKey
INNER JOIN map m ON m.key_id = dbKey.id
INNER JOIN dbData ON dbData.key_id = m.key_id AND m.timestamp = dbData.timestamp;

The reason we build the map first is that the SELECT list in map are all stored in the index, which satisfied requirement 2 in the documentation, and later when we query dbData , we can still have Index Scan.

Here's the example

note

UPDATE: The key_id in map should be unique, or there will be duplicated keys with same timestamp, so I added DISTINCT(key_id) to the map query.

Final choice: I want them all!

We decided to use this optimized query to build the materialized view, and maintain a materialized view (we call it mat_view for short) management system to organize the creation, deletion of these mat_views.

Reference:

ChatGPT First Glance

· One min read

This is my first glance of ChatGPT, and I ask her to generate a peice of code in Haskell, which can map a function to a list.

The result she generated is totally correct, and can be run in playground.

addOneToEach :: [Int] -> [Int]
addOneToEach xs = map (+1) xs

myMap :: (a -> b) -> [a] -> [b]
myMap _ [] = []
myMap f (x:xs) = f x : myMap f xs

main = do
let myList = [1, 2, 3, 4]
let doubledList = myMap (*2) myList
print doubledList
-- Output: [2,4,6,8]

Here's the link to our chat: https://sharegpt.com/c/yedzb1N

My First Post

· One min read

Inline Formula: E=iEi=E1+E2+E3+\mathbf{E}=\sum_{i} \mathbf{E}_{i}=\mathbf{E}_{1}+\mathbf{E}_{2}+\mathbf{E}_{3}+\cdots

SELECT 'hello-world' FROM me

Block Formula:

a=b+cd+e=fa=b+c \\ d+e=f a=b+cd+e=fa=b+c \\ d+e=f

Test IFrame