PostgreSQL 14's enable_memoize For Improved Performance of Nested Loop Joins

I recently discovered a nice new addition to PostgreSQL 14: the new enable_memoize flag, which improves the performance of some nested loop joins when the statistics suggest it is appropriate. I mean, who can resist this:

A 1000x improvement in query speed implies that something very suboptimal was going on before, and a tool like memoization can be of great help there. But will it also help with a "regular" join? I wanted to try it myself.

What is memoization?

In a perfect world free of side effects (and SQL is such a perfect world, in theory), memoization means that we can substitute y for f(x) in any computation, given that y = f(x). For example, no matter how many times you compute UPPER('x'), you will always get 'X'. If computing such a function is expensive, and there are only few possible input values, then why not just maintain a hash map that caches all previous input values and use it to look up known (or at least frequent) values instead of computing them again?
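As a back-of-the-envelope illustration (a plain Java sketch, not how PostgreSQL implements the feature), memoization can be as simple as putting a hash map in front of the expensive function:

```java
import java.util.HashMap;
import java.util.Map;

// A minimal memoization sketch: cache y = f(x) so that repeated inputs
// skip the (pretend) expensive computation. Purely illustrative.
public class Memoizer {
    private final Map<String, String> cache = new HashMap<>();
    private int computations = 0; // how often f(x) was actually evaluated

    public String upper(String x) {
        return cache.computeIfAbsent(x, k -> {
            computations++;           // imagine an expensive computation here
            return k.toUpperCase();
        });
    }

    public int computations() {
        return computations;
    }
}
```

Calling upper("x") a thousand times evaluates toUpperCase() only once; every further call is just a map lookup.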

As I've shown previously on this blog, Oracle 11g introduced a feature called scalar subquery caching, a feature that you can activate in jOOQ to avoid expensive PL/SQL context switches.

In the case of PostgreSQL's enable_memoize, this can be particularly useful for nested loop joins in SQL, and, to address the above tweet, a lateral join is most often executed via a nested loop join.

Turning the feature on and off

I created a schema like this:

CREATE TABLE t AS
SELECT i, i % 5 AS j 
FROM generate_series(1, 100000) AS t(i);

CREATE TABLE u AS
SELECT i, i % 20000 as j 
FROM generate_series(1, 100000) AS t(i);

CREATE INDEX uj ON u(j);

In summary:

  • Both tables t and u have 100,000 rows.
  • t.j has only 5 distinct values, each value appearing 20,000 times.
  • u.j has 20,000 distinct values, each value appearing 5 times.

When running this on PostgreSQL 14:

SELECT current_setting('enable_memoize');

I get:

|current_setting|
|---------------|
|on             |

So the feature is turned on, which I can also see in the EXPLAIN output of the following query:

EXPLAIN
SELECT *
FROM t JOIN u ON t.j = u.j;

The plan is:

|QUERY PLAN                                                            |
|----------------------------------------------------------------------|
|Nested Loop  (cost=0.30..8945.41 rows=496032 width=16)                |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)         |
|  ->  Memoize  (cost=0.30..0.41 rows=5 width=8)                       | 
|        Cache Key: t.j                                                |
|        ->  Index Scan using uj on u  (cost=0.29..0.40 rows=5 width=8)|
|              Index Cond: (j = t.j)                                   |

Without memoization, when joining the two tables, then for the 100,000 rows in t, I have to look up the 5 matching rows in u 100,000 times. But when memoization kicks in, I only have to perform the lookup 5 times, because there are only 5 distinct values of t.j.
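The arithmetic behind that claim can be modeled in a few lines of Java (my sketch of the Memoize node's effect, not PostgreSQL code): the nested loop probes the inner side once per outer row, but the cache reduces that to one probe per distinct cache key:

```java
import java.util.HashSet;
import java.util.Set;

// Models the plan above: t has outerRows rows and t.j = i % distinctKeys,
// so a Memoize node only performs an inner index scan the first time it
// sees each distinct cache key.
public class ProbeModel {
    public static int innerProbes(int outerRows, int distinctKeys, boolean memoize) {
        Set<Integer> cache = new HashSet<>();
        int probes = 0;
        for (int i = 1; i <= outerRows; i++) {
            int key = i % distinctKeys;   // mirrors t.j = i % 5
            if (!memoize || cache.add(key)) {
                probes++;                 // a real index scan on u(j) happens here
            }
        }
        return probes;
    }
}
```

With the numbers from the schema above, innerProbes(100_000, 5, false) performs 100,000 inner probes, whereas innerProbes(100_000, 5, true) performs just 5.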

We can play with execution plans by turning the feature on or off:

SET enable_memoize = ON;
SET enable_memoize = OFF;

When turned off, PostgreSQL seems to choose a hash join or merge join instead, on my machine (between multiple executions, the plan may switch):

|QUERY PLAN                                                         |
|-------------------------------------------------------------------|
|Hash Join  (cost=3084.00..11568.51 rows=499351 width=16)           |
|  Hash Cond: (t.j = u.j)                                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)      |
|  ->  Hash  (cost=1443.00..1443.00 rows=100000 width=8)            |
|        ->  Seq Scan on u  (cost=0.00..1443.00 rows=100000 width=8)|


|QUERY PLAN                                                              |
|------------------------------------------------------------------------|
|Merge Join  (cost=9748.11..763846.11 rows=50000000 width=16)            |
|  Merge Cond: (u.j = t.j)                                               |
|  ->  Index Scan using uj on u  (cost=0.29..3848.29 rows=100000 width=8)|
|  ->  Sort  (cost=9747.82..9997.82 rows=100000 width=8)                 |
|        Sort Key: t.j                                                   |
|        ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)     |

Let's run a benchmark

We're using the benchmark technique described here:

  • We repeat an operation 25 times in mode A and in mode B and compare (or more than 25, if it is a fast operation)
  • We repeat the above 5 times to mitigate any warmup and other caching effects

You can run the following benchmark on the above schema yourself, to verify:

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 25;
  rec RECORD;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();
    SET enable_memoize = OFF;
  
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT t.*
        FROM t JOIN u ON t.j = u.j
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
    v_ts := clock_timestamp();
    SET enable_memoize = ON;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT t.*
        FROM t JOIN u ON t.j = u.j
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
    RAISE INFO '';
  END LOOP;
END$$;

On my machine, the results are consistent, decent, not too impressive, but still significant:

Run 1, Statement 1: 00:00:03.763426
Run 1, Statement 2: 00:00:03.401346

Run 2, Statement 1: 00:00:03.769419
Run 2, Statement 2: 00:00:03.375677

Run 3, Statement 1: 00:00:03.771465
Run 3, Statement 2: 00:00:03.374413

Run 4, Statement 1: 00:00:03.769136
Run 4, Statement 2: 00:00:03.398734

Run 5, Statement 1: 00:00:03.772544
Run 5, Statement 2: 00:00:03.375272

I.e. a 10% speedup. Across an entire system, that alone would already be worth it.

Optimizing LATERAL

Let's try to optimize a LATERAL join instead. We could run a query like this:

SELECT *
FROM 
  t, 
  LATERAL (
    SELECT count(*) 
    FROM u 
    WHERE t.j = u.j
  ) AS u(j)

The EXPLAIN output of the above is:

|QUERY PLAN                                                                       |
|---------------------------------------------------------------------------------|
|Nested Loop  (cost=4.40..3969.47 rows=100000 width=16)                           |
|  ->  Seq Scan on t  (cost=0.00..1443.00 rows=100000 width=8)                    |
|  ->  Memoize  (cost=4.40..4.42 rows=1 width=8)                                  |
|        Cache Key: t.j                                                           |
|        ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|              ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                    Index Cond: (j = t.j)                                        |

So we can, again, memoize the COUNT(*) computation for each of the 5 distinct t.j input values, instead of re-calculating it every time. Surely, this must be even better than before?

Benchmark time!

DO $$
DECLARE
  v_ts TIMESTAMP;
  v_repeat CONSTANT INT := 25;
  rec RECORD;
BEGIN

  -- Repeat the whole benchmark several times to avoid warmup penalty
  FOR r IN 1..5 LOOP
    v_ts := clock_timestamp();
    SET enable_memoize = OFF;
  
    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT *
        FROM 
          t, 
          LATERAL (
            SELECT count(*) 
            FROM u 
            WHERE t.j = u.j
          ) AS u(j)
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 1: %', r, (clock_timestamp() - v_ts);
    v_ts := clock_timestamp();
    SET enable_memoize = ON;

    FOR i IN 1..v_repeat LOOP
      FOR rec IN (
        SELECT *
        FROM 
          t, 
          LATERAL (
            SELECT count(*) 
            FROM u 
            WHERE t.j = u.j
          ) AS u(j)
      ) LOOP
        NULL;
      END LOOP;
    END LOOP;

    RAISE INFO 'Run %, Statement 2: %', r, (clock_timestamp() - v_ts);
    RAISE INFO '';
  END LOOP;
END$$;

And this time, we can see a significant speedup!

Run 1, Statement 1: 00:00:03.419728
Run 1, Statement 2: 00:00:01.083941

Run 2, Statement 1: 00:00:03.404954
Run 2, Statement 2: 00:00:01.098404

Run 3, Statement 1: 00:00:03.425725
Run 3, Statement 2: 00:00:01.093883

Run 4, Statement 1: 00:00:03.441691
Run 4, Statement 2: 00:00:01.127837

Run 5, Statement 1: 00:00:03.420172
Run 5, Statement 2: 00:00:01.097943

This is great news! Wait, does this work as well for ordinary correlated subqueries? Because the above LATERAL correlated subquery could be rewritten as:

SELECT 
  t.*, 
  (
    SELECT count(*)
    FROM u
    WHERE t.j = u.j
  ) j
FROM t;

Unfortunately, the plan does not feature any memoization:

|QUERY PLAN                                                                   |
|-----------------------------------------------------------------------------|
|Seq Scan on t  (cost=0.00..441693.00 rows=100000 width=16)                   |
|  SubPlan 1                                                                  |
|    ->  Aggregate  (cost=4.39..4.40 rows=1 width=8)                          |
|          ->  Index Only Scan using uj on u  (cost=0.29..4.38 rows=5 width=0)|
|                Index Cond: (j = t.j)                                        |

And the benchmark (you can paste the query into the benchmark logic above yourself to reproduce) confirms that there is no memoization effect:

Run 1, Statement 1: 00:00:03.617562
Run 1, Statement 2: 00:00:03.605765

Run 2, Statement 1: 00:00:03.610084
Run 2, Statement 2: 00:00:03.682064

Run 3, Statement 1: 00:00:03.725952
Run 3, Statement 2: 00:00:03.705622

Run 4, Statement 1: 00:00:03.672669
Run 4, Statement 2: 00:00:03.644612

Run 5, Statement 1: 00:00:03.645741
Run 5, Statement 2: 00:00:03.642717

Perhaps, with this new feature, correlated subqueries can be rewritten to nested loop outer joins in the future? Other optimizers already do this, and then we would effectively have the same feature as Oracle's scalar subquery caching here.

Summary

The feature is turned on in PostgreSQL 14. Apart from the additional memory consumption, which might be a small issue if the optimization is inappropriate and the statistics are off, I don't see any drawback to this new feature. SQL is a side-effect free (in theory) 4GL, meaning the optimizer can replace any computation with a cached value that depends only on that computation's input values.

A correlated subquery is a function whose input parameters are the predicates and other references to the columns of the outer query. As such, the results of a correlated subquery can be cached, or memoized. As shown above, this can have drastic effects on your everyday SQL queries, and you benefit from it merely by upgrading to PostgreSQL 14.
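To restate that point in code (a hypothetical Java sketch, not a jOOQ or PostgreSQL API): if a correlated subquery is a pure function of its correlated inputs, then a generic memoize wrapper around it is sound:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// If a computation depends only on its input, the way a correlated subquery
// depends only on its outer column references, caching its results is safe.
public class SubqueryCache {
    public static <T, R> Function<T, R> memoize(Function<T, R> f) {
        Map<T, R> cache = new HashMap<>();
        return x -> cache.computeIfAbsent(x, f);
    }
}
```

For example, memoize(j -> countOfUWhereJEquals(j)) (where countOfUWhereJEquals is an imaginary stand-in for the correlated COUNT(*) subquery) would evaluate the count once per distinct j, just like the Memoize node in the LATERAL plan above.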
