Merge Join Support In GPORCA

September 30, 2019

Introduction

Pivotal’s SQL Optimizer, GPORCA, handled full outer joins by creating a union of a left outer join and a left anti-semi join, making any GPORCA generated plan slow and prevented GPORCA from creating better plans. In this blog post, we will look at what merge joins are, how we implemented them in GPORCA, and the resulting improvement.

What are Full Outer Joins?

A full outer join (FOJ), or full join, is a join that returns all the rows from both tables regardless of whether or not there is a match. For example, say you have the following:

CREATE TABLE foo (a int, b int) AS (VALUES (1, 3), (2, 4));
CREATE TABLE bar (c int, d int) AS (VALUES (1, 2), (4, 3));

Then

SELECT * FROM foo FULL JOIN bar ON a = c;

Would give the following results:

Introduction to Full Outer Joins in the Query Optimizer

There are a few ways of creating full outer joins in an optimizer. Currently since GPORCA does not have any native full join operator, GPORCA creates a union of a left outer join and a left anti-semi join. One such plan can be seen below.

For the rest of this blog post, we will use the following tables that each have a million rows:

CREATE TABLE t1 (a int, b int);
CREATE TABLE t2 (c int, d int);
INSERT INTO t1 SELECT i, i%1000 + 2 FROM generate_series(1,1000000)i;
INSERT INTO t2 SELECT i%10000, i%2000 +1 FROM generate_series(1,1000000)i;

Original GPORCA Generated Plan

EXPLAIN SELECT count(*) FROM t1 FULL JOIN t2 ON a = c;
                                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=0.00..2905.62 rows=1 width=8)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..2905.62 rows=1 width=8)
         ->  Aggregate  (cost=0.00..2905.62 rows=1 width=8)
               ->  Sequence  (cost=0.00..2905.62 rows=796818 width=1)
                     ->  Shared Scan (share slice:id 1:0)  (cost=0.00..459.38 rows=333334 width=1)
                           ->  Materialize  (cost=0.00..459.38 rows=333334 width=1)
                                 ->  Seq Scan on t1  (cost=0.00..437.97 rows=333334 width=34)
                     ->  Sequence  (cost=0.00..2445.44 rows=796818 width=1)
                           ->  Shared Scan (share slice:id 1:1)  (cost=0.00..459.38 rows=333334 width=1)
                                 ->  Materialize  (cost=0.00..459.38 rows=333334 width=1)
                                       ->  Seq Scan on t2  (cost=0.00..437.97 rows=333334 width=34)
                           ->  Append  (cost=0.00..1985.26 rows=796818 width=1)
                                 ->  Hash Left Join  (cost=0.00..993.09 rows=663485 width=1)
                                       Hash Cond: (share0_ref2.a = share1_ref2.c)
                                       ->  Shared Scan (share slice:id 1:0)  (cost=0.00..434.21 rows=333334 width=4)
                                       ->  Hash  (cost=434.21..434.21 rows=333334 width=4)
                                             ->  Shared Scan (share slice:id 1:1)  (cost=0.00..434.21 rows=333334 width=4)
                                 ->  Result  (cost=0.00..991.37 rows=133334 width=1)
                                       ->  Hash Anti Join  (cost=0.00..991.24 rows=133334 width=1)
                                             Hash Cond: (share1_ref3.c = share0_ref3.a)
                                             ->  Shared Scan (share slice:id 1:1)  (cost=0.00..434.21 rows=333334 width=4)
                                             ->  Hash  (cost=434.21..434.21 rows=333334 width=4)
                                                   ->  Shared Scan (share slice:id 1:0)  (cost=0.00..434.21 rows=333334 width=4)
 Optimizer: Pivotal Optimizer (GPORCA) version 3.74.0
(24 rows)
Time: 37.589 ms

SELECT count(*) FROM t1 FULL JOIN t2 ON a = c;
  count
---------
 1990001
(1 row)
Time: 1412.675 ms

This plan generated by GPORCA takes a total of 1413 milliseconds in execution, which is quite long for a simple full outer join.

Implementing Merge Join support in ORCA

GPDB has native implementations for full outer joins, one of them is a merge full outer join. Since GPORCA did not have any native full join operator, the first step was to add the merge join operator to GPORCA, allowing such a plan to be generated. Such an operator requires quite a few things to consider. One such thing is that in order to use a merge join, both the inner and outer tables need to be sorted.

Performance Improvements

Now that merge joins can be generated in GPORCA, we see quite a bit of improvement when generating full outer join plans.

Merge Join GPORCA Generated Plan

EXPLAIN SELECT count(* FROM t1 FULL JOIN t2 ON a = c;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Aggregate  (cost=0.00..1269.62 rows=1 width=8)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..1269.62 rows=1 width=8)
         ->  Aggregate  (cost=0.00..1269.62 rows=1 width=8)
               ->  Merge Full Join  (cost=0.00..1269.62 rows=796818 width=1)
                     Merge Cond: (t1.a = t2.c)
                     ->  Sort  (cost=0.00..579.15 rows=333334 width=4)
                           Sort Key: t1.a
                           ->  Seq Scan on t1  (cost=0.00..437.97 rows=333334 width=4)
                     ->  Sort  (cost=0.00..579.15 rows=333334 width=4)
                           Sort Key: t2.c
                           ->  Seq Scan on t2  (cost=0.00..437.97 rows=333334 width=4)
 Optimizer: Pivotal Optimizer (GPORCA) version 3.74.0
(12 rows)
Time: 30.218 ms

SELECT * FROM t1 FULL JOIN t2 ON a = c;
  count
---------
 1990001
(1 row)
Time: 413.392 ms

Notice that instead of creating a union of a left outer join and a left anti-semi join, GPORCA now uses a merge join on two sorted tables in its plan.

The query optimization time reduced to 30 milliseconds from 38 milliseconds, which is negligible, but the execution time also decreased from 1413 milliseconds to 413 milliseconds. The execution time is around 3.5x faster than how GPORCA was performing before merge join support.

Full Join Optimizations

Full joins can be optimized into left, right, or inner joins depending on the predicates present in the WHERE clause of the query. GPORCA can greatly take advantage of this by reducing a full join as early as possible in order to create better plans. The same tables t1 and t2 above are used to showcase the following optimizations.

Optimization 1: Full Outer Join → Left Join

In order to be able to convert into a left join, a WHERE clause needs to exist in the query where a predicate on the outer table is null rejecting. Null rejecting means that the predicate will never be true when the referenced column from the outer table is NULL.

For example, EXPLAIN SELECT * FROM t1 FULL JOIN t2 on a = c WHERE a > 2 is null rejecting because we do not return rows where a is NULL and therefore we eliminate any of the NULL-extended rows from foo. This query could then be simplified into a left join.

The conversion of a full join to a left join is done during exploration and allows for GPORCA to then optimize the query using both the full join and left join alternatives.

The above query was run on two tables: t1 had 10 million rows, t2 had 1. We can see that the execution time decreased from 6234 ms to 428 ms, around a 15x improvement.

Optimization 2: Full Outer Join → Inner Join

Similarly, if there exists a predicate where both the right side and left side tables are null-rejecting, then the FULL join can be converted into an inner join. This optimization is actually a by-product of the first, since full joins can be converted into left joins, and left joins can be optimized into inner joins. It is possible for a full join to be optimized into an inner join as well.

Even in the simplest query, this provides a great improvement for the execution time. Here we can see that the execution time decreased from 6659 milliseconds to 362 milliseconds, resulting in a performance gain of around 20x.

Something to note for both optimizations is that previously GPORCA was producing wrong results. Along with these optimizations, GPORCA now produces the correct results for the given query.

Future Work

While merge join is now supported in GPORCA, optimizations can still be made. Future work would include working on an implementation of hash full join. Similar to the conversion of a full join to a left join, we could also convert a full join to a right join.

Conclusion

Merge joins are a great first start to improving full outer join optimization in GPORCA. Originally, this was handled sub-optimally as GPORCA did not have support for native full join operators, making any GPORCA generated plan slow, even in the simplest of queries. With the support of merge join in GPORCA, we not only improved execution time, but also optimization time by 4x. Additionally, we were able to optimize full outer joins by converting them into left outer joins and inner joins depending on the WHERE clause of the query. These small optimizations can improve a full join query by at least 7x compared to the original plan.

ESM Study Data

Studying the experience of XP Teams This is the raw data (after anonymization, and after the removal of fre...

Ensure Windows apps are using the `windows` stack

Pivotal Application Service for Windows introduced the -s windows stack name in PASW 2.4, reducing the oper...

Merge Join Support In GPORCA

Introduction

What are Full Outer Joins?

Introduction to Full Outer Joins in the Query Optimizer

Original GPORCA Generated Plan

Implementing Merge Join support in ORCA

Performance Improvements

Merge Join GPORCA Generated Plan

Full Join Optimizations

Optimization 1: Full Outer Join → Left Join

Optimization 2: Full Outer Join → Inner Join

Future Work

Conclusion

Previous

Next

Merge Join Support In GPORCA

Introduction

What are Full Outer Joins?

Introduction to Full Outer Joins in the Query Optimizer

Original GPORCA Generated Plan

Implementing Merge Join support in ORCA

Performance Improvements

Merge Join GPORCA Generated Plan

Full Join Optimizations

Optimization 1: Full Outer Join → Left Join

Optimization 2: Full Outer Join → Inner Join

Future Work

Conclusion

Previous

Next

Most Recent

When writing a Java Spring web application that uses an OAuth2 single sign-on (SSO) service for authentication, testing can be difficult, especially if the SSO service is provided by a third...

My co-worker Belinda Liu turned to me and said, “I don’t like these tests at all; they’re hard to follow, and I’m not sure what they’re testing.” I looked at the tests that I had spent much of...

0. Abstract HAProxy is an optional load balancer included in the canonical open source Cloud Foundry deployment. Its intended use is on IaaSes (Infrastructures as a Service) that do not offer...

Scaling the Loggregator API So you’ve used this article to correctly scale Dopplers in your Loggregator system. Even so, you notice that you’re still experiencing log loss. It could be that your...

Why care about Dopplers You might be wondering what a Doppler is (and why you care about it). Doppler VMs are a core component of log and metrics transport; one that you probably won’t care about...

Studying the experience of Pair Programmers This is the raw data (after anonymization, and after the removal of freeform fields, out of an abundance of caution, so as not to leak any intellectual...

Studying the experience of XP Teams This is the raw data (after anonymization, and after the removal of freeform fields, out of an abundance of caution, so as not to leak any intellectual...

Pivotal Application Service for Windows introduced the -s windows stack name in PASW 2.4, reducing the operator and developer need to concern themselves with specific Windows Server versions. From...

Abstract “How much faster will my VM’s disks be if I upgrade my ZFS-based (Z File System) NAS to 10 GbE?” The disks will be faster, in some cases, much faster. Our experience is that sequential...

The Spring framework has grown and changed at a massive pace over the last years. It has evolved from XML configured beans to annotation based beans, from synchronous to a non-blocking and...

Overview In a previous post I explained how you could create several components to build a Netflix stack for local development. Now, I want to explain how Pivotal Cloud Foundry makes this much...

Overview A couple of recent projects I have been on have started our engagement with the Netflix stack described here, and because I wanted to have a way to quickly prototype, I set up this demo. ...

Introduction One way to extend the Kubernetes platform is by building custom controllers that operate on custom resources. We can leverage custom resources to enhance the cluster with features for...

Introduction The kubelet exposes many useful metrics that can be used for a variety of purposes. These metrics are already being scrapped by components like the Metric Server. The metrics from the...

Abstract Smartphone authenticator apps such as Google Authenticator and Authy implement software tokens that are “two-step verification services using the Time-based One-time Password Algorithm...

(This blog is the fourth installment of a four-part series) The Operator Pattern The Operator Pattern stipulates a process that is registered with the Kubernetes system layer, listening to...

(This blog is the third installment of a four-part series) Kubernetes can automatically provision “remote persistent” volumes with random names Several types of storage volumes have built-in...

(This blog is the second installment of a four-part series) By default, all containers are free from limits but subject to eviction By default, Kubernetes places very few limits on a container. A...

Kubernetes is available across all public clouds nowadays, including Pivotal’s own PKS, which runs in the cloud and can also be run “on prem”, on the premises of an enterprise. Kubernetes promises...

Abstract By using tcpdump to troubleshoot an elusive error, we uncovered a man-in-the-middle (MITM) ssh proxy installed by our information security (InfoSec) team to harden/protect a set of...