
Break Up the Pipeline Structure to Reach a Nearly Optimal End-to-End Latency

by Junyi Zhao, et al.
Tsinghua University

Query optimization remains problematic in commercial database systems because database optimizers sometimes choose a bad execution plan with several-fold latency differences. In this paper, we design a new dynamic optimization strategy called query split, which takes advantage of run-time statistics. Integrating query split into PostgreSQL, we achieve a 2x speedup in total end-to-end latency on the Join Order Benchmark and near-optimal latency when compared against the optimal execution plan. Our findings reveal that breaking up the static pipeline between the database optimizer and executor can benefit query processing and greatly reduce end-to-end latency.





1. Introduction

Although query optimization is a decades-old topic, current optimizers still sometimes produce sub-optimal plans due to cardinality estimation error. Cardinality estimation is essential for query optimizers because cost calculation relies on estimates of relation sizes. However, due to some coarse assumptions made by the database system, cardinality estimation can perform poorly on real workloads, resulting in sub-optimal plans.

Many attempts have been made in the literature to address the issue of cardinality estimation error. One approach is to record the correlation between attributes, using data structures like multidimensional histograms (Gunopulos et al., 2005). Another approach is to make a better query plan with the help of queries that have been executed before (Sun and Li, 2019; Stillger et al., 2001; Kipf et al., 2018; Heimel et al., 2015; Park et al., 2020; Dutt et al., 2019; Marcus et al., 2019). However, none of these approaches is completely satisfying (Perron et al., 2019).

As it is difficult to solve the cardinality estimation problem, reoptimization (Kabra and DeWitt, 1998; Babu et al., 2005; Perron et al., 2019) is proposed to avoid the need for accurate cardinality estimation. Reoptimization first generates an initial execution plan at the beginning, and then detects during runtime whether the actual behavior of a query plan becomes significantly different from what was expected. Reoptimization attempts to correct the execution plan whenever a significant deviation is found.

We use a high-level concept called "dynamic query optimization" to summarize the key characteristic of reoptimization: its execution plans change during actual execution. In dynamic query optimization, query optimization and execution are interleaved. As feedback from the executor comes in, more information becomes available, and the optimizer can make better execution plans accordingly. Reoptimization only triggers the interleaving of query optimization and execution reactively (Babu and Bizarro, 2005) and does not fully explore this design space. Other attempts can be made in dynamic query optimization, for instance, interleaving query optimization and execution proactively.

Three reasons make dynamic query optimization a promising approach for generating the optimal execution plan. First, compared to the traditional static pipeline, dynamic query optimization can improve query optimization by knowing more about run-time statistics. Second, dynamic query optimization can be implemented easily in the current RDBMS architecture, without model training, extra storage overhead, or significant change to the optimizer and executor. Third, dynamic query optimization and cardinality estimation techniques can benefit each other. Dynamic query optimization decreases the problem size of cardinality estimation. Meanwhile, advanced cardinality estimation technique improves the plan made by dynamic query optimization.

In this paper, we establish a new dynamic query optimization strategy called query split. For selection-projection-join (SPJ) queries, we split the query into several subqueries and execute them sequentially to obtain the result of the original query. By materializing the subquery results, we get more precise statistics about the original query, which are used in later subquery optimization to make better execution plans.

Query split has two characteristics. First, unlike reoptimization, which needs to be triggered, query split is a proactive strategy. Second, query split only remakes execution plans between the executions of different subqueries. As a result, query split decreases the frequency of calling the optimizer, hence reducing the overhead. Our experiments show that query split outperforms PostgreSQL on the Join Order Benchmark (Leis et al., 2018) and achieves nearly optimal end-to-end latency.

The rest of the paper is organized as follows. Section 2 motivates dynamic query optimization and query split. Section 3 formally describes query split and proves its correctness. Section 4 shows preliminary implementations of query split. Section 5 evaluates the performance of query split. Section 6 discusses how to extend query split to general query forms. Section 7 gives a case study on query split for deeper insight. Section 8 discusses related work, and Section 9 concludes this paper. We leave some details about experiments and discussion to our technical report (Junyi and Yihan, 2022).

2. Motivation

In this section, we characterize a fundamental limitation of the traditional query processing pipeline and use it as motivation to introduce a novel query processing framework called query split.


  • We first review the problem of cardinality estimation and explain why it is hard to accurately estimate the cardinality of a complex query result (in Section 2.1).

  • Then we review an existing query processing technique called reoptimization, which circumvents the need for accurate cardinality estimation. Reoptimization dynamically adjusts the query execution plan based on runtime statistics, thus improving overall performance (in Section 2.2).

  • Next, we discuss the underlying philosophy of reoptimization, and generalize it to a new concept called dynamic query optimization, in which the steps of query optimization and execution are interleaved and integrated together (in Section 2.3).

  • Last, we propose a new dynamic query optimization strategy called query split, and informally describe it and discuss its characteristics (in Section 2.4).

2.1. Intrinsic Difficulty of Cardinality Estimation

Currently, most database systems estimate join cardinality as the product of three terms: the sizes of the two input relations and the predicate selectivity, where the selectivity is estimated via table statistics (Silberschatz et al., 1997). For example, in PostgreSQL, the selectivity of a simple predicate on base tables is estimated using gathered statistics such as histograms, most common values and their frequencies, and the number of distinct values. For conjunctive predicates, it is commonly assumed that the component predicates are independent, so the final selectivity is the product of the selectivities of the individual predicates (Leis et al., 2018). For complex join queries, independence between join predicates is often assumed, and their selectivities are multiplied together. In MySQL, the cardinality estimation strategy is very similar (MySQL, 2020).

There are two major reasons why it is hard to guarantee the accuracy of cardinality estimation (Kabra and DeWitt, 1998). The first reason is that optimizer lacks the statistics of intermediate relations. We use an example to demonstrate why this is a problem that troubles the optimizer.

Example 1 ().

A database contains two relations R1(a, b) and R2(b, c), together with some statistics in the system catalog, including the cardinalities of R1 and R2 and the value distributions of attributes a, b, and c. Given a natural join query R1 ⋈ R2, the cardinality of the result is given by:

|R1 ⋈ R2| = Σ_{v ∈ D(b)} |σ_{b=v}(R1)| · |σ_{b=v}(R2)|

where D(b) is the domain of attribute b.

However, when we apply a filter to attribute a, cardinality estimation becomes more difficult. Consider the query σ_{a=v0}(R1) ⋈ R2, and denote the intermediate relation σ_{a=v0}(R1) as R1'. The query can then be rewritten as R1' ⋈ R2, and the cardinality of R1' is |σ_{a=v0}(R1)|. Since any other statistic about R1' is unknown, the optimizer must approximate those missing statistics by some assumptions. For example, if we assume independence between attributes a and b in table R1, we get |σ_{b=v}(R1')| = |σ_{b=v}(R1)| · |R1'| / |R1|, which can then be used to estimate the selectivity of the join predicate. However, those assumptions may not actually hold and would introduce estimation error.

A similar situation arises when the filter is replaced by a new join, e.g. R1 ⋈ R2 ⋈ R3. Since we only have the statistics of base relations, to estimate the cardinality of the final result (R1 ⋈ R2) ⋈ R3, we have to assume independence across joins. Specifically, we use the selectivity of joins between base relations to approximate the selectivity of joins containing intermediate results, which also leads to estimation error.

From the above example, we can see that the optimizer has to make assumptions to make up for the lack of statistics, which inevitably generates estimation error.

The second reason is error propagation. Ioannidis and Christodoulakis mathematically deduced that the cardinality estimation error of an N-way natural join grows exponentially with N, under the assumption that there is some error in estimating the distribution of the join attribute in each relation (Ioannidis and Christodoulakis, 1991). Hence, except for small queries, cardinality estimation results are often not trustworthy. Although these results are based on theoretical analysis, they are also generally valid in practice based on our empirical observations during experiments.
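To make the exponential growth concrete, the following toy calculation (our own illustration, not taken from the cited analysis) shows how a constant per-join estimation error compounds multiplicatively across an N-way join:

```python
# Illustration of exponential error propagation: if every per-join
# selectivity estimate is off by a constant factor `per_join_error`, an
# N-way join applies N-1 join selectivities, so the final estimate is
# off by roughly per_join_error ** (N - 1).

def compounded_error(per_join_error, n_relations):
    return per_join_error ** (n_relations - 1)

# A modest 2x per-join error becomes a 512x error for a 10-way join.
assert compounded_error(2.0, 10) == 512.0
```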

2.2. Reoptimization

As it is hard to make cardinality estimation always accurate, Kabra and DeWitt proposed reoptimization (Kabra and DeWitt, 1998) to circumvent this problem. Reoptimization collects statistics during runtime to monitor query execution. When the collected statistics deviate too much from prediction, the system replans the remainder of the query plan in light of this new information, which can alter join orders and physical operators of the remaining part.

Reoptimization does not make drastic changes to the query execution engine. Instead, it detects during runtime whether the actual behavior of a query plan becomes significantly different from what was expected and attempts to correct the execution plan whenever it happens. In other words, reoptimization admits that the query optimizer can make mistakes in decisions and thus tries to alleviate them. In a recent work, Perron et al. (Perron et al., 2019) evaluated the performance of reoptimization on the Join Order Benchmark (Leis et al., 2018), and their result suggests that reoptimization can significantly reduce query execution time.

Reoptimization makes use of the statistics of intermediate results, which in this paper we refer to as runtime statistics to distinguish them from statically collected statistics. By using runtime statistics, the optimizer improves cardinality estimation in two ways. First, by replacing estimated statistics with runtime statistics, the optimizer reduces the cardinality estimation error caused by problematic assumptions. For example, the runtime statistics can include the actual distribution of attributes in an intermediate result, which leads to more accurate cardinality estimation. Second, runtime statistics stop error propagation. The optimizer uses runtime statistics to replace estimated cardinalities, which essentially reduces the size of the join that needs to be estimated.

2.3. Dynamic Query Optimization

The general idea of changing the plan halfway through execution actually leads us to a broader research space. Reoptimization is a reactive technique, as it is triggered only when we find the estimation to be wrong. Alternatively, we can also proactively change the execution plan, calling the query optimization routine at any time to potentially change the plan. Besides, there are other interesting research questions, such as when and what statistics we should gather during runtime. For example, should we collect statistics at every node in the join tree or only at some of them?

We introduce a high-level concept called dynamic query optimization to characterize this new research space. As shown in Figure 1, in a dynamic query optimization scheme, the optimizer initially provides an execution plan to be executed. Then, either during or after the execution of the plan, we switch back to the optimizer with some new intermediate results and runtime statistics. With such information, the optimizer can either keep the original plan or produce another plan to be executed. This iteration can happen several times until the final result is ready. Dynamic query optimization is fundamentally different from traditional static query optimization, in which the query plan is fixed throughout the whole procedure.

Dynamic query optimization can be classified as a specific case of the more general concept of adaptive query processing (Babu and Bizarro, 2005), in which the query execution strategy changes adaptively based on the actual state during execution. Most existing adaptive query processing techniques keep the strict ordering between optimizer and executor. For example, parametric query optimization (Ioannidis et al., 1997) and Rio (Babu et al., 2005) give a set of possible optimal plans to the executor instead of a single optimal plan, and the plan can be switched during execution. Eddies (Avnur and Hellerstein, 2000) generate a basic join tree with a simple pre-optimizer and can dynamically change the tuple pipeline in join operators during execution.

Compared to other adaptive query processing techniques, dynamic query optimization emphasizes the interleaved steps of query optimization and execution. The optimizer only determines the initial few steps of the query plan at the beginning and gradually decides the next steps as more information becomes available. This direction has so far not been thoroughly explored by existing work on adaptive query processing.

Figure 1. Static and dynamic query optimization

In the traditional structure, there is a unidirectional pipeline from the optimizer to the executor; in the breakup pipeline structure, there is a cycle instead.

2.4. Query Split

Calling the optimizer again to modify the execution plan incurs nontrivial overhead, especially if done very frequently. In this paper, we propose a new dynamic query optimization strategy called query split, which has comparatively lower overhead. In query split, we execute a small part of the query each time and materialize its result, until the entire query has been executed. We provide an example to demonstrate how query split works in practice.

Example 2 ().

As shown in Figure 2(a), we consider a query q over five relations. First, we split q into three subqueries q1, q2, and q3, as shown in Figure 2(b). Then, we execute q1 and materialize its result as a temporary table T1. Next, we replace the relations of q1 that appear in q2 and q3 by T1, as shown in Figure 2(c). Afterwards we execute the modified version of q2 and materialize its result as T2. Finally, as shown in Figure 2(d), we replace the relations of q2 that appear in q3 by T2 to obtain the last subquery, and then replan and execute it to obtain the final result.

(a) Join graph of the query
(b) The query with its three subqueries
(c) The subqueries after executing the first subquery
(d) The subqueries after executing the first and second subqueries
Figure 2. Query split example

More generally, query split works as follows. First, for a given query, we split it into several small subqueries. Then we compute the result of the original query using these subqueries:


  • Each time, choose a subquery to execute and then remove it from the set of subqueries.

  • After running a subquery and collecting runtime statistics, modify other subqueries and remake their execution plans.

  • When no subquery is left, merge some subqueries’ results to obtain the result of the original query.
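The loop above can be sketched as follows. All function names (`split`, `pick`, `optimize`, `execute`, `collect_stats`, `merge`) are illustrative placeholders for the corresponding RDBMS components, not actual APIs:

```python
# High-level sketch of the query split loop; the callable parameters
# stand in for the splitting algorithm, subquery-selection rule,
# optimizer, executor, statistics collector, and reconstruction step.

class Subquery:
    """Toy subquery: just a name. `substitute` rewrites the subquery
    after another subquery's result is materialized (a no-op here)."""
    def __init__(self, name):
        self.name = name
    def substitute(self, executed, temp_table):
        return self

def query_split(query, split, pick, optimize, execute, collect_stats, merge):
    subqueries = split(query)              # 1. split into small subqueries
    results, stats = [], {}
    while subqueries:
        sq = pick(subqueries, stats)       # 2. choose the next subquery
        subqueries.remove(sq)
        plan = optimize(sq, stats)         # plan with the latest runtime stats
        temp = execute(plan)               # materialize the subquery result
        stats.update(collect_stats(temp))  # feed stats back to the optimizer
        # 3. rewrite the remaining subqueries to reference the new table
        subqueries = [s.substitute(sq, temp) for s in subqueries]
        results.append(temp)
    return merge(results)                  # 4. reconstruct the final result

# Toy run: two subqueries, executed in order, "merged" by concatenation.
out = query_split(
    query=None,
    split=lambda q: [Subquery("q1"), Subquery("q2")],
    pick=lambda sqs, stats: sqs[0],
    optimize=lambda sq, stats: sq,
    execute=lambda plan: plan.name + "_result",
    collect_stats=lambda temp: {},
    merge=lambda rs: rs,
)
assert out == ["q1_result", "q2_result"]
```

The real system plugs the existing optimizer and executor into this loop; Section 3 formalizes the splitting and reconstruction steps.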

Query split has the following two characteristics:


  • Unlike reoptimization, which needs to be triggered, query split proactively calls optimizer to change the plans of other subqueries after executing a subquery.

  • Query split splits query into several small subqueries and only remakes execution plan of relevant subqueries after the execution of each subquery. By using the above approach, query split decreases the frequency of plan modification, hence reducing the overhead of query optimization.

Query split is a new attempt at dynamic query optimization, and we hope to use this work to draw more attention and research to this direction. Our new strategy opens a fresh perspective on the query optimization problem. As shown in Section 5, query split outperforms the built-in PostgreSQL optimizer by a large margin and reaches near-optimal execution time on the Join Order Benchmark.

3. Formulation & Definition

In this section, we formalize query split. We first define the concept of selection-projection-join queries and formally describe query split for such queries (in Section 3.1). Then, we design an implementation of the reconstruction algorithm, one of the components of query split, and prove its correctness (in Section 3.2).

3.1. Framework Overview

In this section, we give an overview of the query split framework. For simplicity, we temporarily restrict our attention to selection-projection-join (SPJ) queries in the following three sections, which only involve selection, projection, and join operators. These three operators are chosen because they can be composed to form most basic SQL SELECT queries, covering all the queries in the Join Order Benchmark (Leis et al., 2018). Query split can be extended to handle more general queries, as discussed in Section 6.

First, let us define a normal form for SPJ queries. By relational algebra manipulation via equivalence rules, every SPJ query can be transformed into the normal form q(R, S, P), in which we denote by R a set of relations, S a set of selection predicates over R, and P a set of projection attributes (details can be found in Appendix A).

Now, we can formally describe query split. The framework consists of five components: the query splitting algorithm, the runtime statistics collector, the query optimizer, the query executor, and the reconstruction algorithm. The relationships among the five components are shown in Figure 3. The query splitting algorithm takes the global query as input and splits it into subqueries, and then we obtain the result via the following steps:

  1. In each iteration, the reconstruction algorithm picks a subquery and sends it to the query optimizer.

  2. The query optimizer makes an execution plan and delivers it to the query executor.

  3. The query executor executes the plan, materializes the results, and sends them to the statistics collector.

  4. The runtime statistics collector obtains new statistics, which help the optimizer make better plans for the remaining subqueries in the following iterations.

  5. After all subqueries have been executed, the reconstruction algorithm reconstructs the final query result from the subquery results.

Figure 3. The framework of query split

Note that statistics collector, query optimizer, and query executor are existing components in RDBMS. In the following, we define query splitting algorithm and reconstruction algorithm, the two new concepts that we propose in this paper.

We first formalize the concept of subquery and query splitting algorithm, in which we stipulate that subqueries do not contain projection operations. The definition of query splitting algorithm here is abstract, and we will discuss possible concrete implementations in Section 4.1.

Definition 1 ().

Given two queries q(R, S, P) and q′(R′, S′), if R′ ⊆ R and S′ ⊆ S, then q′ is said to be a subquery of q. A query splitting algorithm is an algorithm that takes an SPJ query q as input and returns a set of subqueries of q.

Next, we define the concept of reconstruction algorithm.

Definition 2 ().

A reconstruction algorithm is an algorithm that takes a set of subqueries Q and a set of projection attributes P as input and returns a reconstructed relation. A reconstruction algorithm A is correct with respect to a query splitting algorithm B if, for every SPJ query q(R, S, P), the reconstructed relation is always equal to the result of the original query when we feed the output of B on q, together with P, as the input of A.

Clearly, not every query splitting algorithm has a corresponding reconstruction algorithm. A crucial question here is under which conditions a correct reconstruction algorithm exists. To answer this question, we define the concept of subquery cover.

Definition 3 ().

Given an SPJ query q(R, S, P), let Q = {q1(R1, S1), …, qn(Rn, Sn)} be a set of subqueries of q. We denote R(Q) = R1 ∪ … ∪ Rn and S(Q) = S1 ∪ … ∪ Sn. Q is said to cover q if:

(1) R(Q) = R;

(2) S(Q) logically implies S.¹

¹ "A logically implies B" means that each predicate from B can be inferred from A.

Intuitively, the reconstruction can succeed only if there is no information loss between the set of subqueries and the original query, and the subquery cover concept formalizes this intuition. To ensure that a correct reconstruction algorithm exists, the set of subqueries Q returned by a query splitting algorithm on query q needs to cover q. We give a simple reconstruction algorithm that guarantees correctness under this condition in the following section.
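As a rough illustration of Definition 3, the sketch below checks the cover condition when predicates are treated as opaque strings, so "logically implies" degrades to literal set containment; a real implementation would need predicate inference. All names are illustrative:

```python
# Toy check of the subquery-cover condition: (1) the union of subquery
# relation sets equals the query's relation set, and (2) every predicate
# of the query appears in some subquery (a crude stand-in for logical
# implication).

def covers(subqueries, relations, predicates):
    """subqueries: list of (relation_set, predicate_set) pairs."""
    r_union = set().union(*(r for r, _ in subqueries))
    s_union = set().union(*(s for _, s in subqueries))
    return r_union == set(relations) and set(predicates) <= s_union

q1 = ({"A", "B"}, {"A.x = B.x"})
q2 = ({"B", "C"}, {"B.y = C.y"})
assert covers([q1, q2], {"A", "B", "C"}, {"A.x = B.x", "B.y = C.y"})
assert not covers([q1], {"A", "B", "C"}, {"A.x = B.x"})   # C is lost
```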

3.2. Replacement Reconstruction

In this section, we propose a concrete reconstruction algorithm called replacement reconstruction, which is correct with respect to every query splitting algorithm whose output is guaranteed to cover the original query. The correctness of this algorithm will be proved later.

Figure 4. Replacement reconstruction

Figure 4 shows a flow chart that explains how the replacement reconstruction algorithm works. The algorithm takes the set of subqueries Q and the set of projection attributes P as inputs, and uses the following steps to reconstruct the query result.

Require: Q a set of subqueries, P a set of projection attributes
function ReplaceRecon(Q, P)
    L ← ∅
    while Q ≠ ∅ do
        qi ← QSELECT(Q)                ▷ choose a subquery from Q
        Ti ← EXEC(qi)                  ▷ execute qi and materialize its result
        STAT(Ti)                       ▷ collect the runtime statistics
        Q ← Q ∖ {qi}
        replaced ← false
        for qj(Rj, Sj) ∈ Q do
            if Rj ∩ Ri ≠ ∅ then
                qj ← REPLACE(qj, Ri, Ti)   ▷ replace the relations of Ri in qj with Ti
                replaced ← true
            end if
        end for
        if ¬replaced then
            L ← L ∪ {Ti}
        end if
    end while
    return πP(×_{T ∈ L} T)             ▷ Cartesian product of L, projected on P
end function
Algorithm 1 Replacement Reconstruction
  • Initial: initialize the set of subquery results L as the empty set.

  • Loop (execute): execute a subquery qi from Q and materialize the result as Ti, then collect runtime statistics.

  • Loop (modify): remove qi from Q, then pick out all subqueries qj whose relation set intersects with Ri, and modify these subqueries via the REPLACE protocol described later. If there is no such qj, add Ti to L. After this, check whether Q is empty and repeat the loop if not.

  • Merge: when Q is empty, compute the Cartesian product of all relations in L and project the result on P; the end result is the reconstructed relation.

It remains to describe how REPLACE works. It modifies a subquery in two ways.

  • First, replace all the relations of Ri appearing in qj by Ti; in other words, set the relation set of qj to (Rj ∖ Ri) ∪ {Ti}.

  • After replacing the old relations with Ti, some predicates in Sj would reference the removed relations. Update those predicates to use the corresponding attributes of Ti instead.

Now we can prove the correctness of replacement reconstruction algorithm.

Theorem 1 ().

Let q(R, S, P) be an SPJ query and Q a set of subqueries of q. If Q covers q, then the output of the replacement reconstruction algorithm is equal to the result of q.

This implies that the replacement reconstruction algorithm is a correct reconstruction algorithm for any query splitting algorithm that guarantees its output covers the original query. The proof of this theorem can be found in Appendix B.
Remark: In Definition 1, we assume that there is no projection operation in subqueries for simplicity. In practice, we can push down projection operations to each subquery to effectively reduce the size of temporary relations and minimize execution time. In principle, any attribute that appears neither in the projection attributes P nor in any other subquery can be safely projected away.
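A minimal sketch of this pruning rule, with illustrative attribute names:

```python
# An attribute of a subquery's result can be projected away if it appears
# neither in the global projection list nor in any still-pending
# predicate. This is a toy illustration; attribute names are invented.

def prunable_attributes(subquery_attrs, global_proj, remaining_preds):
    """Attributes safe to drop when materializing a subquery result."""
    needed = set(global_proj)
    for pred_attrs in remaining_preds:   # attributes referenced by each
        needed |= set(pred_attrs)        # still-pending predicate
    return set(subquery_attrs) - needed

# t.a is kept for the final projection, t.b for a pending join predicate,
# and t.c can be dropped.
drop = prunable_attributes({"t.a", "t.b", "t.c"}, {"t.a"}, [{"t.b", "u.b"}])
assert drop == {"t.c"}
```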

4. Detailed Implementation

Although we formally described query split in Section 3, some of its components remain relatively flexible. In this section, we examine these components in more detail and provide some concrete implementations: the query splitting algorithm (in Section 4.1) and the criterion for selecting which subquery to execute each time (in Section 4.2). However, since our paper aims to verify the general effectiveness of query split, in-depth research on these components is beyond its scope. The contents of this section only serve as preliminary designs for reference.

4.1. Query Splitting Algorithm

In this section, we propose three simple query splitting algorithms (MinSubquery, RelationshipCenter, and EntityCenter).

4.1.1. MinSubquery

This is the most straightforward query splitting algorithm, in which we split the query into minimal subqueries. For each selection predicate that spans more than one relation, we use those relations to construct a relation set Ri. Then we construct the selection predicate set Si to contain all the selection predicates over Ri. We use an example query (shown in Figure 5(a)) to demonstrate the procedure of MinSubquery below.

Example 3 (MinSubquery).

As shown in Figure 5(a), the query contains five join predicates and three selection predicates, each over a single relation. To create subqueries, we first construct a relation set from the relations involved in each join predicate, yielding five relation sets. Then, for each relation set Ri, Si is set to the union of all selection predicates in the original query over Ri. As shown in Figure 5(b), this results in five subqueries.

(a) The example query
(b) Subqueries
Figure 5. Join graph split by MinSubquery
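The MinSubquery procedure can be sketched as follows, modeling each predicate as a pair of (the set of relations it spans, its text); all names are illustrative:

```python
# Sketch of MinSubquery: one subquery per multi-relation predicate, whose
# predicate set is every predicate confined to that relation set.

def min_subquery(predicates):
    """predicates: list of (frozenset_of_relations, predicate_text)."""
    subqueries = []
    for rels, _ in predicates:
        if len(rels) > 1:                      # a join predicate
            preds = {p for r, p in predicates if r <= rels}
            subqueries.append((rels, preds))
    return subqueries

preds = [
    (frozenset({"A", "B"}), "A.x = B.x"),      # join predicate
    (frozenset({"A"}), "A.y < 10"),            # selection on A (absorbed)
    (frozenset({"C"}), "C.z = 3"),             # selection on C (not absorbed)
]
subs = min_subquery(preds)
assert subs == [(frozenset({"A", "B"}), {"A.x = B.x", "A.y < 10"})]
```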

4.1.2. RelationshipCenter

Because subqueries in query split are always materialized, it is beneficial to constrain the size of subquery results when constructing the subqueries. There are several potential benefits of doing this: first, a small subquery result can speed up later execution, as completed subquery results are often used in other subqueries. Another benefit is that it improves the robustness of the framework by using less memory. In query split, each subquery result must be materialized in memory after execution, and hence subqueries with bounded result sizes lead to more stable memory usage.

In MinSubquery, any join predicate can be used as a foundation for creating subqueries, and subqueries created this way often translate to general theta-joins. The result size of such subqueries cannot be effectively bounded, and in the worst case can be as large as the product of the sizes of all joined relations. In contrast, when one relation contains several foreign keys to other relations, and they are joined together via those foreign keys, the result size does not exceed the size of the first relation. Here we make use of this fact to develop a new query splitting algorithm.

We introduce two new concepts: entity relation and relationship relation, inspired by the entity-relationship diagram (Chen, 1976). If the primary key of a relation is referenced by another relation as a foreign key, we call it an entity relation. Relations that do not have a primary key, or whose primary keys are not referenced by any other relation, we call relationship relations.
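This classification can be sketched directly from the schema's foreign-key references; the relation names below are illustrative, loosely modeled on the IMDB schema used by the Join Order Benchmark:

```python
# A relation is an *entity* if its primary key is referenced by some
# foreign key; otherwise it is a *relationship*. `foreign_keys` maps a
# referencing relation to the set of relations it references.

def classify(relations, foreign_keys):
    referenced = {r for targets in foreign_keys.values() for r in targets}
    entities = {r for r in relations if r in referenced}
    return entities, set(relations) - entities

# movie_info references title and info_type via foreign keys, so those
# two are entities and movie_info is a relationship.
entities, relationships = classify(
    {"movie_info", "title", "info_type"},
    {"movie_info": {"title", "info_type"}},
)
assert entities == {"title", "info_type"}
assert relationships == {"movie_info"}
```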

With the above definitions, we can see that when a relationship relation joins with several entity relations via foreign-key joins, the cardinality of the join result will not exceed the size of the relationship relation. Based on this, we propose the RelationshipCenter algorithm to control the size of each subquery. The algorithm ensures that most subqueries contain one relationship relation and multiple entity relations. The concrete steps of RelationshipCenter are shown as follows.


  • First, we classify all relations into two types: relationship and entity by the above definitions.

  • Next, we remove redundant selection predicates, preferring to preserve foreign-key joins as much as possible.

  • Then, we construct a directed graph for the given query, in which each vertex represents a relation. For each foreign-key join predicate, we create an edge from the relation containing the foreign key to the relation the primary key comes from. For every other selection predicate that spans multiple relations, we consider all possible pairs of the involved relations and create one or two edges between them, depending on their relation types: if both relations are of the same type, we create two edges in both directions; otherwise, we create an edge from the relationship relation to the entity relation.

  • Last, for each relation that connects to other relations, we create a subquery based on this relation. The relation set of the subquery contains the given relation and all relations it links to. The selection predicate set contains all predicates over the relation set.
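A sketch of the graph-building and subquery-construction steps, restricted to foreign-key joins for brevity (redundancy removal and non-FK predicates are omitted); all relation names are illustrative:

```python
# Each foreign-key join adds an edge from the referencing relation to
# the referenced one; every relation with outgoing edges then yields one
# subquery over itself plus all relations it links to.

from collections import defaultdict

def relationship_center(fk_joins):
    """fk_joins: list of (referencing, referenced) relation pairs."""
    graph = defaultdict(set)
    for src, dst in fk_joins:
        graph[src].add(dst)
    # one subquery per relation that has outgoing edges
    return {src: {src} | dsts for src, dsts in graph.items()}

subs = relationship_center(
    [("mi", "t"), ("mi", "it"), ("mc", "t"), ("mc", "cn")]
)
assert subs == {"mi": {"mi", "t", "it"}, "mc": {"mc", "t", "cn"}}
```

Each resulting subquery joins one relationship relation with its entity relations via foreign keys, so its result size is bounded by the relationship relation's size.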

Example 4 demonstrates how RelationshipCenter works.

(a) Global query
(b) Subqueries
Figure 6. Join graph split by RelationshipCenter
Example 4 (RelationshipCenter).

As shown in Figure 6(a), the query consists of one non-foreign-key join and four foreign-key joins. Accordingly, two of the relations are relationship relations and the other three are entity relations. Then, as shown in Figure 6(b), we first remove one redundant predicate, and then draw a directed graph whose four edges correspond to the remaining join predicates. Next, we consider all relations: three relations have no outgoing edges, so we do not create subqueries for them; each of the two remaining relations connects to two other relations, resulting in two subqueries.

4.1.3. EntityCenter

If we reverse the direction of edges in the directed graph in RelationshipCenter, we can get another query splitting algorithm. In this algorithm, the centers of subqueries are entity relations, and we name this algorithm EntityCenter.

4.2. Execution Order Decision

In this section, we propose some feasible criteria to decide which subquery to execute each time. Although we have proved that the execution order does not affect the correctness of query split, different execution orders can lead to different execution times. Therefore, we give some feasible implementations as candidates and evaluate their efficiency in Section 5.

4.2.1. Heuristic Rule Based Subquery Selection

One intuitive approach is to use heuristic rules to order the subqueries. We take the execution plans of the subqueries as input and assign a numeric value to each of them. Each time, we choose the subquery with the lowest numeric value and return this subquery and its execution plan for execution.

We have designed several ranking functions for this algorithm, listed in Table 1. We denote a subquery, the execution plan that the optimizer makes for it, the estimated cost of , and the estimated size of its result. The most basic approach is to use the estimated cost of the subqueries for ranking. However, this approach fails to consider future influence, since a large intermediate result will slow down future subqueries. Taking this into consideration, we designed four other candidates , , and , with their detailed expressions in Table 1.
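The selection loop can be sketched as follows. Since the exact ranking expressions appear only in Table 1, the two ranking functions below (estimated cost alone, and cost multiplied by estimated result size) are illustrative stand-ins, not the paper's definitions.

```python
def pick_next_subquery(candidates, rank):
    """Choose the subquery whose plan minimizes the ranking function.

    candidates : list of (subquery, plan) pairs, where each plan carries
                 the optimizer's estimated cost and estimated result size
    rank       : callable (cost, size) -> numeric rank
    """
    return min(candidates, key=lambda sp: rank(sp[1]["cost"], sp[1]["size"]))

# Two illustrative ranking functions (stand-ins for Table 1's expressions):
rank_cost = lambda cost, size: cost              # estimated cost only
rank_cost_size = lambda cost, size: cost * size  # also penalize large results
```

The second function encodes the intuition from the text: a cheap subquery that produces a huge intermediate result may still be a bad choice, because its output slows down every later subquery that consumes it.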

ranking function expression
Table 1. The expression of different ranking functions

4.2.2. Global View Based Subquery Selection

Looking only at subqueries themselves from a local perspective may not always lead to a good execution order. An alternative approach is to choose subqueries from a global view, and we have designed one such algorithm called global_sel.

global_sel starts the selection process with a plan generated from the original query. Then, given an execution tree for the original query, we choose the deepest operator node, find the relations involved in that operator, and select the subquery whose relation set contains those relations. If multiple subqueries qualify, we choose one of them by an arbitrary tie-breaking rule.
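A minimal sketch of this selection rule, assuming a binary plan tree in which every node records the relations it involves; the node dictionaries and the subquery encoding are hypothetical, not PostgreSQL's plan representation.

```python
def deepest_node(plan):
    """Return the deepest non-leaf (operator) node of a plan tree.
    A node is a dict {"rels": set, "left": node or None, "right": node or None}."""
    def walk(node, depth):
        if node["left"] is None and node["right"] is None:
            return None, -1                      # leaf scans are not candidates
        best, best_depth = node, depth
        for child in (node["left"], node["right"]):
            if child is None:
                continue
            cand, d = walk(child, depth + 1)
            if cand is not None and d > best_depth:
                best, best_depth = cand, d
        return best, best_depth
    return walk(plan, 0)[0]

def global_sel(plan, subqueries):
    """Pick a subquery whose relation set covers the deepest operator's relations."""
    rels = deepest_node(plan)["rels"]
    for name, sq_rels in subqueries:
        if rels <= sq_rels:
            return name                          # arbitrary tie-breaking: first match
    return None
```

After the chosen subquery is executed and materialized, the remainder of the query is re-planned and the procedure repeats, as in Example 5.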

Figure 7. The illustration of global_sel

Example 5 illustrates the procedure for the global view based subquery selection.

Example 5 (Global_sel).

We split the query into three subqueries , , and . As shown in Figure 7, at the beginning, the deepest non-leaf node in global plan is , and the involved relations are . Hence, subquery would be first executed because it is the only subquery whose relation set contains and . Then we materialize the subquery result as a temporary table, and call optimizer for the remainder of the original query. In this iteration, the deepest non-leaf node in modified global plan is , so we choose because its relation set contains and .

5. Experiment

As we have motivated query split and introduced all the details, we evaluate its performance on a real-world workload in this section. We first describe how our experiments are organized (Section 5.1), and then introduce the baselines that we use (Section 5.2). Next, we evaluate the performance impact of different query splitting algorithms and execution order decisions (Section 5.3), and compare a fine-tuned query split implementation with the baselines (Section 5.4). Last, we discuss the potential out of memory issue and its implications (Section 5.5).

5.1. Experiment Setup

Our experiments are performed on a 64-bit Windows 10 computer with an Intel Core i9-10900K CPU (3.70 GHz) and we use the source code of PostgreSQL 12.3 as basis for all of our implementations. We set max parallel workers to 0, so the CPU only uses one core during query processing. Furthermore, the system has 128 GB of RAM, and we set up the PostgreSQL parameters so that the database can store the entire IMDB database and temporary data in memory. This setting ensures that no disk IO will happen during the query execution. We have also modified the default cost model parameters of PostgreSQL to target this in-memory query execution setting. More details of the experimental setup can be found in Appendix C.

We use the Join Order Benchmark (JOB) (Leis et al., 2018) as our experiment workload. JOB is a real-world workload over the IMDB dataset and is highly valuable for evaluating RDBMS performance. JOB has 113 queries, and 91 of them have non-empty results. We run these 91 queries and record their latencies as reported by PostgreSQL.

We have released the source code of query split used in our experiments on GitHub.

5.2. Baseline

Aside from the built-in optimizer of PostgreSQL, we introduce another baseline called optimal plan, which intends to capture the latency of the optimal static query execution plan for each query. This plan is obtained using two tricks: first, in optimal plan, the optimizer remembers the true cardinalities of all past intermediate results for a given query and uses them during query optimization. Second, we encourage the optimizer to try join orders that have not been explored yet by artificially decreasing the cost of these join orders. By iterating the same query several times, the optimizer learns the true cardinality of most intermediate relations, and can make a near-optimal plan accordingly (we describe the details of this baseline in Appendix D).
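The iteration just described can be sketched as follows; `optimize` and `execute` are hypothetical hooks standing in for the modified PostgreSQL optimizer and executor, and the cost-perturbation trick for unexplored join orders is omitted for brevity.

```python
def optimal_plan_baseline(query, optimize, execute, iterations=5):
    """Iteratively refine a plan using observed true cardinalities.

    optimize(query, known_cards) -> plan; the optimizer substitutes any
    cardinality found in known_cards for its own estimate.
    execute(plan) -> (latency, observed_cards), where observed_cards maps
    intermediate relation sets to their true sizes.
    Returns the best (plan, latency) pair seen across iterations.
    """
    known_cards = {}
    best = None
    for _ in range(iterations):
        plan = optimize(query, known_cards)
        latency, observed = execute(plan)
        known_cards.update(observed)        # remember true cardinalities
        if best is None or latency < best[1]:
            best = (plan, latency)
    return best
```

Across iterations the memo of true cardinalities grows, so later plans are built from increasingly accurate statistics and the recorded best latency approaches that of the optimal static plan.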

Another baseline that we originally considered is reoptimization (Kabra and DeWitt, 1998), which has been previously evaluated on JOB via simulation by Perron et al. (Perron et al., 2019). We implemented the reoptimization technique in PostgreSQL (we describe the details in Appendix E). The results of this baseline (denoted as “re-PostgreSQL”) are shown in Figure 8. Notice that the end-to-end latency of “re-PostgreSQL” is close to the optimal result for most queries. But on queries 23, 44, and 61, PostgreSQL fails to output results and reports an out of memory error. Upon checking the system log, we find that the error occurs because temporary table materialization exceeds the limit of the memory buffer. This problem was not reported in the previous simulation result (Perron et al., 2019), because temporary tables were allowed to be written back to disk in the simulation, which is not a practical behavior in an actual deployment setting.

The performance of “re-PostgreSQL” illustrates that although reoptimization can produce better cardinality estimates, the consumption of the local memory buffer is a serious problem that is hard to avoid. To decrease memory consumption, it is necessary to control the timing of materialization and to carefully choose which intermediate results to materialize. All these decisions are highly non-trivial, and as a result we have to give up using “re-PostgreSQL” as a baseline.

Figure 8. The latency of each query in re-PostgreSQL

5.3. Performance Impact of Different Implementations

We first evaluate the performance of different implementations of query split. The result is shown in Table 2, in which we report the total end-to-end latency of the 91 queries in JOB. As we have mentioned in Section 4.1.2, some implementations of the query split strategy can trigger out-of-memory errors (we explain the reasons in Section 5.5). If we exclude all the out-of-memory queries (queries 23, 24, 37, 61), we obtain a reduced JOB; the total end-to-end latency on this reduced JOB is reported in brackets for reference.

ranking function | MinSubquery | RelationshipCenter | EntityCenter
 | 3 OOM (378.698) | 419.684 (344.282) | 563.618 (483.831)
 | 1 OOM (393.300) | 431.257 (352.363) | 458.871 (402.320)
 | 1 OOM (304.802) | 400.945 (320.305) | 457.227 (399.177)
 | 3 OOM (291.282) | 345.793 (262.874) | 444.373 (385.500)
 | 3 OOM (309.415) | 363.229 (287.250) | 431.816 (372.144)
 | 1 OOM (313.730) | 367.031 (284.895) | 2 OOM (347.816)
n OOM: out of memory in n of the 91 queries; latencies in seconds, reduced JOB in brackets
Table 2. Total latencies for different implementations on JOB and reduced JOB

From Table 2, we can conclude that both the query splitting algorithm and the execution order decision influence end-to-end latency. Among the query splitting algorithms, EntityCenter takes the longest time under every execution order. This result may be due to its huge subquery results, which apparently slow down the execution of the following subqueries, and it suggests that EntityCenter is not a good choice for query splitting. MinSubquery has the best performance when the execution order is determined by the ranking function . In the other cases, RelationshipCenter prevails.

Next, we observe that the end-to-end latency also varies significantly across the five execution order decisions. is the worst subquery order selection strategy, as it ignores the cost, which is directly related to the execution time. has the second-worst performance among all subquery selections, which justifies our previous intuition that execution order decisions should also consider future influence. and both perform better than , which indicates that although we need to consider the size of the subquery to be executed, it is not appropriate to give it the same weight as the cost.

To our surprise, although provides a global view, it does not outperform the heuristic rule based methods. In fact, the performance of is worse than for most query splitting algorithms. We investigated this and found that in many cases, the global plan made by the optimizer is far from the actual optimal plan. As a result, the execution order based on the global plan also performs poorly. The underlying cause of the bad global plan is that cardinality estimation error grows with query size due to error propagation (Ioannidis and Christodoulakis, 1991). The result indicates that global view based subquery selection is not as effective as we expected.

According to Table 2, we choose RelationshipCenter with to represent the query split strategy in the rest of the paper, as it has the lowest total end-to-end latency among all implementations.

5.4. Performance of Query Split

We further compare query split with the built-in PostgreSQL optimizer and optimal plan. Table 3 shows the total latency of PostgreSQL, query split, and optimal plan in JOB and reduced JOB, and the latency of each query is shown in Figure 9.

ImplementationLatency(s)Benchmark JOB reduced JOB
PostgreSQL 647.636 488.457
query split 345.793 262.874
optimal plan 278.687 223.677
Table 3. Total latency comparison
Figure 9. The latency of each query

Two conclusions can be drawn from Table 3 and Figure 9. First, on both JOB and reduced JOB, query split achieves a speedup in total latency compared to PostgreSQL, which verifies its effectiveness. As shown in Figure 9, for most queries that run long in PostgreSQL (longer than 10 seconds), the speedup that we achieve is quite significant.

Second, query split comes very close to the optimal result. Although there is still a gap between query split and optimal plan in total latency, the gap is mainly caused by queries 21, 28, and 61, whereas the latencies of the other queries under query split are close to optimal plan. This suggests that it is promising for query split to reach the optimal end-to-end latency by using better query splitting algorithms and execution order decisions.

Another implication of Figure 9 is that the accuracy of the cost model affects system performance. In Figure 9, we observe that for queries 50, 51, 52, and 55, the latency of optimal plan is longer than that of query split. The reason for this counterintuitive phenomenon is that the cost model underestimates the cost of index scan, which leads the optimizer to select a suboptimal physical operator. As the number of configuration parameters that can be tuned in PostgreSQL is limited, it is impossible to tune a perfect cost model. The issue of cost model accuracy is beyond the scope of this paper, and we merely point it out as a potential direction for future research.

5.5. Memory Usage

In this subsection, we investigate the out of memory issue that occurred during our experiments. Although the total memory that PostgreSQL can use is 128 GB, the capacity of the buffer used to store temporary tables is limited to 512 MB in our experiments. As we can see in Table 2, whether out of memory (OOM) occurs depends heavily on the choice of query splitting strategy. There are two reasons why OOM arises for MinSubquery and EntityCenter:

  1. As we have discussed in Section 4.1, we cannot control the size of intermediate results in MinSubquery and EntityCenter, and thus need to materialize a very large temporary table for some queries.

  2. We do materialization more often in MinSubquery. This is because each subquery in MinSubquery is only allowed to contain two relations, so more subqueries are needed to cover the original query.

Figure 10. Distribution of size of subquery result

To illustrate these points, we show the distribution of subquery result sizes in Figure 10 and, in Table 4, the average number of subqueries induced by the different query splitting algorithms on the out-of-memory queries (queries 23, 24, 37, 61). As shown in Figure 10, subqueries in MinSubquery and EntityCenter are more likely to have large result sizes (rows > 2M). In Table 4, we can see that MinSubquery splits the original query into more subqueries than the others.

Query splitting strategy Average number
MinSubquery 6.25
RelationshipCenter 2.5
EntityCenter 4.5
Table 4. Average number of subqueries

There are three implications of these results: (a) we should try to control the size of subquery results; (b) we should control the number of subqueries and release the memory allocated for subquery results as soon as they are no longer needed; (c) we should give high priority to subqueries that reference previous subqueries' results, so that we can release the memory previously allocated.
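Implication (c) amounts to a simple priority rule. A sketch, where `produced` is the set of already materialized temporary tables and each subquery is a (name, relation set) pair (this encoding is hypothetical):

```python
def memory_aware_order(subqueries, produced):
    """Order candidate subqueries so that those consuming already
    materialized temporary tables run first: once such a subquery
    finishes, the memory holding its inputs can be released.

    subqueries : list of (name, relation_set) pairs
    produced   : set of names of already materialized temporary tables
    """
    def refs_previous(rel_set):
        # How many previously materialized results this subquery consumes.
        return len(rel_set & produced)
    # Higher consumption of previous results -> earlier execution.
    return sorted(subqueries, key=lambda sq: -refs_previous(sq[1]))
```

This is only a priority heuristic; a full solution would combine it with the ranking functions of Section 4.2.1 rather than override them.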

6. Deal with non-SPJ Query

In this section, we discuss how to extend query split to handle queries that contain other operations (e.g. outer join, semi join, or aggregation functions). We first give an overview of the extended query split in Section 6.1, then introduce how to obtain the results of non-SPJ operations in Section 6.2. Although we have not implemented support for these kinds of queries, the basic underlying ideas are quite straightforward.

6.1. Overview

To deal with non-SPJ queries, the basic idea is to split them into non-SPJ operations that connect several SPJ subqueries and apply query split on each subquery.

The workflow of the extended algorithm is shown in Figure 11. If the query Q is an SPJ query, we simply run standard query split on it. Otherwise, we process Q as follows:

  1. We parse the query and select a non-SPJ operator (denoted op) with maximal depth, such that all its inputs are SPJ queries.

  2. We obtain the result of op by the methods that we will describe in Section 6.2.

  3. We generate a new query Q’ from Q by regarding the result of op as a base relation.

  4. Repeat (1)–(3) on Q’ until it becomes an SPJ query.

Figure 11. The workflow of extended query split

6.2. Execute non-SPJ Operators

To obtain the result of a non-SPJ operator, we first construct the input subqueries and execute them via query split. Then, we deliver the subquery results to the non-SPJ operator and execute it.

As shown in Figure 12, to deliver the subquery results to the non-SPJ operator, we can either pipeline one tuple at a time or materialize the entire result. Note that this differs from standard query split, since it is not necessary to gather runtime statistics here. Although runtime statistics could be used to choose a better physical operator, the benefit of doing so is often very limited.

Figure 12. Subqueries of a non-SPJ operator

7. Case Study

In this section, we choose two example queries (Sections 7.1 and 7.2) from JOB as case studies to gain some insights into query split. We demonstrate the detailed execution plans of query split on these queries in Figures 13 and 14, explain what makes query split faster than PostgreSQL, and then discuss our findings from the case studies in Sections 7.3 and 7.4.

Before analyzing the examples, we introduce the marks that appear in the figures. In Figures 13 and 14, the percentage next to an execution node represents the ratio of the time spent in that physical operator to the total execution time of query split. The colored lines represent the boundaries of subqueries. The numbers above the relational algebra expressions represent the estimated and actual sizes of intermediate results. As the estimates at the subquery boundaries are precise, we show only the actual values there.

7.1. The First Case

(a) The join graph of example query 1
(b) Execution plan of PostgreSQL
(c) Execution plan of query split
Figure 13. The detail of query execution for example 1

As shown in Figure 13, there is a big gap between the cardinality estimation errors in PostgreSQL and query split. We use q-error (Moerkotte et al., 2009), the factor by which the estimation differs from the true cardinality, to measure the quality of cardinality estimation. The average q-error in PostgreSQL is 246, while in query split it is only 5.
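For reference, the q-error of a single estimate is the symmetric ratio between the estimated and the true cardinality:

```python
def q_error(estimated, actual):
    """q-error (Moerkotte et al., 2009): the factor by which an estimate
    deviates from the true cardinality. Symmetric, so over- and
    under-estimation by the same factor give the same error; 1.0 is exact."""
    assert estimated > 0 and actual > 0, "q-error needs positive cardinalities"
    return max(estimated / actual, actual / estimated)
```

An exact estimate yields 1.0, and estimating 10 rows when the truth is 1000 (or vice versa) yields 100.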

Another major difference between the execution plans of PostgreSQL and query split concerns the join with table ci. In PostgreSQL, the execution plan directly joins ci with the current intermediate result by a nested loop join. This is unwise because both the left and right subtrees are huge; as shown in Figure 13(b), this nested loop costs too much time. In contrast, the decision of query split is much better. As shown in Figure 13(c), the execution plan in query split first decreases the size of table ci by joining it with the entity relations chn and rt. Note that although chn is huge as well, merge join can be applied without sorting and takes much less time. Then, after the size of ci is reduced, it is joined with . This decision effectively improves on a time-consuming join in PostgreSQL and speeds up query execution (from 128s to 48s).

7.2. The Second Case

(a) The join graph of example query 2
(b) Execution plan of PostgreSQL
(c) Execution plan of query split
Figure 14. The detail of query execution for example 2

As shown in Figure 14, there is also a big gap between the cardinality estimation errors in PostgreSQL and query split. The average q-error in PostgreSQL is 11310, while in query split it is 37.

Figure 14(b) illustrates that the poor performance of PostgreSQL is due to spending too much time on the join with table mi, which involves two huge inputs. As shown in Figure 14(c), query split postpones the join with mi and instead decreases the sizes of its inputs first. So when the join with mi happens in query split, the other input is relatively small, which makes execution faster.

7.3. Our Findings

Now we can generalize two findings from the case studies. First, the plan produced by query split is close to the optimal plan, for two reasons:

  • The execution plan of each subquery is near-optimal. By gathering runtime statistics after the execution of each subquery, the optimizer obtains precise statistics for the base relations of subsequent subqueries. Meanwhile, each subquery is small. Under such conditions, the cardinality estimation of the database optimizer is relatively accurate.

  • By considering both result size and execution time, we can make a decent decision on subquery execution order. We postpone the execution of long-running joins and decrease their input sizes by replacing the original inputs with other subqueries' results.

Second, we find that both queries are dominated by a single slow physical operator (a merge join or nested loop join) that consumes more than half of the total execution time. We study this further in Section 7.4 to reveal the remaining bottlenecks of query execution when the join order is already good, which can help us further improve query execution performance.

7.4. A Further Study on Physical Operator

The phenomenon that execution time is dominated by a single physical operator is in fact quite common in JOB. To study it further, we classify the queries in JOB according to the operator that takes the longest time during execution, as shown in Table 5.

physical operator query Ids
hash join 23, 24
merge join 28, 30, 61
index scan 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 19, 20, 21, 22, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 42, 45, 46, 47, 48, 49, 53, 54, 55, 56, 57, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91
sequential scan 14, 18, 25, 26, 27, 41, 43, 44, 50, 51, 52, 58, 59, 60
Table 5. The physical operators that dominate JOB queries

We find that most queries are dominated by an index scan, and the remaining queries are dominated by a hash join, merge join, or sequential scan. (Note that this conclusion is derived from JOB, in which the number of queries is limited and the queries are human-written; it may therefore be particular to JOB rather than general across benchmarks.) In the following, we discuss each of these operators.

7.4.1. Hash Join and Merge Join

It is possible to further improve hash join and merge join by using a new, faster physical operator: directmap join. In hash join, fetching an inner tuple requires two random memory accesses: one to probe the hash table and another to access the inner tuple. If we reduce the number of memory accesses to one, we can achieve faster execution.

By combining probing and data access, directmap join can fetch a matching inner tuple with one memory access. Like hash join, directmap join consists of two phases: a build phase and a probe phase. In the build phase, we copy the data of the inner relation into a two-dimensional array called the map, which can be treated as a combination of a hash table and the relation data. Because the join attribute values in JOB are non-negative integers with a limited range, we simply use the original value as the hash value (), and each row of the map stores one inner tuple.

Figure 15. The effect of directmap on queries with join operator bottleneck

In the probe phase, for each tuple from the outer relation with join attribute value , we fetch the corresponding row of the map and return the inner tuple if it is not empty. In this way, probing for and accessing an inner tuple uses only one random memory access. We explain the detailed implementation of directmap join in Appendix F.
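The two phases can be sketched as follows, under the stated assumptions that join keys are non-negative integers with a limited range and (as for a primary key) unique in the inner relation; the Python list stands in for the paper's map array.

```python
def build_map(inner, key_of, key_range):
    """Build phase: a dense array indexed directly by the join key.
    Assumes keys are non-negative integers below key_range and unique
    in the inner relation (as for a primary key)."""
    dmap = [None] * key_range
    for tup in inner:
        dmap[key_of(tup)] = tup       # the key's value IS its slot index
    return dmap

def directmap_join(outer, dmap, key_of):
    """Probe phase: one array access per outer tuple, with no separate
    hash-table lookup before fetching the inner tuple."""
    for tup in outer:
        inner_tup = dmap[key_of(tup)]
        if inner_tup is not None:     # empty slot means no matching inner tuple
            yield tup, inner_tup
```

In an actual executor the map rows would hold the tuple data itself rather than a pointer, which is what removes the second random memory access; a Python list of references cannot show that effect, only the control flow.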

We use directmap join in the queries bottlenecked by a join operator (queries 23, 24, 28, 30, 61) and compare their execution times with the previous results. The end-to-end latencies are shown in Figure 15. Queries 28, 30, and 61 improve by more than 50% in end-to-end latency, and queries 23 and 24 by nearly 25%. The results show that directmap join can greatly improve total execution time for queries whose bottleneck is a join operator.

7.4.2. Index Scan and Sequential Scan

Unlike hash join and merge join, the queries dominated by index scan and sequential scan are hard to improve, for three reasons:

  1. The use of scan operators is inevitable.

  2. As far as we know, apart from index scan and sequential scan, there are no other comparable scan operators. In other words, if the optimizer has chosen the better of index scan and sequential scan, there is little room for further improvement.

  3. Substituting the true cardinalities into the cost model of PostgreSQL (Documentation, 2021), we find that the scan operator decisions made by the optimizer in query split are indeed correct.

From the above three points, we conclude that there is no room to further improve index scan and sequential scan in query split. Hence, under the dynamic query optimization framework, for most queries dominated by a scan operator, query split is already a nearly optimal solution. Although there might exist other strategies to further improve scan operators, such strategies are beyond the scope of this paper.

8. Related Work

Our work relates to three research directions: (a) literature that studies the intrinsic difficulty of accurate cardinality estimation; (b) adaptive query processing; (c) research that aims to improve the accuracy of cardinality estimation. We review existing work in these directions in Sections 8.1, 8.2, and 8.3, respectively.

8.1. Difficulty of Cardinality Estimation

On the theoretical front, Ioannidis and Christodoulakis (Ioannidis and Christodoulakis, 1991) proved that cardinality estimation errors grow exponentially with join size. They studied natural joins and assumed an error on the estimated distribution of the join attributes in each relation.

Leis et al. (Leis et al., 2018) focus on practice and use experiments to show that the cardinality estimators in commercial databases produce large errors on real workloads. They introduced a new benchmark called the “Join Order Benchmark”, which is simple but challenging. Their experiments suggest that although cardinality estimators are accurate for single-relation queries, no existing technique can accurately estimate cardinality in the presence of highly correlated join predicates.

Their research shows both the necessity of new query optimization frameworks and the intrinsic difficulty of accurate cardinality estimation, which inspired our research on dynamic query optimization.

8.2. Adaptive Query Processing

According to reference (Babu and Bizarro, 2005), there are three families of adaptive query processing: plan-based systems (reoptimization), routing-based systems, and continuous-query-based systems.

8.2.1. Reoptimization

The most relevant work in the literature is reoptimization. Reoptimization monitors the execution of the current plan and re-optimizes whenever the actual conditions differ significantly from the estimates made by the optimizer.

Reopt (Kabra and DeWitt, 1998) and Pop (Markl et al., 2004) are the two most representative works on reoptimization. Reopt adds statistics collection operators to the execution plan and collects statistics after a segment has been executed. When the collected statistics deviate too much from the estimates, the database triggers reoptimization and re-optimizes the unexecuted part using the collected statistics.

The idea of Pop is the same as Reopt, but it improves on Reopt in several aspects. First, Pop can trigger reoptimization at more join nodes, while Reopt can only trigger reoptimization at nodes where query results are not pipelined. Second, Pop uses more complex analysis to decide whether to re-optimize.

Conquest (Ng et al., 1999) extends reoptimization to a parallel execution environment. To decrease the time spent collecting intermediate statistics, sampling reoptimization (Wu et al., 2016) passes the plan to a sampling-based estimator. In recent work, Cuttlefish (Kaftan et al., 2018) uses reinforcement learning techniques to explore possible physical operators during query execution and exploits the fastest ones. However, Cuttlefish can only change the physical operator; it is unable to change the join order.

Recently, Perron et al. (Perron et al., 2019) simulated reoptimization in PostgreSQL. They first observe the estimation error via the “explain” command and materialize any intermediate result that deviates too much from the estimate as a temporary table. Then they rewrite the query by replacing the materialized part with the temporary table, and finally call the optimizer again. Their results show that reoptimization in PostgreSQL can sharply improve execution time.

Compared to dynamic query optimization, reoptimization can only be triggered when a deviation is found, which is a reactive approach. The dynamic query optimization strategy query split that we propose in this paper differs from reoptimization in two ways. First, query split proactively re-optimizes the unexecuted subqueries after executing each subquery, while reoptimization re-optimizes the unexecuted part of the original query reactively. Second, the number of times reoptimization stops the pipeline to collect runtime statistics is unknown in advance. By contrast, query split is robust with respect to the degree of plan modification, because we decide in advance when to call the optimizer again.

8.2.2. Routing-based system

Routing-based systems eliminate the traditional optimizer and query plans: they process queries by routing tuples, one by one, through a pool of operators. The idea can be traced back to INGRES (Wong and Youssefi, 1976), in which each tuple could be executed in a different order. Eddies (Avnur and Hellerstein, 2000) adds a new operator called ripple join, through which Eddies can change the join order.

Both dynamic query optimization and routing-based systems break the unidirectional pipeline between optimizer and executor. However, compared to dynamic query optimization, routing-based systems abandon the optimizer entirely, making routing algorithms highly dependent on greedy heuristics and therefore unsuitable for complex queries (Babu and Bizarro, 2005).

8.2.3. Continuous-query-based System

Continuous-query-based, or CQ-based, systems are used for queries that run many times or for a very long time, which is prevalent in data stream systems.

Although CQ-based systems also support changing the execution plan during execution, there is a huge gap between CQ-based systems and our work: CQ-based systems pay attention to runtime changes in stream characteristics and system conditions, rather than cardinality estimation errors for a given query.

8.2.4. Others

Dynamic query plans (Babu et al., 2005; Cole and Graefe, 1994; Graefe and Ward, 1989) and parametric query optimization (Ioannidis et al., 1997) were proposed to change the query plan without calling the optimizer again. They give the executor a set of possibly optimal plans instead of a single optimal plan, and during execution the plans can switch to one another.

Similar to other adaptive query processing techniques, these works also aim for a flexible execution plan. But compared to our work, query optimization and execution are not interleaved in dynamic query plans and parametric query optimization, which leads to great overhead for maintaining multiple execution plans.

8.3. Cardinality Estimation

Cardinality estimation techniques (Hasan et al., 2020) are also relevant. Research on cardinality estimation aims to improve its accuracy, in two main areas: data-driven cardinality estimators (Yang et al., 2019; Tzoumas et al., 2011; Gunopulos et al., 2005; Hilprecht et al., 2019; Dutt et al., 2019; Leis et al., 2017; Yang et al., 2020; Zhao et al., 2018; Chen and Yi, 2017; Kipf et al., 2019) and query-driven cardinality estimators (Sun and Li, 2019; Kipf et al., 2018; Heimel et al., 2015; Park et al., 2020; Marcus et al., 2019; Dutt et al., 2020; Ortiz et al., 2018). A data-driven cardinality estimator approximates the data distribution of a table by mapping each tuple to its probability of occurrence in the table. A query-driven cardinality estimator learns the mapping between queries and cardinalities.

We emphasize that there is no conflict between dynamic query optimization and advanced cardinality estimation techniques. On the contrary, query split benefits from improvements in cardinality estimation on small joins: making an optimal plan for each subquery can significantly enhance overall performance.

9. Conclusion

The intrinsic difficulty of cardinality estimation makes it hard for traditional query optimization to produce optimal plans; dynamic query optimization can help the database optimizer avoid these difficulties. In this paper, we propose query split, a proactive strategy for dynamic query optimization. Using runtime statistics and proactively interleaving query optimization and execution, query split reaches near-optimal latency on the Join Order Benchmark. Our results suggest that dynamic query optimization can effectively improve the speed of query execution. Furthermore, the case study shows that under the dynamic query optimization framework, most queries are dominated by a single physical operator. This implies that execution time can be hugely improved by improving the bottleneck operator itself, and that there is otherwise little room for further improvement under the dynamic query optimization framework.


  • R. Avnur and J. Hellerstein (2000) Eddies: continuously adaptive query processing. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. External Links: Document Cited by: §2.3, §8.2.2.
  • S. Babu, P. Bizarro, and D. DeWitt (2005) Proactive re-optimization. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD ’05, New York, NY, USA, pp. 107–118. External Links: ISBN 1595930604, Link, Document Cited by: §1, §2.3, §8.2.4.
  • S. Babu and P. Bizarro (2005) Adaptive query processing in the looking glass. In Proceedings of the Second Biennial Conference on Innovative Data Systems Research (CIDR), Jan. 2005, Cited by: §1, §2.3, §8.2.2, §8.2.
  • P. P. Chen (1976) The entity-relationship model—toward a unified view of data. ACM transactions on database systems (TODS) 1 (1), pp. 9–36. Cited by: §4.1.2.
  • Y. Chen and K. Yi (2017) Two-level sampling for join size estimation. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017, S. Salihoglu, W. Zhou, R. Chirkova, J. Yang, and D. Suciu (Eds.), pp. 759–774. External Links: Link, Document Cited by: §8.3.
  • R. L. Cole and G. Graefe (1994) Optimization of dynamic query evaluation plans. In Proceedings of the 1994 ACM SIGMOD international conference on Management of data, pp. 150–160. Cited by: §8.2.4.
  • PostgreSQL Documentation (2021) Additional supplied modules. Cited by: item 3.
  • A. Dutt, C. Wang, V. Narasayya, and S. Chaudhuri (2020) Efficiently approximating selectivity functions using low overhead regression models. Proceedings of the VLDB Endowment 13 (12), pp. 2215–2228. Cited by: §8.3.
  • A. Dutt, C. Wang, A. Nazi, S. Kandula, V. Narasayya, and S. Chaudhuri (2019) Selectivity estimation for range predicates using lightweight models. Proc. VLDB Endow. 12 (9), pp. 1044–1057. External Links: ISSN 2150-8097, Link, Document Cited by: §1, §8.3.
  • M. T. Goodrich and R. Tamassia (2015) Algorithm design and applications. Wiley Hoboken. Cited by: Appendix F.
  • G. Graefe and K. Ward (1989) Dynamic query evaluation plans. In Proceedings of the 1989 ACM SIGMOD international conference on Management of data, pp. 358–366. Cited by: §8.2.4.
  • D. Gunopulos, G. Kollios, J. Tsotras, and C. Domeniconi (2005) Selectivity estimators for multidimensional range queries over real attributes. The VLDB Journal 14 (2), pp. 137–154. External Links: ISSN 1066-8888, Link, Document Cited by: §1, §8.3.
  • S. Hasan, S. Thirumuruganathan, J. Augustine, N. Koudas, and G. Das (2020) Deep learning models for selectivity estimation of multi-attribute queries. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 1035–1050. Cited by: §8.3.
  • M. Heimel, M. Kiefer, and V. Markl (2015) Self-tuning, gpu-accelerated kernel density models for multidimensional selectivity estimation. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD ’15, New York, NY, USA, pp. 1477–1492. External Links: ISBN 9781450327589, Link, Document Cited by: §1, §8.3.
  • B. Hilprecht, A. Schmidt, M. Kulessa, A. Molina, K. Kersting, and C. Binnig (2019) DeepDB: learn from data, not from queries!. CoRR abs/1909.00607. External Links: Link, 1909.00607 Cited by: §8.3.
  • Y. E. Ioannidis, R. T. Ng, K. Shim, and T. K. Sellis (1997) Parametric query optimization. The VLDB Journal 6 (2), pp. 132–151. Cited by: §2.3, §8.2.4.
  • Y. E. Ioannidis and S. Christodoulakis (1991) On the propagation of errors in the size of join results. In Acm Sigmod International Conference on Management of Data, pp. 268–277. Cited by: §2.1, §5.3, §8.1.
  • J. Zhao and Y. Gao (2022) Break up the pipeline structure to reach a nearly optimal end-to-end latency. arXiv preprint. Cited by: §1.
  • N. Kabra and D. J. DeWitt (1998) Efficient mid-query re-optimization of sub-optimal query execution plans. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, SIGMOD ’98, New York, NY, USA, pp. 106–117. External Links: ISBN 0897919955, Link, Document Cited by: §1, §2.1, §2.2, §5.2, §8.2.1.
  • T. Kaftan, M. Balazinska, A. Cheung, and J. Gehrke (2018) Cuttlefish: a lightweight primitive for adaptive query processing. arXiv preprint arXiv:1802.09180. Cited by: §8.2.1.
  • A. Kipf, T. Kipf, B. Radke, V. Leis, P. A. Boncz, and A. Kemper (2018) Learned cardinalities: estimating correlated joins with deep learning. CoRR abs/1809.00677. External Links: Link, 1809.00677 Cited by: §1, §8.3.
  • A. Kipf, D. Vorona, J. Müller, T. Kipf, B. Radke, V. Leis, P. Boncz, T. Neumann, and A. Kemper (2019) Estimating cardinalities with deep sketches. In Proceedings of the 2019 International Conference on Management of Data, pp. 1937–1940. Cited by: §8.3.
  • V. Leis, B. Radke, A. Gubichev, A. Kemper, and T. Neumann (2017) Cardinality estimation done right: index-based join sampling. In 8th Biennial Conference on Innovative Data Systems Research, CIDR 2017, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings, External Links: Link Cited by: §8.3.
  • V. Leis, B. Radke, A. Gubichev, A. Mirchev, P. Boncz, A. Kemper, and T. Neumann (2018) Query optimization through the looking glass, and what we found running the join order benchmark. The VLDB Journal 27 (5), pp. 643–668. External Links: ISSN 1066-8888, Link, Document Cited by: §1, §2.1, §2.2, §3.1, §5.1, §8.1.
  • R. Marcus, P. Negi, H. Mao, C. Zhang, M. Alizadeh, T. Kraska, O. Papaemmanouil, and N. Tatbul (2019) Neo: a learned query optimizer. Proceedings of the VLDB Endowment 12 (11), pp. 1705–1718. External Links: ISSN 2150-8097, Link, Document Cited by: §1, §8.3.
  • V. Markl, V. Raman, D. Simmen, G. Lohman, H. Pirahesh, and M. Cilimdzic (2004) Robust query processing through progressive optimization. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, SIGMOD ’04, New York, NY, USA, pp. 659–670. External Links: ISBN 1581138598, Link, Document Cited by: §8.2.1.
  • G. Moerkotte, T. Neumann, and G. Steidl (2009) Preventing bad plans by bounding the impact of cardinality estimation errors. Proceedings of the VLDB Endowment 2 (1), pp. 982–993. Cited by: §7.1.
  • MySQL (2020) MySQL 8.0 reference manual. Cited by: §2.1.
  • K. W. Ng, Z. Wang, R. R. Muntz, and S. Nittel (1999) Dynamic query re-optimization. In Proceedings. Eleventh International Conference on Scientific and Statistical Database Management, pp. 264–273. Cited by: §8.2.1.
  • J. Ortiz, M. Balazinska, J. Gehrke, and S. S. Keerthi (2018) Learning state representations for query optimization with deep reinforcement learning. In Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, pp. 1–4. Cited by: §8.3.
  • Y. Park, S. Zhong, and B. Mozafari (2020) QuickSel: quick selectivity learning with mixture models. External Links: 1812.10568 Cited by: §1, §8.3.
  • M. Perron, Z. Shang, T. Kraska, and M. Stonebraker (2019) How i learned to stop worrying and love re-optimization. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), Vol. , pp. 1758–1761. External Links: Document Cited by: §1, §1, §2.2, §5.2, §8.2.1.
  • A. Silberschatz, H. F. Korth, S. Sudarshan, et al. (1997) Database system concepts. Vol. 4, McGraw-Hill New York. Cited by: §2.1.
  • M. Stillger, G. M. Lohman, V. Markl, and M. Kandil (2001) LEO - db2’s learning optimizer. In Proceedings of the 27th International Conference on Very Large Data Bases, VLDB ’01, San Francisco, CA, USA, pp. 19–28. External Links: ISBN 1558608044 Cited by: §1.
  • J. Sun and G. Li (2019) An end-to-end learning-based cost estimator. Proc. VLDB Endow. 13 (3), pp. 307–319. External Links: Link, Document Cited by: §1, §8.3.
  • K. Tzoumas, A. Deshpande, and C. S. Jensen (2011) Lightweight graphical models for selectivity estimation without independence assumptions. Proc. VLDB Endow. 4 (11). Cited by: §8.3.
  • E. Wong and K. Youssefi (1976) Decomposition—a strategy for query processing. ACM Transactions on Database Systems (TODS) 1 (3), pp. 223–241. Cited by: §8.2.2.
  • W. Wu, J. F. Naughton, and H. Singh (2016) Sampling-based query re-optimization. In Proceedings of the 2016 International Conference on Management of Data, pp. 1721–1736. Cited by: §8.2.1.
  • Z. Yang, A. Kamsetty, S. Luan, E. Liang, Y. Duan, X. Chen, and I. Stoica (2020) NeuroCard: one cardinality estimator for all tables. External Links: 2006.08109 Cited by: §8.3.
  • Z. Yang, E. Liang, A. Kamsetty, C. Wu, Y. Duan, P. Chen, P. Abbeel, J. M. Hellerstein, S. Krishnan, and I. Stoica (2019) Deep unsupervised cardinality estimation. Proc. VLDB Endow. 13 (3), pp. 279–292. External Links: Link, Document Cited by: §8.3.
  • Z. Zhao, R. Christensen, F. Li, X. Hu, and K. Yi (2018) Random sampling over joins revisited. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, G. Das, C. M. Jermaine, and P. A. Bernstein (Eds.), pp. 1525–1539. External Links: Link, Document Cited by: §8.3.

Appendix A Equivalence Rule

This appendix gives the concrete formulas used to obtain the normal form of an SPJ query. Because every selection-projection-join query is a relational algebra expression consisting only of selections, projections, and joins, we can transform it into a normal form by the following equivalence rules:

  • For each join expression, we can rewrite it as a selection after a Cartesian product:

    R ⋈_θ S = σ_θ(R × S).

  • We can change the order between projection and Cartesian product:

    π_{A ∪ B}(R × S) = π_A(R) × π_B(S),

    where A and B are the sets of attributes of R and S, respectively.

  • We can change the order between selection and Cartesian product:

    σ_{P_R ∧ P_S}(R × S) = σ_{P_R}(R) × σ_{P_S}(S),

    where P_R and P_S are the sets of predicates that refer only to attributes of R and S, respectively.

  • When a selection is executed after a projection, we can change their order:

    σ_P(π_A(E)) = π_A(σ_P(E)),

    where A is the set of attributes, P is the set of predicates, and every predicate in P refers only to attributes in A.

By the above four transformations, we move all selections after all Cartesian products, and all projections after all selections, which yields the normal form of the SPJ query.
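The four rules above can be applied mechanically. As a rough illustration (the tree encoding and function below are our own sketch, not the paper's implementation), the following collects the relations, predicates, and output attributes of an SPJ tree, which together describe the normal form π_attrs(σ_preds(R1 × … × Rn)):

```python
# Minimal sketch of pulling an SPJ expression into normal form
# pi_attrs( sigma_preds( R1 x ... x Rn ) ). Node names are illustrative.

def normalize(expr):
    """Return (relations, predicates, attributes) for an SPJ tree.

    expr is one of:
      ("rel", name)
      ("select", preds, child)      # sigma_preds(child)
      ("project", attrs, child)     # pi_attrs(child)
      ("join", pred, left, right)   # left join_pred right
    """
    kind = expr[0]
    if kind == "rel":
        return [expr[1]], [], None
    if kind == "select":            # pull predicates outward
        rels, preds, attrs = normalize(expr[2])
        return rels, preds + list(expr[1]), attrs
    if kind == "project":           # outermost projection wins
        rels, preds, _ = normalize(expr[2])
        return rels, preds, list(expr[1])
    if kind == "join":              # rule 1: join = selection after product
        lrels, lpreds, _ = normalize(expr[2])
        rrels, rpreds, _ = normalize(expr[3])
        return lrels + rrels, lpreds + rpreds + [expr[1]], None

# pi_a( sigma_{R.b>1}( R join_{R.a=S.a} S ) )
q = ("project", ["a"],
     ("select", ["R.b>1"],
      ("join", "R.a=S.a", ("rel", "R"), ("rel", "S"))))
rels, preds, attrs = normalize(q)
# rels == ["R", "S"], preds == ["R.a=S.a", "R.b>1"], attrs == ["a"]
```

Attribute bookkeeping is simplified here (only the outermost projection is kept); a full implementation would also push projection attributes per relation as in the second rule.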

Appendix B Proof of Theorem 1

This appendix shows the proof of Theorem 1.

Theorem 1.

Let q(R, S, P) be an SP query and Q be a set of subqueries of q. If Q covers q, then the output of the replacement reconstruction algorithm is equal to the result of q.

Proof.

To begin with, let us introduce several notations:

  1. For a set of subqueries , we denote , .

  2. For a set of relations , we denote .

  3. For an SP query and a set of subqueries Q of , we denote the result of as , and the output of the replacement reconstruction algorithm as .

Notice that in both and , the projection is performed last over the same projection attribute set P. Thus, if the results before projection are the same, the final results are the same as well. Therefore, we can simplify the notations of (3): for an SP query and a set of subqueries Q of it, we denote the result of the query as , and the output of the replacement reconstruction algorithm as . Under these notations, we can rewrite the theorem as: given an SP query and a set of subqueries Q of it such that Q covers it, the output of the replacement reconstruction algorithm equals the result of the query.
Without loss of generality, we assume that the names of all attributes in R are unique. Under such assumption, we do not need to consider the rename operation when modifying subqueries, and for simplicity we assume that the rename step is skipped.
Now, we prove the rewritten theorem by induction on |Q|.
First, we prove that the statement holds when |Q| = 1. In this case, the only way that Q covers q is Q = {q}, so the statement clearly holds for |Q| = 1.
Now, assume that the statement holds when |Q| = k. We consider the case of |Q| = k + 1, for k ≥ 1.
Without loss of generality, we denote the first executed subquery as and discuss two cases: (1) and (2) .
Case 1: We first execute and materialize its result as relation . Then, because , according to the algorithm, we have to add to the subquery result set L in Loop(modify) phase. After that, we remove from Q and have a new subquery set for the next iteration.
We construct a SP query , where and . Apparently, and as , by induction hypothesis, .
According to Merge phase, final result is the Cartesian product on the elements in L, so we have:

Case 2: We denote the set of subqueries that need to be modified after executing as .
After we execute and materialize its result as relation , we modify each and keep in the Loop(modify) phase. We denote these newly formed subqueries as and , . These newly formed subqueries form a new set .
Now, Q becomes a new subquery set . Because at this point, when reconstruction finishes, we have .
We construct a new query , where and . We will prove that and , which finishes the proof. To prove , by the induction hypothesis, we only need to show that and .

  • Apparently, . And because , logical implies , so .

  • Notice that , , and , so .

By using the induction hypothesis, we get .
At last, we need to prove :

Since is the execution result of , , so we have:

Thus, the statement holds for |Q| = k + 1.

Appendix C PostgreSQL Configuration

This appendix shows the configuration parameters we use in PostgreSQL. We set the memory limit per operator (work_mem) to 2 GB, the shared memory buffer (shared_buffers) to 512 MB, and the size of the operating system's buffer cache used by PostgreSQL (effective_cache_size) to 96 GB.
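In postgresql.conf terms, these settings correspond to the following fragment (values taken from this appendix; the syntax follows the standard configuration file format):

```
work_mem = '2GB'
shared_buffers = '512MB'
effective_cache_size = '96GB'
```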

In addition, we modified the cost model parameters to fit our experimental settings. There are five cost model configuration parameters (seq_page_cost, random_page_cost, cpu_tuple_cost, cpu_index_tuple_cost, and cpu_operator_cost), and we determine their values by ordinary least squares. We first execute SELECT queries that simply scan a relation, categorized into two types according to the scan operator they use:

sequential scan query:

  1. select * from movie_info_idx;

  2. select * from movie_keyword;

  3. select * from movie_info;

  4. select * from movie_companies;

index scan query:

  1. select * from movie_info_idx where movie_id < 1000;

  2. select * from movie_keyword where movie_id < 1000;

  3. select * from movie_info where movie_id < 1000;

  4. select * from movie_companies where movie_id < 1000;

After executing the queries, we obtain their latencies and true cardinalities. Since the true cardinality is known, the cost model of a scan operator becomes a linear combination of the five cost model parameters (the formulas can be found in the PostgreSQL source code):

(1) For sequential scan,

    cost = seq_page_cost · N_page + cpu_tuple_cost · N_tuple.

In the above formula, the true cardinality provides N_page and N_tuple.

(2) For index scan,

    cost = random_page_cost · N_page + (cpu_index_tuple_cost + cpu_operator_cost) · N_index_tuple + cpu_tuple_cost · N_tuple.

In the above formula, the true cardinality provides N_page, N_index_tuple, and N_tuple.

We let the estimated cost equal the measured latency and use the ordinary least squares method to compute approximate values of the five parameters. By this approach, we set seq_page_cost to 3, random_page_cost to 3, cpu_tuple_cost to 0.07, cpu_index_tuple_cost to 0.07, and cpu_operator_cost to 0.02 on our machine.
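For concreteness, the fitting step for the sequential-scan case can be sketched as follows (pure-Python normal equations; the feature values and latencies below are made up for illustration, not measurements from the paper):

```python
# Hypothetical sketch: recover two cost parameters (seq_page_cost,
# cpu_tuple_cost) from sequential-scan latencies by ordinary least
# squares, solving the 2x2 normal equations directly.

def ols_2(rows, y):
    """Fit y ~ c1*x1 + c2*x2 (no intercept) via the normal equations."""
    s11 = sum(x1 * x1 for x1, _ in rows)
    s12 = sum(x1 * x2 for x1, x2 in rows)
    s22 = sum(x2 * x2 for _, x2 in rows)
    b1 = sum(x1 * yi for (x1, _), yi in zip(rows, y))
    b2 = sum(x2 * yi for (_, x2), yi in zip(rows, y))
    det = s11 * s22 - s12 * s12          # assumed nonzero
    return ((s22 * b1 - s12 * b2) / det,
            (s11 * b2 - s12 * b1) / det)

# (N_page, N_tuple) per scan query; latencies generated here from
# cost = 3 * N_page + 0.07 * N_tuple, so OLS recovers (3, 0.07)
# up to floating-point rounding.
features = [(100, 5000), (400, 30000), (2500, 90000)]
latency = [3 * p + 0.07 * t for p, t in features]
seq_page_cost, cpu_tuple_cost = ols_2(features, latency)
```

Fitting all five parameters works the same way with a 5×5 system (or a library least-squares routine) over both query types.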

Appendix D Details of the Optimal Plan

This appendix describes the details of the optimal plan. To determine the least execution time that PostgreSQL can achieve, we build a simple history-recording optimizer and feed it the same query repeatedly. Across iterations, we record the true cardinality of every executed intermediate relation, and when we meet an intermediate relation that has appeared before, we replace its cardinality estimate with the recorded true cardinality. As more and more true cardinalities become known, the plan made by the optimizer approaches the optimal plan.

The implementation of the history-recording optimizer is based on a history list. Each member of the history list contains a bitmap that serves as the token of an intermediate relation, together with its true cardinality. The length of the bitmap equals the number of base relations in the query; each bit represents one base relation and is set to 1 if that base relation is involved in the intermediate relation. For example, given a query over base relations R1, R2, and R3, the bitmap of intermediate relation R1 ⋈ R2 is 110, that of R2 ⋈ R3 is 011, and that of R1 ⋈ R2 ⋈ R3 is 111.

However, some intermediate relations may never appear in any execution plan across iterations, so we have to encourage the optimizer to try join orders that it hasn't explored yet. This is achieved by artificially decreasing the cost of new join orders. More specifically, as the cost is related to the relation size, we decrease the estimated size of a new intermediate relation by multiplying it by a factor (we use 0.1 for most queries and 0.5 for some huge queries).

Due to history-recording and the encouragement of new join orders, the plan changes across iterations; when the plan stops changing, we take it as the optimal plan.
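The history list and the exploration factor can be sketched as follows (class and method names are our own, chosen for illustration):

```python
# Illustrative sketch of the history list: an intermediate relation over
# base relations (indexed 0..n-1) is keyed by a bitmap, and its true
# cardinality is recorded after execution.

class History:
    def __init__(self, n_rels):
        self.n_rels = n_rels
        self.true_card = {}              # bitmap -> true cardinality

    def key(self, rel_indexes):
        bm = 0
        for i in rel_indexes:            # set one bit per base relation
            bm |= 1 << i
        return bm

    def record(self, rel_indexes, cardinality):
        self.true_card[self.key(rel_indexes)] = cardinality

    def estimate(self, rel_indexes, optimizer_estimate, new_factor=0.1):
        """True cardinality if seen before; otherwise shrink the
        optimizer's estimate to encourage unexplored join orders."""
        bm = self.key(rel_indexes)
        if bm in self.true_card:
            return self.true_card[bm]
        return optimizer_estimate * new_factor

h = History(4)                           # query over relations R0..R3
h.record([0, 1], 1200)                   # R0 join R1 executed once
h.estimate([0, 1], 5000)                 # -> 1200 (known true cardinality)
h.estimate([2, 3], 5000)                 # -> 500.0 (unexplored, 0.1 factor)
```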

Appendix E Re-PostgreSQL

This appendix describes the implementation of Re-PostgreSQL. Reoptimization involves three key decisions: when to collect statistics, what kinds of statistics to collect, and how to modify the execution plan.

  • The timing of collecting statistics is important: if the collection misses the deviant part of the plan, reoptimization will not be triggered. Hence, the timing of statistics collection directly affects whether reoptimization happens at all.

  • The choice of which statistics to collect is a trade-off against overhead. The more kinds of statistics are collected, the better the new execution plan can be; however, the time spent collecting statistics increases as well, which increases the total execution time.

  • When a deviation appears, reoptimization needs to decide how to modify the execution plan with the new statistics. Discarding the whole execution and generating a new execution plan is a simple approach, but it causes significant overhead. A more feasible way is to suspend tuple processing, keep the already-executed part, and modify only the part of the original query that has not started executing yet.

Now, we describe our implementation of reoptimization in PostgreSQL along these three dimensions. For the timing of statistics collection, we collect at each physical operator. For the kinds of statistics, we collect the row count, the number of distinct values, and the most common value together with its frequency (if one exists), because these are basic statistics in PostgreSQL and are enough to generate a better execution plan. Finally, as soon as we collect the statistics from a materialized result, we call the optimizer again to make a new execution plan for the remainder of the original query, and we output the final result when the whole query has been executed.
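A sketch of this per-operator collection step (the function name and input shape are our own; a non-empty column of values is assumed):

```python
# Sketch of the statistics this appendix describes: row count, number of
# distinct values, and the most common value with its frequency,
# collected from one column of a materialized intermediate result.
from collections import Counter

def collect_stats(column_values):
    rows = len(column_values)                    # assumed > 0
    counts = Counter(column_values)
    mcv, mcv_count = counts.most_common(1)[0]    # most common value
    return {
        "rows": rows,
        "n_distinct": len(counts),
        "mcv": mcv,
        "mcv_freq": mcv_count / rows,
    }

stats = collect_stats([1, 1, 1, 2, 3, 3, 4, 5])
# rows=8, n_distinct=5, mcv=1, mcv_freq=0.375
```

These values are exactly what a selectivity estimate for the remainder of the query needs: row count for join sizing, n_distinct for equality predicates, and the MCV frequency for skewed columns.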

Appendix F Details of Directmap Join

This appendix describes the implementation of directmap join.

In the build phase, we copy the inner relation into a two-dimensional array called map, which can be treated as a special hash table holding the data of the inner relation. The hash function of map is the identity function on the join attribute value, h(v) = v. Note that this simple hash function only works when the join attribute value is a non-negative integer, which is satisfied in JOB but not in all benchmarks. Each row of map stores one data tuple; under the given hash function, a data tuple is stored at the row whose row number equals the value of its join attribute.

Besides the data slot, each row of the map also contains a header. The header consists of three components: valid, previous, and next. valid indicates whether the data slot is occupied by an inner tuple. It is common, however, for several tuples to share the same join attribute value, which causes a conflict. We use open addressing (Goodrich and Tamassia, 2015) to solve this problem, and we use previous and next to maintain a list whose members all share the same join attribute value, in order to avoid unnecessary searches in the map. When a conflict occurs, we store the new tuple in an empty row and add its position to the tail of the corresponding list. Moreover, if there is no empty row left for a newly arriving tuple, we simply double the size of map.

We fetch inner tuples from the scan operator one by one to create map. When an inner tuple arrives, we first calculate the row number where it should be stored according to its join attribute value. Then we check valid of the target row, and there are three cases:

Case 1: If valid is 0, we store the tuple in the target row and set previous and next to null.

Case 2 (Figure 16): If valid is 1 and the occupying tuple is a “host tuple”, i.e., its join attribute value equals its row number, we place the new tuple in an empty row as a “guest tuple”, whose join attribute value does not equal its row number, and add its position to the tail of the list of tuples sharing that join attribute value.

Figure 16. The illustration of Case 2

Case 3 (Figure 17): If valid is 1 and the occupying tuple is a “guest tuple”, whose join attribute value does not equal its row number, we first move it to a new empty row, then fit the new tuple into the target row as a “host tuple” and set its previous and next to null. Finally, we refresh the position of the moved tuple in the list whose members share its join attribute value.

Figure 17. The illustration of Case 3

After introducing how to build map, we describe how the probe phase returns join results. As shown in Figure 18, we first fetch an outer tuple and use its join attribute value as the row number to fetch the corresponding row in map. We then check the row's valid word. If the row is empty, we fetch the next outer tuple and repeat the above steps. Otherwise, we check next to determine whether a list exists. If next is null, meaning only one tuple has the given join attribute value, we return the join result and fetch the next outer tuple. If next is not null, we not only return the join result but also fetch the successors with the same join attribute value by traversing the list.

Figure 18. How directmap join works
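The build and probe phases together can be sketched as follows. This is a much-simplified sketch of our own: duplicates are chained in per-slot lists rather than with the open-addressing host/guest scheme described above, but it preserves the core idea that a probe is a single array access indexed by the join key:

```python
# Simplified directmap join: the build side is placed in an array
# indexed directly by the join attribute (a non-negative integer).

def build_map(inner, key):
    size = max(key(t) for t in inner) + 1   # map size >= max key value
    dmap = [None] * size
    for t in inner:
        k = key(t)
        if dmap[k] is None:
            dmap[k] = [t]                   # first tuple for this key
        else:
            dmap[k].append(t)               # duplicate join key
    return dmap

def probe(dmap, outer, key):
    for o in outer:
        k = key(o)
        if 0 <= k < len(dmap) and dmap[k] is not None:
            for t in dmap[k]:               # one array access per probe
                yield (o, t)

inner = [(1, "a"), (3, "b"), (3, "c")]
outer = [(3, "x"), (2, "y"), (1, "z")]
dmap = build_map(inner, key=lambda t: t[0])
results = list(probe(dmap, outer, key=lambda t: t[0]))
# [((3,'x'),(3,'b')), ((3,'x'),(3,'c')), ((1,'z'),(1,'a'))]
```

The real operator stores one tuple per row and uses the previous/next header fields instead of Python lists, but the probe-side saving is the same: no hash computation and no bucket traversal for unique keys.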

Another problem is how to combine directmap join with PostgreSQL optimizer. By our design, whether the optimizer chooses the directmap join operator depends on a heuristic judgment and its cost model.

A heuristic judgment is used to control memory usage, as the size of map is no less than the maximum join attribute value, which means directmap join sacrifices memory for speed. First, we require the inner relation to be a base relation or a subquery result, to avoid too many conflicts. Then, we check the ratio E/M, where E is the estimated number of tuples after applying the predicate on the inner relation, and M is the maximum join attribute value in the inner relation. Since the map size is no less than the maximum of the appearing join attribute values, we use M to approximate the map size, while E approximates the number of rows that we actually use in map. Hence, the ratio roughly estimates how much of the memory allocated for directmap join is actually used for storage, i.e., its memory utilization rate. If the ratio is less than a threshold, creating a map would waste memory, so we abandon the consideration of using directmap join.
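This heuristic reduces to a one-line check (the threshold value below is hypothetical; the paper does not state the one it uses):

```python
# Sketch of the memory-utilization heuristic: approximate the map size
# by the maximum join attribute value, and compare the estimated number
# of rows actually used against it.

def should_use_directmap(est_rows_after_predicate, max_join_value,
                         threshold=0.01):
    map_size = max_join_value            # map size >= max key value
    ratio = est_rows_after_predicate / map_size
    return ratio >= threshold

should_use_directmap(50_000, 1_000_000)    # ratio 0.05  -> True
should_use_directmap(500, 1_000_000)       # ratio 0.0005 -> False
```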

Second, we set the cost model of directmap join by imitating the cost model of hash join. Since directmap join reduces two memory accesses to one, we halve the CPU fetch cost; to reflect this change, we take half the cost of hash join as the cost of directmap join.