An industry-grade DBMS is a massive software system. As the requirements of the individual end-users and their workloads heavily differ, these systems are equipped with tons of options and configuration knobs. Finding the ”right” setup for the current situation has been an extremely challenging task ever since and even lead to the creation of an entire profession around that task: the database administrator or DBA. These administrators constantly monitor the performance of the system and tune the parameters to fit to the current workload as good as possible. To a certain degree, they rely on so called design advisory tools, that provide recommendations on what to do based on statistics. However, as statistics have limits , a good administrator will always rely at least equally on experience and intuition to come to the right decision.
Both experience and intuition11, 9]. In general, they seem to perform great in situations with vast search spaces and problems, that are too complex to grasp by traditional machine learning approaches [11, 1]. This raises the question whether it is possible to apply deep learning techniques to the problem of DBMS administration as well. Due to the vast amount of configuration parameters and workload differences, the search space is huge as well and extremely hard to overview — the perfect candidate for deep learning.
1.1 Deep Reinforcement Learning
Especially interesting in the context of administration is deep reinforcement learning 
. In contrast to traditional supervised learning
, where a neural network is trained on a set of given inputs and expected outputs, in reinforcement learning, the training process does not require any expected outputs. Instead, the training is completely driven by so calledrewards, that tell the learner whether a taken action lead to a positive or a negative result on the input. Depending on the outcome, the neural network is encouraged or discouraged to consider the action on this input in the future. Obviously, this is directly equivalent to an administrator, who changes a parameter of the system setup and is either rewarded or punished with an improved or worsened performance. A smart DBA will base future decisions on the made experiences.
In the same manner, we can apply deep reinforcement learning to train a neural network in taking over the administration process. Essentially, we have to define a so called problem environment consisting of the four following components to perform the learning:
The input to the neural network. This is typically the current workload in form of query characteristics, for which the system should be optimised as well as the current state of the configuration.
The set of actions, that can be taken. An action could be to create a secondary index on a certain column or to change the size of a database buffer. Such an action is a transition from a current system configuration to a new system configuration.
The reward function, that rates the impact of a taken action. For example, the runtime of the new configuration after the action has been taken could be compared with the best configuration seen so far. The higher the improvement, the higher is the returned positive reward. If performance degrades, a negative reward is returned.
The hyper parameters to steer the learning process. This includes properties of the neural network (e.g. number of hidden layers, number of nodes per layer) as well as properties of the learning process like the number of iterations.
With the right instantiation of these categories, we are generally able to train a neural network for a given optimisation goal (e.g. minimal runtime) and a given workload (e.g. a set of queries).
1.2 Prediction and Training
With a high-level understanding of the involved components, let us now first go through the workflow of a general learner and see how it predicts and trains. Assuming that all required hyper parameters are configured, which we will discuss in detail in Section 2.4, step of the learning process essentially consists of the following sequence:
Using the neural network, we predict the next action to perform. In the beginning of the training phase, a prediction will be mostly arbitrary and of low quality. Over time, the training will (hopefully) improve the prediction quality.
We apply the predicted action to the current configuration , resulting in a new configuration .
We compute the reward of the taken action using the reward function on the new configuration. This transition from one configuration to a new configuration can be summarised in the quadruple
We train the neural network using and restart in (1). A positive reward will enforce the neural network to predict on future configurations, which are similar to .
For a better understanding, let us see how this general workflow maps to the concrete example of playing a famous classic Atari game: Breakout . The goal of Breakout is to destroy a set of blocks, located at the top of the screen, by repeatedly bouncing back a ball with a paddle, that can be moved horizontally. As soon as all blocks are destroyed, the game is won. If the paddle misses the ball, such that the ball drops below the screen, the game is lost. Figure 1 shows Breakout on the Atari 2600.
To perform reinforcement learning on Breakout, the current state of the game board (i.e. the locations of blocks, ball, and paddle) is fed into the neural network as input. The set of actions is composed of the possible movements of the paddle: the player can either move a step to the left, move a step to the right, or stay. A positive reward is generated, if the paddle hits the ball. Table 1 summarises the setup again.
|Category||Atari Breakout ||NoDBA [this work]|
|Input||Current state of the game board.||Workload and current index configuration.|
|Set of Actions||Move paddle left, move paddle right, stay||Create an index on a particular column.|
|Reward||Positive, if the paddle hits the ball.||Positive, if an index configuration improves over previous state.|
The actual learning happens by applying the previously described steps. Early in the learning phase, the predicted paddle movements seem random and meaningless. However, as soon as the paddle manages to bounce back the ball, a positive reward is generated and fed back into the network for training. Over time, the system will learn to position the paddle horizontally as close as possible to the falling ball.
Obviously, the core principle of deep reinforcement learning is fairly simple and thanks to the general concepts of input, action set, and reward function, can be applied to a variety of problems. In this paper, we will showcase the quality of deep reinforcement learning in the database world using index selection as an example — a task that has a crucial impact on query execution times.
1.3 Automatic Index Selection
Given a workload, the task of index selection is to decide on which attributes to create secondary indexes, such that the processing of the workload benefits the most. Typically, an upper limit on the number of indexes to create is given due to space constraints and index maintenance costs, so simply indexing all columns is not an option. Of course, the goal is to come up with a proper index selection in an automatic way without the need of a human administrator. Let us precisely define the problem: assuming our database schema consists of columns (which can originate from multiple tables) and is the maximum amount of indexes we are allowed to create with , then we want to find a subset of of maximum size that minimises the runtime on a workload .
With the problem definition in mind, let us see step by step how we can map the problem of index selection to the general deep reinforcement learner described previously in Section 1.2. We will first define the input to the neural network in Section 2.1, the set of possible actions in Section 2.2, and the reward function in Section 2.3. Then, we will discuss the hyper parameters of the learner in Section 2.4, that we have to set up.
2.1 Neural Network Input
We start by defining the input to our neural network. As mentioned in Section 1.1, this input is typically a combination of the encoding of (1) the workload and (2) the current configuration. Thus, we will separate the input into two parts and , that are both fed into the neural network.
In , we encode the characteristics of the workload in form of a matrix of size , where is the number of queries in the workload and is the number of columns in the database schema. An entry () of the matrix describes for a query and a column the selectivity :
If a query selects on a particular column , then returns the number of selected records in relation to the total number of records — the smaller the result, the higher111A small number of returned records is called a high selectivity. the selectivity. If a query does not select on a particular column at all, then . This denotes the lowest possible selectivity and will encourage the system not to consider building an index on this column. These selectivities are computed by running every query of the workload on the database once upfront.
As already mentioned, we additionally have to encode the current index configuration in , which describes on which columns indexes exist. Thus, is simply the following bitlist of size
with , if there exists an index on or no query in the workload uses , and otherwise.
2.2 Set of Actions
Next, we have to define the set of possible actions , that can be carried out. For the problem of index selection, we have to consider only a single action: the creation of an index on a specific column222We do not consider multi-column indexes in this use-case. . Thus,
We have to mention here, that we are applying episodic reinforcement learning, like it is typically performed for learning games. This means that multiple steps as described in Section 1.2 form a so called episode, in which we transition from an initial configuration to a final configuration. For games, the initial configuration typically describes the starting setup of the game board and the final configuration a state of winning or losing. In our problem of index selection, the initial configuration resembles having no indexes at all, whereas a final configuration is a configuration of the maximum number of indexes . Consequently, we form a new episode by, starting with the initial configuration, incrementally adding indexes one by one until the maximum amount is reached. As soon as is reached, we conclude the episode, drop all indexes, and start the next episode with the initial configuration again. Thus, an action drop_index_on() is not required in this setup.
Defining the reward function is a delicate task and has a high impact on the learning quality. In our example, the overall goal is to optimise the query response time. First, we have to define a cost function. If is the set of indexed columns, then
returns the cost of the entire workload for this set of indexes. Of course, this cost can be either estimated by the DBMS to achieve a high training speed (e.g. by using theEXPLAIN command) or it can originate from actual query runtimes to achieve higher precision.
To have a baseline to compare with, we first compute the cost of executing the workload without any indexes as . Then, for an index configuration containing indexes with , the function
returns the reward of that configuration . Obviously, the lower the , the higher the reward. If a configuration does not lower the cost at all or even increases the cost, the reward is .
2.4 Hyper Parameters
To run the described reinforcement learner, a number of hyper parameters must be set before execution. First, we have to set the upper limit on the amount of indexes, denoted as , usually dictated by a space constraint on the available memory. We chose for the upcoming evaluation. The remaining parameters are directly connected to the reinforcement learner and are largely chosen in a process of trial and error: we set the number of hidden layers to
, the number of neurons per hidden layer to
, and the activation function for the hidden layers and the output layer toRELU and SOFTMAX respectively. As agent, which resembles the actual learning algorithm, we use CEM.
2.5 Design Advisory Tools
As mentioned, there is a classical helper tool of the DBA with a similar purpose — the design advisory software, that is shipped with an industry-grade DBMS. These tools are used by passing a workload file containing a set of SQL queries. To come up with index recommendations, they introduce so called virtual indexes. A virtual index is just the description of a hypothetical index in form of meta-data, that is written into the catalog. No actual physical index structure is built. By this, the query optimiser is tricked in believing that a variety of indexes exist and considers them in finding the best query plan. When the best query plan is found, a recommendation for index creation is returned for every column on which an index access occurs in that plan. As a consequence of this design, the recommendation of the advisory tool is only as good as the statistics of the query optimiser. Besides, these tools typically compute a recommendation per query and then try to synthesise a workload recommendation from the individual results. Thus, the overall best decision on index selection might be missed.
With the description of the learning workflow and the problem environment at hand, let us come to the actual evaluation. In the following, we will compare the quality of index selection using deep reinforcement learning on a number of given workloads.
All experiments are conducted on a desktop PC, equipped with a GHz Intel Core i7 7700K processor with cores, 16GB of RAM, and a NVIDIA GeForce GTX 1080 Ti. For learning, we use keras-rl , that implements a general reinforcement learner in python on top of keras, which is itself a high-level neural network API on top of a deep learning backend like Tensorflow, Theano, or Microsoft CNTK
. We chose Microsoft Cognitive Toolkit (CNTK) with GPU support as a backend to keras in this work. To describe the problem environment, we use thegym  library from the OpenAI  project and then pass this environment to keras-rl to perform the learning. For learning, we use the optimisation of experience replay , where the network is not directly trained on the current experience, but on previously seen ones to avoid getting stuck in local minima. As database, we use schema and data of the TPC-H benchmark  in scale factor and run the queries in PostgreSQL. In this work, our workloads query only the LINEITEM table.
For the experimental evaluation, we have to distinguish between the training workloads and a test workload. We use the training workloads to train the index selection during reinforcement learning. Note that we use a multitude of randomly generated training workloads to confront the learner with a variety of different scenarios. The test workload is then used to compare the quality of our trained neural network in comparison to having all indexes available.
|Workload||NoIndex [ms]||IndexedAll [ms]||NoDBA [ms]|
Every query in the training workloads is a SELECT count(*) query on the LINEITEM table of TPC-H. In the WHERE clause, we filter on a fixed number of columns and combine their individual selections using AND. For every randomly chosen column in the WHERE clause, we perform either an equality selection (e.g. L_TAX = 0.02) or a range selection (e.g. L_SHIPDATE < ’1994-01-01’). The predicates are randomly selected from a list of actually occurring values.
|l_partkey < 100, l_suppkey < 100, l_linenumber = 1, l_discount = 0.02|
|l_orderkey < 100000, l_partkey < 10000, l_quantity = 1, l_linenumber = 1|
|l_quantity = 1, l_partkey < 100000, l_suppkey < 1000, l_orderkey < 100000|
|l_orderkey < 100000, l_discount = 0.0, l_suppkey < 1000, l_linenumber = 1|
|l_orderkey < 100000, l_partkey < 100000, l_suppkey < 10000, l_linenumber = 1, l_discount = 0.01|
|l_orderkey < 1000000, l_partkey < 10000, l_suppkey < 10000, l_linenumber = 1, l_quantity = 1|
|l_quantity = 1, l_partkey < 100000, l_orderkey < 1000000, l_linenumber = 1|
|l_orderkey < 100000, l_partkey < 100000, l_suppkey < 100000, l_quantity = 1, l_discount = 0.0|
|l_orderkey < 100000, l_partkey < 100000, l_suppkey < 100000, l_linenumber = 1, l_quantity = 1, l_discount = 0.02|
|l_orderkey < 100000, l_partkey < 100000, l_suppkey < 100000, l_linenumber = 1, l_discount = 0.02|
|l_orderkey < 10, l_suppkey < 50000, l_extendedprice < 50000, l_receiptdate < ’1993-12-31’, l_returnflag = ’A’, l_linestatus = ’O’|
|l_orderkey < 1500, l_extendedprice < 10000, l_shipinstruct = ’TAKE BACK RETURN’, l_receiptdate < ’1993-06-30’|
|l_suppkey < 100, l_shipdate < ’1993-01-01’, l_receiptdate < ’1992-06-29’, l_linenumber = 4, l_shipinstruct = ’TAKE BACK RETURN’, l_shipmode = ’SHIP’|
|l_orderkey < 1500, l_suppkey < 1000, l_shipdate < ’1995-03-31’, l_linenumber = 4, l_tax = 0.0, l_returnflag = ’N’|
|l_suppkey < 50000, l_extendedprice < 1000, l_commitdate < ’1995-01-28’, l_receiptdate < ’1992-06-29’, l_quantity = 1, l_linestatus = ’O’|
For testing the quality, we use three synthetic workloads, each consisting of five queries of the form SELECT COUNT(*) FROM LINEITEM WHERE Selection1 AND Selection 2 AND ..., where the selections are listed in Tables 3, 4, and 5. The workloads (Table 3) and (Table 4) consist of queries, which perform up to six selections with high selectivity on a fixed set of attributes of LINEITEM. The queries of workload (Table 5) choose their selections on all attributes of LINEITEM. In Table 2, we now evaluate the performance of running the workload without indexes as well as indexes on all columns in comparison with the recommended ones by our NoDBA. As we can see, on the highly selective workloads and , our selection performs as good or even better333The query optimiser might make a bad decision in using an index on a particular column. as having indexes on all columns. For , which selects on a larger set of attributes, our selection of only indexes still manages to improve the runtime. The training time of our network, that has to happen once, was minutes. After that, an individual prediction took only around ms.
4 Conclusion and Future Work
In this work, we demonstrated how deep reinforcement learning can be used to automatically administer a database management system. We showcased our concept using index selection as one possible example. We showed that it is possible to train a neural network for such a task.
Obviously our work is just a first glimpse on what is possible when tuning a database using deep learning. We are currently working on extending these initial results in several dimensions.
For instance, compared to traditional cost-based what-if optimisations, a strong benefit of our approach is that we can also optimise without those estimates — which are possibly faulty and misleading anyways . We are currently exploring the trade-offs of this idea.
Motivated by these promising results, we also believe that deep reinforcement learning can be used to tune a variety of other components of a DBMS, such as the query optimiser. In addition, we would like to extend our work to non-stationary workloads.
Deep Learning (m)eats Databases. VLDB 2017. Keynote.
-  https://github.com/matthiasplappert/keras-rl, January 2018.
-  https://github.com/openai/gym, January 2018.
-  https://openai.com/, January 2018.
-  http://www.tpc.org/tpch/, January 2018.
-  K. Arulkumaran, M. P. Deisenroth, M. Brundage, et al. A brief survey of deep reinforcement learning. CoRR, abs/1708.05866, 2017.
-  R. Caruana and A. Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In Machine Learning, Proceedings of the Twenty-Third International Conference (ICML 2006), Pittsburgh, Pennsylvania, USA, June 25-29, 2006, pages 161–168, 2006.
-  V. Leis, A. Gubichev, A. Mirchev, et al. How good are query optimizers, really? PVLDB, 9(3):204–215, 2015.
-  V. Mnih, K. Kavukcuoglu, D. Silver, et al. Playing atari with deep reinforcement learning. CoRR, abs/1312.5602, 2013.
-  V. Mnih, K. Kavukcuoglu, D. Silver, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
-  D. Silver, J. Schrittwieser, K. Simonyan, et al. Mastering the game of go without human knowledge. Nature, 550:354 EP –, 10 2017.