The Case for Automatic Database Administration using Deep Reinforcement Learning

01/17/2018 ∙ by Ankur Sharma, et al. ∙ 0

Like any large software system, a full-fledged DBMS offers an overwhelming amount of configuration knobs. These range from static initialisation parameters like buffer sizes, degree of concurrency, or level of replication to complex runtime decisions like creating a secondary index on a particular column or reorganising the physical layout of the store. To simplify the configuration, industry grade DBMSs are usually shipped with various advisory tools, that provide recommendations for given workloads and machines. However, reality shows that the actual configuration, tuning, and maintenance is usually still done by a human administrator, relying on intuition and experience. Recent work on deep reinforcement learning has shown very promising results in solving problems, that require such a sense of intuition. For instance, it has been applied very successfully in learning how to play complicated games with enormous search spaces. Motivated by these achievements, in this work we explore how deep reinforcement learning can be used to administer a DBMS. First, we will describe how deep reinforcement learning can be used to automatically tune an arbitrary software system like a DBMS by defining a problem environment. Second, we showcase our concept of NoDBA at the concrete example of index selection and evaluate how well it recommends indexes for given workloads.



There are no comments yet.


page 3

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

An industry-grade DBMS is a massive software system. As the requirements of the individual end-users and their workloads heavily differ, these systems are equipped with tons of options and configuration knobs. Finding the ”right” setup for the current situation has been an extremely challenging task ever since and even lead to the creation of an entire profession around that task: the database administrator or DBA. These administrators constantly monitor the performance of the system and tune the parameters to fit to the current workload as good as possible. To a certain degree, they rely on so called design advisory tools, that provide recommendations on what to do based on statistics. However, as statistics have limits [8], a good administrator will always rely at least equally on experience and intuition to come to the right decision.

Both experience and intuition

are deeply embedded in the field of machine learning. Recently, deep learning techniques showed fascinating results in learning complex tasks, such as playing difficult games 

[11, 9]. In general, they seem to perform great in situations with vast search spaces and problems, that are too complex to grasp by traditional machine learning approaches [11, 1]. This raises the question whether it is possible to apply deep learning techniques to the problem of DBMS administration as well. Due to the vast amount of configuration parameters and workload differences, the search space is huge as well and extremely hard to overview — the perfect candidate for deep learning.

1.1 Deep Reinforcement Learning

Especially interesting in the context of administration is deep reinforcement learning [6]

. In contrast to traditional supervised learning 


, where a neural network is trained on a set of given inputs and expected outputs, in reinforcement learning, the training process does not require any expected outputs. Instead, the training is completely driven by so called 

rewards, that tell the learner whether a taken action lead to a positive or a negative result on the input. Depending on the outcome, the neural network is encouraged or discouraged to consider the action on this input in the future. Obviously, this is directly equivalent to an administrator, who changes a parameter of the system setup and is either rewarded or punished with an improved or worsened performance. A smart DBA will base future decisions on the made experiences.

In the same manner, we can apply deep reinforcement learning to train a neural network in taking over the administration process. Essentially, we have to define a so called problem environment consisting of the four following components to perform the learning:

  1. The input to the neural network. This is typically the current workload in form of query characteristics, for which the system should be optimised as well as the current state of the configuration.

  2. The set of actions, that can be taken. An action could be to create a secondary index on a certain column or to change the size of a database buffer. Such an action is a transition from a current system configuration to a new system configuration.

  3. The reward function, that rates the impact of a taken action. For example, the runtime of the new configuration after the action has been taken could be compared with the best configuration seen so far. The higher the improvement, the higher is the returned positive reward. If performance degrades, a negative reward is returned.

  4. The hyper parameters to steer the learning process. This includes properties of the neural network (e.g. number of hidden layers, number of nodes per layer) as well as properties of the learning process like the number of iterations.

With the right instantiation of these  categories, we are generally able to train a neural network for a given optimisation goal (e.g. minimal runtime) and a given workload (e.g. a set of queries).

1.2 Prediction and Training

With a high-level understanding of the involved components, let us now first go through the workflow of a general learner and see how it predicts and trains. Assuming that all required hyper parameters are configured, which we will discuss in detail in Section 2.4, step  of the learning process essentially consists of the following sequence:

  1. Using the neural network, we predict the next action  to perform. In the beginning of the training phase, a prediction will be mostly arbitrary and of low quality. Over time, the training will (hopefully) improve the prediction quality.

  2. We apply the predicted action  to the current configuration , resulting in a new configuration .

  3. We compute the reward  of the taken action  using the reward function on the new configuration. This transition from one configuration to a new configuration can be summarised in the quadruple

  4. We train the neural network using and restart in (1). A positive reward  will enforce the neural network to predict  on future configurations, which are similar to .

For a better understanding, let us see how this general workflow maps to the concrete example of playing a famous classic Atari game: Breakout [9]. The goal of Breakout is to destroy a set of blocks, located at the top of the screen, by repeatedly bouncing back a ball with a paddle, that can be moved horizontally. As soon as all blocks are destroyed, the game is won. If the paddle misses the ball, such that the ball drops below the screen, the game is lost. Figure 1 shows Breakout on the Atari 2600.

Figure 1: Breakout on the Atari 2600

To perform reinforcement learning on Breakout, the current state of the game board (i.e. the locations of blocks, ball, and paddle) is fed into the neural network as input. The set of actions is composed of the possible movements of the paddle: the player can either move a step to the left, move a step to the right, or stay. A positive reward is generated, if the paddle hits the ball. Table 1 summarises the setup again.

Category Atari Breakout [9] NoDBA [this work]
Input Current state of the game board. Workload and current index configuration.
Set of Actions Move paddle left, move paddle right, stay Create an index on a particular column.
Reward Positive, if the paddle hits the ball. Positive, if an index configuration improves over previous state.
Table 1: Comparison of Input, Set of Actions, and Reward for Atari Breakout and NoDBA.

The actual learning happens by applying the previously described steps. Early in the learning phase, the predicted paddle movements seem random and meaningless. However, as soon as the paddle manages to bounce back the ball, a positive reward is generated and fed back into the network for training. Over time, the system will learn to position the paddle horizontally as close as possible to the falling ball.

Obviously, the core principle of deep reinforcement learning is fairly simple and thanks to the general concepts of input, action set, and reward function, can be applied to a variety of problems. In this paper, we will showcase the quality of deep reinforcement learning in the database world using index selection as an example — a task that has a crucial impact on query execution times.

1.3 Automatic Index Selection

Given a workload, the task of index selection is to decide on which attributes to create secondary indexes, such that the processing of the workload benefits the most. Typically, an upper limit on the number of indexes to create is given due to space constraints and index maintenance costs, so simply indexing all columns is not an option. Of course, the goal is to come up with a proper index selection in an automatic way without the need of a human administrator. Let us precisely define the problem: assuming our database schema  consists of  columns (which can originate from multiple tables) and  is the maximum amount of indexes we are allowed to create with , then we want to find a subset of of maximum size  that minimises the runtime on a workload .


With the problem definition in mind, let us see step by step how we can map the problem of index selection to the general deep reinforcement learner described previously in Section 1.2. We will first define the input to the neural network in Section 2.1, the set of possible actions in Section 2.2, and the reward function in Section 2.3. Then, we will discuss the hyper parameters of the learner in Section 2.4, that we have to set up.

2.1 Neural Network Input

We start by defining the input to our neural network. As mentioned in Section 1.1, this input is typically a combination of the encoding of (1) the workload and (2) the current configuration. Thus, we will separate the input into two parts  and , that are both fed into the neural network.

In , we encode the characteristics of the workload in form of a matrix of size , where is the number of queries in the workload and  is the number of columns in the database schema. An entry  () of the matrix describes for a query  and a column  the selectivity :


If a query selects on a particular column , then returns the number of selected records in relation to the total number of records — the smaller the result, the higher111A small number of returned records is called a high selectivity. the selectivity. If a query  does not select on a particular column  at all, then . This denotes the lowest possible selectivity and will encourage the system not to consider building an index on this column. These selectivities are computed by running every query of the workload on the database once upfront.

As already mentioned, we additionally have to encode the current index configuration in , which describes on which columns indexes exist. Thus, is simply the following bitlist of size 

with , if there exists an index on or no query in the workload uses , and  otherwise.

2.2 Set of Actions

Next, we have to define the set of possible actions , that can be carried out. For the problem of index selection, we have to consider only a single action: the creation of an index on a specific column222We do not consider multi-column indexes in this use-case. . Thus,

We have to mention here, that we are applying episodic reinforcement learning, like it is typically performed for learning games. This means that multiple steps as described in Section 1.2 form a so called episode, in which we transition from an initial configuration to a final configuration. For games, the initial configuration typically describes the starting setup of the game board and the final configuration a state of winning or losing. In our problem of index selection, the initial configuration resembles having no indexes at all, whereas a final configuration is a configuration of the maximum number of indexes . Consequently, we form a new episode by, starting with the initial configuration, incrementally adding indexes one by one until the maximum amount  is reached. As soon as is reached, we conclude the episode, drop all indexes, and start the next episode with the initial configuration again. Thus, an action drop_index_on() is not required in this setup.

2.3 Reward

Defining the reward function is a delicate task and has a high impact on the learning quality. In our example, the overall goal is to optimise the query response time. First, we have to define a cost function. If is the set of indexed columns, then

returns the cost of the entire workload for this set of indexes. Of course, this cost can be either estimated by the DBMS to achieve a high training speed (e.g. by using the

EXPLAIN command) or it can originate from actual query runtimes to achieve higher precision.

To have a baseline to compare with, we first compute the cost of executing the workload without any indexes as  . Then, for an index configuration  containing indexes with , the function

returns the reward of that configuration . Obviously, the lower the , the higher the reward. If a configuration  does not lower the cost at all or even increases the cost, the reward is .

2.4 Hyper Parameters

To run the described reinforcement learner, a number of hyper parameters must be set before execution. First, we have to set the upper limit on the amount of indexes, denoted as , usually dictated by a space constraint on the available memory. We chose  for the upcoming evaluation. The remaining parameters are directly connected to the reinforcement learner and are largely chosen in a process of trial and error: we set the number of hidden layers to 

, the number of neurons per hidden layer to 

, and the activation function for the hidden layers and the output layer to

RELU and SOFTMAX respectively. As agent, which resembles the actual learning algorithm, we use CEM.

2.5 Design Advisory Tools

As mentioned, there is a classical helper tool of the DBA with a similar purpose — the design advisory software, that is shipped with an industry-grade DBMS. These tools are used by passing a workload file containing a set of SQL queries. To come up with index recommendations, they introduce so called virtual indexes. A virtual index is just the description of a hypothetical index in form of meta-data, that is written into the catalog. No actual physical index structure is built. By this, the query optimiser is tricked in believing that a variety of indexes exist and considers them in finding the best query plan. When the best query plan is found, a recommendation for index creation is returned for every column on which an index access occurs in that plan. As a consequence of this design, the recommendation of the advisory tool is only as good as the statistics of the query optimiser. Besides, these tools typically compute a recommendation per query and then try to synthesise a workload recommendation from the individual results. Thus, the overall best decision on index selection might be missed.

3 Evaluation

With the description of the learning workflow and the problem environment at hand, let us come to the actual evaluation. In the following, we will compare the quality of index selection using deep reinforcement learning on a number of given workloads.

3.1 Setup

All experiments are conducted on a desktop PC, equipped with a  GHz Intel Core i7 7700K processor with  cores, 16GB of RAM, and a NVIDIA GeForce GTX 1080 Ti. For learning, we use keras-rl [2], that implements a general reinforcement learner in python on top of keras, which is itself a high-level neural network API on top of a deep learning backend like Tensorflow, Theano, or Microsoft CNTK

. We chose Microsoft Cognitive Toolkit (CNTK) with GPU support as a backend to keras in this work. To describe the problem environment, we use the

gym [3] library from the OpenAI [4] project and then pass this environment to keras-rl to perform the learning. For learning, we use the optimisation of experience replay [10], where the network is not directly trained on the current experience, but on previously seen ones to avoid getting stuck in local minima. As database, we use schema and data of the TPC-H benchmark [5] in scale factor  and run the queries in PostgreSQL. In this work, our workloads query only the LINEITEM table.

3.2 Experiments

For the experimental evaluation, we have to distinguish between the training workloads and a test workload. We use the training workloads to train the index selection during reinforcement learning. Note that we use a multitude of randomly generated training workloads to confront the learner with a variety of different scenarios. The test workload is then used to compare the quality of our trained neural network in comparison to having all indexes available.

Workload NoIndex [ms] IndexedAll [ms] NoDBA [ms]
Table 2: Comparison of workload execution times. We compare NoIndex (executing the workload without any indexes) with our NoDBA (executing the workload with the predicted indexes). Additionally, we show IndexedAll, where an index on every column is available.

Every query in the training workloads is a SELECT count(*) query on the LINEITEM table of TPC-H. In the WHERE clause, we filter on a fixed number of columns and combine their individual selections using AND. For every randomly chosen column in the WHERE clause, we perform either an equality selection (e.g. L_TAX = 0.02) or a range selection (e.g. L_SHIPDATE < ’1994-01-01’). The predicates are randomly selected from a list of actually occurring values.

Query Selections
l_partkey < 100, l_suppkey < 100, l_linenumber = 1, l_discount = 0.02
l_orderkey < 100000, l_partkey < 10000, l_quantity = 1, l_linenumber = 1
l_quantity = 1, l_partkey < 100000, l_suppkey < 1000, l_orderkey < 100000
l_orderkey < 100000, l_discount = 0.0, l_suppkey < 1000, l_linenumber = 1
l_orderkey < 100000, l_partkey < 100000, l_suppkey < 10000, l_linenumber = 1, l_discount = 0.01
Table 3: Workload 
Query Selections
l_orderkey < 1000000, l_partkey < 10000, l_suppkey < 10000, l_linenumber = 1, l_quantity = 1
l_quantity = 1, l_partkey < 100000, l_orderkey < 1000000, l_linenumber = 1
l_orderkey < 100000, l_partkey < 100000, l_suppkey < 100000, l_quantity = 1, l_discount = 0.0
l_orderkey < 100000, l_partkey < 100000, l_suppkey < 100000, l_linenumber = 1, l_quantity = 1, l_discount = 0.02
l_orderkey < 100000, l_partkey < 100000, l_suppkey < 100000, l_linenumber = 1, l_discount = 0.02
Table 4: Workload 
Query Selections
l_orderkey < 10, l_suppkey < 50000, l_extendedprice < 50000, l_receiptdate < ’1993-12-31’, l_returnflag = ’A’, l_linestatus = ’O’
l_orderkey < 1500, l_extendedprice < 10000, l_shipinstruct = ’TAKE BACK RETURN’, l_receiptdate < ’1993-06-30’
l_suppkey < 100, l_shipdate < ’1993-01-01’, l_receiptdate < ’1992-06-29’, l_linenumber = 4, l_shipinstruct = ’TAKE BACK RETURN’, l_shipmode = ’SHIP’
l_orderkey < 1500, l_suppkey < 1000, l_shipdate < ’1995-03-31’, l_linenumber = 4, l_tax = 0.0, l_returnflag = ’N’
l_suppkey < 50000, l_extendedprice < 1000, l_commitdate < ’1995-01-28’, l_receiptdate < ’1992-06-29’, l_quantity = 1, l_linestatus = ’O’
Table 5: Workload 

For testing the quality, we use three synthetic workloads, each consisting of five queries of the form SELECT COUNT(*) FROM LINEITEM WHERE Selection1 AND Selection 2 AND ..., where the selections are listed in Tables 3, 4, and 5. The workloads  (Table 3) and (Table 4) consist of queries, which perform up to six selections with high selectivity on a fixed set of attributes of LINEITEM. The queries of workload  (Table 5) choose their selections on all attributes of LINEITEM. In Table 2, we now evaluate the performance of running the workload without indexes as well as indexes on all columns in comparison with the recommended ones by our NoDBA. As we can see, on the highly selective workloads  and , our selection performs as good or even better333The query optimiser might make a bad decision in using an index on a particular column. as having indexes on all columns. For , which selects on a larger set of attributes, our selection of only  indexes still manages to improve the runtime. The training time of our network, that has to happen once, was  minutes. After that, an individual prediction took only around ms.

4 Conclusion and Future Work

In this work, we demonstrated how deep reinforcement learning can be used to automatically administer a database management system. We showcased our concept using index selection as one possible example. We showed that it is possible to train a neural network for such a task.

Obviously our work is just a first glimpse on what is possible when tuning a database using deep learning. We are currently working on extending these initial results in several dimensions.

For instance, compared to traditional cost-based what-if optimisations, a strong benefit of our approach is that we can also optimise without those estimates — which are possibly faulty and misleading anyways [8]. We are currently exploring the trade-offs of this idea.

Motivated by these promising results, we also believe that deep reinforcement learning can be used to tune a variety of other components of a DBMS, such as the query optimiser. In addition, we would like to extend our work to non-stationary workloads.