## 1 Introduction

Automatic machine learning (AutoML) aims to learn how to learn. Given a dataset, a well defined task, and performance criteria, the goal is to solve the task with respect to the dataset while optimizing performance. Existing systems have focused on a relatively small set of machine learning primitives, with a few tasks (feurer2015autosklearn), or on a small set of datasets (chen2018autostacker), or on numerous datasets within specific domains (olson2016tpot).

DARPA’s Data Driven Discovery of Models (D3M) program pushes this vision further and proposes to develop infrastructure to automate model discovery, i.e., to solve any task on any dataset specified by the user. Using a broad set of computational primitives as building blocks, the D3M system should synthesize a pipeline and set the appropriate hyper-parameters to solve a previously unseen problem on a previously unseen dataset. The D3M system also has a user interface that enables users to interact with and improve the automatically generated results (blei2017datascience).

Inspired by AlphaZero (silver2017alpha0), we frame the problem of pipeline synthesis for model discovery as a single-player game (mcaleer2018solving): the player iteratively builds a pipeline by selecting among a set of actions, namely insertion, deletion, and replacement of pipeline parts. An inherent advantage of this approach is that, once there is a working pipeline at the end of the process, it is completely explainable, including all the actions and decisions which led to its synthesis. Another advantage is that our approach leverages recent advances in deep reinforcement learning using self play, specifically expert iteration (anthony2017thinking) and AlphaZero (silver2017alpha0), by using a neural network for predicting pipeline performance and action probabilities, along with a Monte-Carlo Tree Search (MCTS), as illustrated in Figure 1 (left), which makes strong decisions based on the network. The process progresses by self play with iterative self improvement, and is known to be highly efficient at finding a solution to search problems in very high dimensional spaces. We evaluate our approach using OpenML datasets on the tasks of classification and regression, demonstrating competitive performance and computation times an order of magnitude faster than other AutoML systems.

Each of the existing AutoML systems uses one of the following key elements individually: differentiable programming, tree search, evolutionary algorithms, or Bayesian optimization, to find the best machine learning pipelines for a given task and dataset. Differentiable programming, of which neural network backpropagation is a special case, is used for learning feature extraction and estimation (ganin2014unsupervised) and for end-to-end learning of machine learning pipelines with differentiable primitives (mitar2017). Bayesian optimization methods are used for hyper-parameter tuning (bergstra2012). Both AutoWEKA (autoweka2017) and Autosklearn (feurer2015autosklearn) extend the application of these techniques to the selection of the model in addition to the hyper-parameter values, solving the combined algorithm selection and hyper-parameter optimization problem by fitting probabilistic models that capture the relationship between parameter values and performance measures using a Gaussian Process, Random Forest, or tree-structured Parzen estimator (bergstra2011). Auto-Tuned Models (atm2017) represents the search space as a tree whose nodes are algorithms or hyperparameters and searches for the best branch using a multi-armed bandit. TPOT (olson2016tpot) and Autostacker (chen2018autostacker) use evolutionary algorithms to generate machine learning pipelines while optimizing their hyperparameters. TPOT represents machine learning pipelines as trees, whereas Autostacker represents them as stacked layers.

Our goal is to search within a large space for the machine learning primitives, the pre- and post-processing primitives, and the parameters which together constitute a pipeline for solving a task on a given dataset. The problem is that of high dimensional search. Although the datasets differ, the solution pipelines contain recurring patterns. Just as a data scientist develops intuition and patterns about the pipeline components, we use a neural network along with a Monte-Carlo tree search in an iterative process. This combination results in the network learning these patterns while the search splits the problem into components and looks ahead for solutions. Through self play and evaluations the network improves, incorporating a better intuition. An advantage of this iterative dual process is that it is computationally efficient in high dimensional search (silver2017alpha0).

## 2 Methods

Following dual process theory, we solve the meta learning problem by sequence modeling using a deep neural network and Monte Carlo tree search (MCTS) (silver2017alpha0; anthony2017thinking). This section describes our representation, followed by details of the neural network and MCTS.

### 2.1 Representation

Figure 1 (right) illustrates a high level analogy between a two player competitive game and our single player pipeline synthesis game, including state, action, and reward. A pipeline is a data mining workflow of pre-processing, feature extraction, feature selection, estimation, and post-processing primitives. Algorithm 1 describes our pipeline state representation. Our architecture models meta data and an entire pipeline chain as state rather than individual primitives. A pipeline, together with the meta data and problem definition, is analogous to an entire game board configuration. The actions are transitions from one state (pipeline) to another.

### 2.2 Neural Network

AlphaD3M uses a recurrent neural network, specifically an LSTM. Let $f_\theta(s) = (P(s, a), v(s))$, where $P(s, a)$ is the action probabilities and $v(s)$ the evaluation score of the model predicted by the network with parameters $\theta$, for a given dataset $D$ and task $T$, for a given state $s$. The neural network predicts the probabilities $P(s, a)$ over actions which lead to sequences that describe a pipeline, which in turn solves the given task on the dataset. The network inputs are training examples $(s_t, \pi_t, e_t)$ from games of self play, where $s_t$ is the state at time $t$, $\pi_t$ the policy estimated by MCTS, and $e_t$ the actual pipeline evaluation at the end of the game. The state $s$ is composed of a vector encoded as described in Algorithm 1. The network outputs are probabilities over actions $P(s, a)$, and an estimate of pipeline performance $v(s)$.

We optimize the network parameters $\theta$ by making the predicted model $S$ match the real world model $R$ and the predicted evaluation results $v$ match the real world evaluation $e$, by minimizing the cross entropy loss between $S$ and $R$, and the mean squared error between $v$ and $e$. We add an $L_2$ regularization term for the network parameters $\theta$ to avoid over-fitting and an $L_1$ regularization term which prefers simple pipelines $S$. Thus our network $f_\theta$ is trained by minimizing the following non-linear loss function using stochastic gradient descent:

$$L(\theta) = -R \log S + (v - e)^2 + \alpha \lVert \theta \rVert_2 + \beta \lVert S \rVert_1 \tag{1}$$
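As a rough illustration of how the four terms described above combine, the following sketch evaluates such a loss for a single training example in plain Python. The function name `pipeline_loss`, the weights `alpha` and `beta`, the use of a squared L2 penalty, and the choice to apply the L1 penalty to a numeric pipeline encoding are all assumptions made for this example; the actual system is trained with PyTorch and may differ in every one of these details.

```python
import math

def pipeline_loss(pred_probs, target_policy, pred_value, real_eval,
                  params, pipeline_encoding, alpha=1e-4, beta=1e-4):
    """Hypothetical single-example loss: cross entropy + squared error
    + L2 penalty on network parameters + L1 penalty on the pipeline encoding."""
    eps = 1e-12  # avoids log(0) when a predicted probability is exactly zero
    # Cross entropy between the target policy and the predicted action probabilities.
    cross_entropy = -sum(t * math.log(p + eps)
                         for t, p in zip(target_policy, pred_probs))
    # Squared error between predicted and real-world pipeline evaluation.
    squared_error = (pred_value - real_eval) ** 2
    # Squared L2 penalty (assumed form) discourages large network weights.
    l2_penalty = alpha * sum(w * w for w in params)
    # L1 penalty (assumed form) favours sparse, i.e. simpler, pipeline encodings.
    l1_penalty = beta * sum(abs(x) for x in pipeline_encoding)
    return cross_entropy + squared_error + l2_penalty + l1_penalty

# Toy inputs (not taken from the paper): three actions, two weights.
example = pipeline_loss(
    pred_probs=[0.6, 0.3, 0.1],
    target_policy=[0.7, 0.2, 0.1],
    pred_value=0.75,
    real_eval=0.80,
    params=[0.5, -0.25],
    pipeline_encoding=[1.0, 0.0, 1.0],
)
print(f"loss = {example:.4f}")
```

In practice the same quantity would be computed on batches with automatic differentiation so that stochastic gradient descent can update the parameters.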

### 2.3 Monte Carlo Tree Search

Our algorithm takes the predictions of the neural network and uses them in an MCTS by running multiple simulations to search for a pipeline sequence with a better evaluation. The search result improves upon the predicted result given by the network by improving the network policy using the update rule:

$$U(s, a) = Q(s, a) + c \, P(s, a) \frac{\sqrt{N(s)}}{1 + N(s, a)} \tag{2}$$

where $Q(s, a)$ is the expected reward for action $a$ from state $s$, $N(s, a)$ is the number of times action $a$ was taken from state $s$, $N(s)$ the number of times state $s$ was visited, $P(s, a)$ is the estimate of the neural network for the probability of taking action $a$ from state $s$, and $c$ is a constant which determines the amount of exploration. At each step of the simulation, we find the action $a$ and state $s$ which maximize $U(s, a)$. If the resulting state does not yet exist in the tree, we add it with the neural network estimates $P(s, a)$ and $v(s)$ (the predicted evaluation); otherwise we call the search recursively. Next, the model represented by $s$ is realized and applied to the data to solve the task, resulting in a better evaluation $e$, which is the result of running the generated pipeline on the data and task. Thus the real world search provides us with $(R, e)$, where $R$ is the real world model, consisting of machine learning primitives, and $e$ the real world evaluation of the model and pipeline using those primitives on the data and task.
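The selection step can be sketched as follows, assuming the AlphaZero-style upper confidence bound U(s, a) = Q(s, a) + c · P(s, a) · sqrt(N(s)) / (1 + N(s, a)), which matches the quantities defined above. The function name `select_action`, the dictionary-based bookkeeping, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import math

def select_action(q, prior, visits, state_visits, c=1.0):
    """Return the action with the highest upper confidence bound U(s, a).

    q            -- dict: action -> expected reward Q(s, a)
    prior        -- dict: action -> network probability P(s, a)
    visits       -- dict: action -> visit count N(s, a)
    state_visits -- visit count N(s) of the current state
    c            -- exploration constant
    """
    def ucb(action):
        exploration = (c * prior[action] * math.sqrt(state_visits)
                       / (1 + visits.get(action, 0)))
        return q.get(action, 0.0) + exploration
    return max(prior, key=ucb)

# Toy statistics for one state with three edit actions (made-up numbers).
q = {"insert": 0.6, "delete": 0.2, "replace": 0.5}
prior = {"insert": 0.3, "delete": 0.1, "replace": 0.6}
visits = {"insert": 10, "delete": 2, "replace": 1}

print(select_action(q, prior, visits, state_visits=13, c=1.0))
print(select_action(q, prior, visits, state_visits=13, c=0.0))
```

With `c = 0` the rule reduces to picking the highest expected reward; larger values of `c` shift the choice toward actions that the network rates highly but that have been visited rarely.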

The neural network predictions, the MCTS model, and the real world evaluation, together, define a loss function shown in Equation 1, which is minimized to improve the neural network parameters. This process continues iteratively until the best model, which automatically solves the task, is found.

Inspired by the neural editor (guu2017generating), we use edit operations that make the pipeline generation explainable by design. In each iteration of self play, the MCTS searches over the possible valid pipelines. For each state (pipeline), the possible next states are limited to those derived by applying an edit operation to the current state.
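To make the restriction to edit-derived successors concrete, here is a minimal sketch that enumerates every pipeline reachable from a given one by a single insertion, deletion, or replacement. Representing a pipeline as a tuple of primitive names, the function name `edit_neighbors`, and the example primitive names are assumptions for illustration; the sketch also omits the validity check that a real system would apply to each candidate.

```python
from typing import List, Tuple

Pipeline = Tuple[str, ...]

def edit_neighbors(pipeline: Pipeline, primitives: List[str]) -> List[Pipeline]:
    """Return all pipelines one edit operation away from `pipeline`."""
    neighbors = set()
    # Insertion: place any primitive at any position.
    for i in range(len(pipeline) + 1):
        for prim in primitives:
            neighbors.add(pipeline[:i] + (prim,) + pipeline[i:])
    for i in range(len(pipeline)):
        # Deletion: drop one primitive, but never produce an empty pipeline.
        if len(pipeline) > 1:
            neighbors.add(pipeline[:i] + pipeline[i + 1:])
        # Replacement: swap one primitive for a different one.
        for prim in primitives:
            if prim != pipeline[i]:
                neighbors.add(pipeline[:i] + (prim,) + pipeline[i + 1:])
    neighbors.discard(pipeline)  # the current state is not its own successor
    return sorted(neighbors)

# Hypothetical primitive names, used only for this example.
current = ("imputer", "sgd")
for candidate in edit_neighbors(current, ["imputer", "scaler", "sgd"]):
    print(candidate)
```

Because every successor differs from its parent by exactly one named edit, the sequence of edits that produced a final pipeline can be read back as an explanation of how it was built.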

## 3 Results

The data consists of 313 different tabular datasets, of which 296 are from OpenML (openml2014). We considered classification, both binary (121 datasets) and multi-class (108 datasets), and univariate regression tasks (84 datasets). Baseline pipelines were constructed using sklearn SGD estimators for classification and regression, and an annotated tabular feature extractor which uses linear SVC, Lasso, percentile classification or regression estimators from sklearn.

Figure 2 compares the performance of AlphaD3M with that of SGD, the baseline pipeline. Each of the 180 points represents a classification task on a different OpenML dataset. The datasets for which AlphaD3M performs better than SGD are shown by green circles and those for which SGD performs better are shown by red crosses. Figure 2 shows that AlphaD3M performs better than the baseline for 75% of the datasets, is comparable to it for 18% of the datasets, and performs worse for only 7% of the datasets. Figure 3 shows the normalized difference in cross validation performance between AlphaD3M and the SGD baseline on classification tasks for the 180 datasets, split according to the estimators used by AlphaD3M, demonstrating better performance across diverse estimators.

Figure 4 compares performance between different AutoML methods: Autosklearn, TPOT, and Autostacker, and our method AlphaD3M, for a number of common OpenML datasets, which serve as representative benchmark datasets for AutoML systems (olson2017pmlb; olson2016tpot; chen2018autostacker). For each method and dataset, we compute the performance mean and standard deviation by repeated evaluation. As shown in Figure 4, our method, AlphaD3M, is competitive with other approaches. All four methods are competitive and on par, as their performance ranges, including confidence intervals, intersect; whereas SGD and Random Forest are not competitive with the leading AutoML methods.
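One simple way to check whether two methods are on par in this sense is to compare intervals of mean plus or minus one standard deviation over repeated runs. The sketch below assumes that interval definition, which the text does not specify, and uses made-up scores that are not results from the paper; the function names are likewise hypothetical.

```python
from statistics import mean, stdev

def score_interval(scores):
    """Return (mean - std, mean + std) over repeated evaluation scores."""
    m, s = mean(scores), stdev(scores)
    return (m - s, m + s)

def intervals_overlap(first, second):
    """True when two closed intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]

# Made-up accuracy scores from repeated runs of three hypothetical methods.
method_a = score_interval([0.90, 0.92, 0.94])
method_b = score_interval([0.93, 0.95, 0.97])
weak_baseline = score_interval([0.50, 0.52, 0.54])

print(intervals_overlap(method_a, method_b))
print(intervals_overlap(method_a, weak_baseline))
```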

AlphaD3M is implemented using PyTorch. Our implementation takes advantage of GPUs while training the neural network and uses CPUs for the MCTS. Table 1 compares the running time of TPOT, Autostacker, and AlphaD3M on the same datasets, along with the corresponding speedup factors. Table 1 shows that AlphaD3M performs on average an order of magnitude faster, reducing computation time from hours to minutes.

Dataset/Method | TPOT | Autostacker | AlphaD3M | Speedup vs TPOT | Speedup vs AS |
---|---|---|---|---|---|
breast cancer | 3366 | 1883 | 460 | 7.3 | 4 |
hill valley | 17951 | 8411 | 556 | 32.2 | 15.1 |
monks | 1517 | 1532 | 348 | 4.3 | 4.3 |
pima | 5305 | 1940 | 619 | 8.5 | 3.1 |
spectf | 4191 | 1673 | 522 | 8 | 3.2 |
vehicle | 16795 | 4010 | 531 | 31.6 | 7.5 |
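The speedup columns are ratios of running times, which can be recomputed from the first three columns. The short sketch below does so with the values copied from Table 1; the variable and function names are assumptions for illustration, and the recomputed ratios may differ from the tabulated ones in the last digit depending on how the originals were rounded.

```python
# Running times copied from Table 1: (TPOT, Autostacker, AlphaD3M).
running_times = {
    "breast cancer": (3366, 1883, 460),
    "hill valley": (17951, 8411, 556),
    "monks": (1517, 1532, 348),
    "pima": (5305, 1940, 619),
    "spectf": (4191, 1673, 522),
    "vehicle": (16795, 4010, 531),
}

def speedups(times):
    """Map each dataset to (speedup vs TPOT, speedup vs Autostacker)."""
    return {
        name: (tpot / alphad3m, autostacker / alphad3m)
        for name, (tpot, autostacker, alphad3m) in times.items()
    }

factors = speedups(running_times)
for name, (vs_tpot, vs_autostacker) in factors.items():
    print(f"{name:14s} {vs_tpot:5.1f}x vs TPOT  {vs_autostacker:5.1f}x vs Autostacker")

# Average speedup over the six datasets.
mean_vs_tpot = sum(v[0] for v in factors.values()) / len(factors)
mean_vs_autostacker = sum(v[1] for v in factors.values()) / len(factors)
print(f"mean: {mean_vs_tpot:.1f}x vs TPOT, {mean_vs_autostacker:.1f}x vs Autostacker")
```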

## 4 Conclusions

We introduced AlphaD3M, an automatic machine learning system with competitive performance, which is an order of magnitude faster than existing state-of-the-art AutoML methods, reducing computation time from hours to minutes. We presented the first single player AlphaZero game representation applied to meta learning by modeling meta-data, task, and entire pipelines as state.
