Limbo: A Fast and Flexible Library for Bayesian Optimization

November 22, 2016 · Antoine Cully et al.

Limbo is an open-source C++11 library for Bayesian optimization which is designed to be both highly flexible and very fast. It can be used to optimize functions for which the gradient is unknown, evaluations are expensive, and runtime cost matters (e.g., on embedded systems or robots). Benchmarks on standard functions show that Limbo is about 2 times faster than BayesOpt (another C++ library) for a similar accuracy.


Introduction

Non-linear optimization problems are pervasive in machine learning. Bayesian Optimization (BO) is designed for the most challenging ones: when the gradient is unknown, evaluating a solution is costly, and evaluations are noisy. This is, for instance, the case when we want to find optimal parameters for a machine learning algorithm (snoek2012practical), because testing a set of parameters can take hours, and because of the stochastic nature of many machine learning algorithms. Besides parameter tuning, Bayesian optimization has recently attracted a lot of interest for direct policy search in robot learning (lizotte2007automatic; wilson2014using; calandra2016bayesian) and online adaptation; for example, it was recently used to allow a legged robot to learn a new gait after mechanical damage in about 10-15 trials (2 minutes) (cully2015robots).

At its core, Bayesian optimization builds a probabilistic model of the function to be optimized (the reward/performance/cost function) using the samples that have already been evaluated (shahriari2016taking); usually, this model is a Gaussian process (williams2006gaussian). To select the next sample to be evaluated, Bayesian optimization optimizes an acquisition function which leverages the model to predict the most promising areas of the search space. Typically, this acquisition function is high in areas not yet explored by the algorithm (i.e., with a high uncertainty) and in those where the model predicts high-performing solutions. As a result, Bayesian optimization handles the exploration / exploitation trade-off by selecting samples that combine a good predicted value and a high uncertainty.

In spite of its strong mathematical foundations (mockus2012bayesian), Bayesian optimization is more a template than a fully-specified algorithm. For any Bayesian optimization algorithm, the following components need to be chosen: (1) an initialization function (e.g., random sampling), (2) a model (e.g., a Gaussian process, which itself needs a kernel function and a mean function), (3) an acquisition function (e.g., Upper Confidence Bound or Expected Improvement, see shahriari2016taking), (4) a global, non-linear optimizer for the acquisition function (e.g., CMA-ES, hansen2001completely, or DIRECT, jones1993lipschitzian), and (5) a non-linear optimizer to learn the hyper-parameters of the model (if the user chooses to learn them). Specific applications often require a specific choice of components, and most research articles focus on the introduction of a single component (e.g., a novel acquisition function or a novel kernel for Gaussian processes).
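To make this template concrete, the sketch below walks through the loop with all five components in place. It is not Limbo's API: the nearest-neighbor surrogate and the random-search inner optimizer are deliberately naive stand-ins for the Gaussian process and for CMA-ES/DIRECT, and hyper-parameter learning is reduced to a comment.

// A minimal, self-contained sketch of the generic BO loop.
#include <cmath>
#include <iostream>
#include <limits>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    // the expensive black-box function (gradient assumed unknown)
    auto f = [](double x) { return -x * x * std::sin(2 * x); };

    std::vector<double> xs, ys;

    // (1) initialization: random sampling
    for (int i = 0; i < 5; ++i) {
        double x = unif(rng);
        xs.push_back(x);
        ys.push_back(f(x));
    }

    for (int iter = 0; iter < 20; ++iter) {
        // (2) toy surrogate model: predict with the nearest sample and let
        // the uncertainty grow with the distance to it (a GP in Limbo)
        auto predict = [&](double x, double& mu, double& sigma) {
            double best_d = std::numeric_limits<double>::max();
            mu = 0.0;
            for (std::size_t i = 0; i < xs.size(); ++i) {
                double d = std::abs(x - xs[i]);
                if (d < best_d) { best_d = d; mu = ys[i]; }
            }
            sigma = best_d;
        };

        // (3) acquisition function: Upper Confidence Bound, mu + kappa * sigma
        auto ucb = [&](double x) {
            double mu, sigma;
            predict(x, mu, sigma);
            return mu + 2.0 * sigma;
        };

        // (4) inner optimizer for the acquisition: plain random search
        double best_x = 0.0, best_a = -std::numeric_limits<double>::max();
        for (int j = 0; j < 1000; ++j) {
            double x = unif(rng);
            double a = ucb(x);
            if (a > best_a) { best_a = a; best_x = x; }
        }

        // evaluate the expensive function at the selected sample
        xs.push_back(best_x);
        ys.push_back(f(best_x));
        // (5) hyper-parameter learning is a no-op for this toy surrogate
    }

    std::size_t best = 0;
    for (std::size_t i = 1; i < ys.size(); ++i)
        if (ys[i] > ys[best]) best = i;
    std::cout << "best: f(" << xs[best] << ") = " << ys[best] << "\n";
}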

This almost infinite number of variants of Bayesian optimization calls for flexible libraries in which components can easily be substituted with alternative ones (user-defined or predefined). In many applications, the run-time cost is negligible compared to the evaluation of a potential solution, but this is not the case in online adaptation for robots (e.g., cully2015robots), in which the algorithm has to run on small embedded platforms (e.g., a cell phone), or in interactive applications (brochu2010tutorial), in which the algorithm needs to quickly react to the inputs. To our knowledge, no open-source library combines a high-performance implementation of Bayesian optimization with the high flexibility that is needed for developing and deploying novel variants.

Figure 1: Comparison of the accuracy (difference with the optimal value) and the wall-clock time for Limbo and BayesOpt (martinezcantin14a), a state-of-the-art C++ library for Bayesian optimization, on common test functions (see http://www.sfu.ca/~ssurjano/optimization.html). Two configurations are tested: with and without optimization of the hyper-parameters of the Gaussian process. Each experiment has been replicated 250 times. The median of the data is pictured with a thick dot, while the box represents the first and third quartiles. The most extreme data points are delimited by the whiskers and the outliers are individually depicted as smaller circles. Limbo is configured to reproduce the default parameters of BayesOpt.

The Limbo library

Limbo (LIbrary for Model-based Bayesian Optimization) is an open-source (GPL-compatible CeCILL license) C++11 library which provides a modern implementation of Bayesian optimization that is both flexible and high-performing. It does not depend on any proprietary software (the main dependencies are Boost and Eigen3). The code is standard-compliant but it is currently mostly developed for GNU/Linux and Mac OS X with both the GCC and Clang compilers. The library is distributed via a GitHub repository (http://github.com/resibots/limbo), in which bugs and further developments are handled by the community of developers and users. An extensive documentation (http://www.resibots.eu/limbo) is available and contains guides, examples, and tutorials. New contributors can rely on a full API reference, while their developments are checked via a continuous integration platform (automatic unit-testing routines). Limbo was instrumental in several of our robotics projects (e.g., cully2015robots) and has also been used successfully in other fields.

The implementation of Limbo follows a template-based, policy-based design (alexandrescu2001modern), which allows it to be highly flexible without paying the cost induced by classic object-oriented designs (driesen1996direct) (the cost of virtual function calls). In practice, changing one of the components of the algorithms in Limbo (e.g., changing the acquisition function) usually requires changing only a template definition in the source code. This design makes it possible for users to rapidly experiment and test new ideas while remaining as fast as specialized code.
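The pattern can be illustrated in a few lines. The sketch below is not Limbo's actual class hierarchy; it only shows how selecting a component through a template parameter replaces virtual dispatch with compile-time resolution:

#include <iostream>

// Upper Confidence Bound: favor high predictions and high uncertainty
struct UCB {
    double operator()(double mu, double sigma) const { return mu + 2.0 * sigma; }
};

// toy alternative policy that only exploits the predicted mean
struct GreedyMean {
    double operator()(double mu, double /*sigma*/) const { return mu; }
};

// the acquisition function is a template parameter: the call acqui(mu, sigma)
// is resolved at compile time, so there is no virtual-function overhead
template <typename AcquisitionPolicy>
struct Optimizer {
    AcquisitionPolicy acqui;
    double score(double mu, double sigma) const { return acqui(mu, sigma); }
};

int main() {
    Optimizer<UCB> explorer;         // swapping a component only requires
    Optimizer<GreedyMean> exploiter; // changing one template argument
    std::cout << explorer.score(1.0, 0.5) << " "
              << exploiter.score(1.0, 0.5) << "\n";
}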

According to the benchmarks we performed (Figure 1), Limbo finds solutions with the same level of quality as BayesOpt (martinezcantin14a) in a significantly lower amount of time: for a similar accuracy between the optimized solutions found by Limbo and BayesOpt, Limbo is about 2 times faster (median values), both when the hyper-parameters of the Gaussian process are optimized and when they are not.

Using Limbo

The policy-based design of Limbo allows users to define or adapt variants of Bayesian optimization with very little change in the code. The function to optimize is defined by creating a functor (an arbitrary object with an operator() function) that takes a vector as input and outputs the resulting vector (Limbo supports multi-objective optimization); this object also defines the input and output dimensions of the problem (dim_in, dim_out). For example, to maximize the function f(x) = -∑_i x_i^2 sin(2 x_i):

struct my_fun {
    // dimensions of the search space and of the result
    static constexpr size_t dim_in = 2;
    static constexpr size_t dim_out = 1;
    // f(x) = -sum_i x_i^2 * sin(2 x_i)
    Eigen::VectorXd operator()(const Eigen::VectorXd& x) const {
        double res = -(x.array().square() * (x.array() * 2).sin()).sum();
        return limbo::tools::make_vector(res);
    }
};

Optimizing my_fun with the default parameters only requires instantiating a BOptimizer object and calling the optimize() method:

limbo::bayes_opt::BOptimizer<Params> opt;
opt.optimize(my_fun());

where Params is a structure that statically defines all the parameters, for instance:

struct Params {
  // default parameters for the acquisition function 'gpucb'
  struct acqui_gpucb : public limbo::defaults::acqui_gpucb { };
  // custom parameters for the optimizer
  struct bayes_opt_boptimizer : public limbo::defaults::bayes_opt_boptimizer {
    BO_PARAM(double, noise, 0.001);
  };
  // ...
};
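Once optimize() has returned, the best point found so far can be queried; a short usage sketch based on the accessors described in the Limbo documentation (best_observation() and best_sample()):

// best value observed so far and the input that produced it
// (accessors as documented in the Limbo tutorials)
std::cout << "best observation: " << opt.best_observation() << std::endl;
std::cout << "best sample: " << opt.best_sample().transpose() << std::endl;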

While default functors are provided, most of the components of BOptimizer can be replaced to allow researchers to investigate new variants. For example, changing the kernel function from the Squared Exponential kernel (the default) to another type of kernel (here the Matern-5/2) and using the UCB acquisition function is achieved as follows:

// define the templates
using Kernel_t = limbo::kernel::MaternFiveHalves<Params>;
using Mean_t = limbo::mean::Data<Params>;
using GP_t = limbo::model::GP<Params, Kernel_t, Mean_t>;
using Acqui_t = limbo::acqui::UCB<Params, GP_t>;
// instantiate a custom optimizer
limbo::bayes_opt::BOptimizer<Params, limbo::modelfun<GP_t>, limbo::acquifun<Acqui_t>> opt;
// run it
opt.optimize(my_fun());

In addition to the many kernel, mean, and acquisition functions that are implemented, Limbo provides several tools for the internal optimization of the acquisition function and the hyper-parameters. For example, a wrapper around the NLOpt library (which provides many local, global, gradient-based, and gradient-free algorithms) allows Limbo to be used with a large variety of internal optimization algorithms. Moreover, several “restarts” of these internal optimization processes can be performed in parallel to avoid local optima with a minimal computational cost, and several internal optimizations can be chained in order to take advantage of the global aspects of some algorithms and the local properties of others.
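For instance, the optimizer used for the acquisition function can be swapped with the same template mechanism as the kernel and acquisition function above. The sketch below assumes the NLOptNoGrad wrapper and the acquiopt tag described in the Limbo documentation; Params would also need to inherit the corresponding defaults (e.g., limbo::defaults::opt_nloptnograd):

// optimize the acquisition function with NLOpt's DIRECT variant
// (gradient-free, global); names as described in the Limbo documentation
using AcquiOpt_t = limbo::opt::NLOptNoGrad<Params, nlopt::GN_DIRECT_L_RAND>;
limbo::bayes_opt::BOptimizer<Params,
                             limbo::modelfun<GP_t>,
                             limbo::acquifun<Acqui_t>,
                             limbo::acquiopt<AcquiOpt_t>> opt;
opt.optimize(my_fun());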

Acknowledgements

This project is funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Project: ResiBots, grant agreement No 637972).
