Non-linear optimization problems are pervasive in machine learning. Bayesian Optimization (BO) is designed for the most challenging ones: when the gradient is unknown, evaluating a solution is costly, and evaluations are noisy. This is, for instance, the case when we want to find optimal parameters for a machine learning algorithm (snoek2012practical), because testing a set of parameters can take hours and because many machine learning algorithms are stochastic. Besides parameter tuning, Bayesian optimization has recently attracted a lot of interest for direct policy search in robot learning (lizotte2007automatic; wilson2014using; calandra2016bayesian) and online adaptation; for example, it was recently used to allow a legged robot to learn a new gait after mechanical damage in about 10-15 trials (2 minutes) (cully2015robots).
At its core, Bayesian optimization builds a probabilistic model of the function to be optimized (the reward/performance/cost function) using the samples that have already been evaluated (shahriari2016taking); usually, this model is a Gaussian process (williams2006gaussian). To select the next sample to be evaluated, Bayesian optimization optimizes an acquisition function which leverages the model to predict the most promising areas of the search space. Typically, this acquisition function is high in areas not yet explored by the algorithm (i.e., with a high uncertainty) and in those where the model predicts high-performing solutions. As a result, Bayesian optimization handles the exploration / exploitation trade-off by selecting samples that combine a good predicted value and a high uncertainty.
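To make this trade-off concrete, the sketch below scores candidate points with an Upper Confidence Bound, that is, the predicted value plus a weighted predicted uncertainty. It is only an illustration: the Prediction struct, the function names, and the weight kappa are hypothetical, and the predictions are supplied by hand rather than computed by a Gaussian process.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for the model's prediction at one candidate point.
struct Prediction {
    double mean;  // predicted value
    double sigma; // predicted standard deviation (uncertainty)
};

// Upper Confidence Bound: large for good predicted values and for high uncertainty.
inline double ucb(const Prediction& p, double kappa)
{
    return p.mean + kappa * p.sigma;
}

// Return the index of the candidate with the highest acquisition value.
inline std::size_t select_next(const std::vector<Prediction>& candidates, double kappa)
{
    assert(!candidates.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i)
        if (ucb(candidates[i], kappa) > ucb(candidates[best], kappa))
            best = i;
    return best;
}
```

With kappa set to 0 the selection is purely exploitative (best predicted mean); larger values of kappa favor uncertain candidates.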
In spite of its strong mathematical foundations (mockus2012bayesian), Bayesian optimization is more a template than a fully specified algorithm. For any Bayesian optimization algorithm, the following components need to be chosen: (1) an initialization function (e.g., random sampling), (2) a model (e.g., a Gaussian process, which itself needs a kernel function and a mean function), (3) an acquisition function (e.g., Upper Confidence Bound, Expected Improvement, see shahriari2016taking), (4) a global, non-linear optimizer for the acquisition function (e.g., CMA-ES, hansen2001completely, or DIRECT, jones1993lipschitzian), and (5) a non-linear optimizer to learn the hyper-parameters of the models (if the user chooses to learn them). Specific applications often require a specific choice of components, and most research articles focus on the introduction of a single component (e.g., a novel acquisition function or a novel kernel for Gaussian processes).
This almost infinite number of variants of Bayesian optimization calls for flexible libraries in which components can easily be substituted with alternative ones (user-defined or predefined). In many applications, the run-time cost is negligible compared to the evaluation of a potential solution, but this is not the case in online adaptation for robots (e.g., cully2015robots), in which the algorithm has to run on small embedded platforms (e.g., a cell phone), or in interactive applications (brochu2010tutorial), in which the algorithm needs to quickly react to the inputs. To our knowledge, no open-source library combines a high-performance implementation of Bayesian optimization with the high flexibility that is needed for developing and deploying novel variants.
The Limbo library
Limbo (LIbrary for Model-based Bayesian Optimization) is an open-source (GPL-compatible CeCILL license) C++11 library which provides a modern implementation of Bayesian optimization that is both flexible and high-performing. It does not depend on any proprietary software (the main dependencies are Boost and Eigen3). The code is standard-compliant, but it is currently mostly developed for GNU/Linux and Mac OS X with both the GCC and Clang compilers. The library is distributed via a GitHub repository (http://github.com/resibots/limbo), in which bugs and further developments are handled by the community of developers and users. Extensive documentation (http://www.resibots.eu/limbo) is available and contains guides, examples, and tutorials. New contributors can rely on a full API reference, while their developments are checked via a continuous integration platform (automatic unit-testing routines). Limbo was instrumental in several of our robotics projects (e.g., cully2015robots), and it has also been used successfully internally in other fields.
The implementation of Limbo follows a template-based, policy-based design (alexandrescu2001modern), which allows it to be highly flexible without paying the cost induced by classic object-oriented designs, namely the cost of virtual functions (driesen1996direct). In practice, changing one of the components of the algorithms in Limbo (e.g., changing the acquisition function) usually requires changing only a template definition in the source code. This design makes it possible for users to rapidly experiment and test new ideas while being as fast as specialized code.
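The general idea of policy-based design can be sketched in a few lines. The classes below are hypothetical and much simpler than anything in Limbo; they only show how a component passed as a template parameter is resolved at compile time, without a common base class or virtual calls.

```cpp
#include <cassert>

// Hypothetical acquisition policies: plain structs sharing a call signature,
// with no common base class and no virtual functions.
struct UcbPolicy {
    double operator()(double mean, double sigma) const { return mean + 0.5 * sigma; }
};

struct MeanOnlyPolicy {
    double operator()(double mean, double /*sigma*/) const { return mean; }
};

// Hypothetical optimizer skeleton: the acquisition function is a template
// parameter, so the call in score() is bound at compile time and can be inlined.
template <typename Acquisition = UcbPolicy>
class ToyOptimizer {
public:
    double score(double mean, double sigma) const
    {
        assert(sigma >= 0.0); // a standard deviation cannot be negative
        return _acqui(mean, sigma);
    }

private:
    Acquisition _acqui;
};
```

Swapping the component is then a one-line change in user code: `ToyOptimizer<MeanOnlyPolicy>` instead of `ToyOptimizer<>`.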
According to the benchmarks we performed (Figure 1), Limbo finds solutions with the same level of quality as BayesOpt (martinezcantin14a), within a significantly lower amount of time: for the same accuracy (less than between the optimized solutions found by Limbo and BayesOpt), Limbo is between and times faster (median values) than BayesOpt when the hyper-parameters are not optimized, and between and times faster when they are.
The policy-based design of Limbo allows users to define or adapt variants of Bayesian optimization with very little change in the code. The function to be optimized is defined by creating a functor (an arbitrary object with an operator() function) that takes a vector as input and outputs the resulting vector (Limbo can support multi-objective optimization); this object also defines the input and output dimensions of the problem (dim_in, dim_out). For example, to maximize the function my_fun:
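A functor of this shape can be sketched without any dependency as follows. This is an illustration under stated assumptions: std::vector<double> stands in for the library's vector type, and the objective (a quadratic with its maximum at 0.5 in each dimension) is an arbitrary choice made for the example; only the names dim_in, dim_out, and my_fun come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Objective written as a functor. The vector type and the formula are
// assumptions made for this sketch, not the library's own definitions.
struct my_fun {
    static constexpr std::size_t dim_in = 2;  // number of input dimensions
    static constexpr std::size_t dim_out = 1; // number of objectives

    std::vector<double> operator()(const std::vector<double>& x) const
    {
        assert(x.size() == dim_in);
        double y = 0.0;
        for (double xi : x)
            y -= (xi - 0.5) * (xi - 0.5); // maximal (zero) when every xi is 0.5
        return std::vector<double>(dim_out, y);
    }
};
```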
Optimizing my_fun with the default parameters only requires instantiating a BOptimizer object and calling the “optimize” method:
where Params is a structure that defines all the parameters in a static way, for instance:
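The pattern of a statically defined parameter structure can be sketched as below. All group names, field names, and values are hypothetical and chosen for illustration only; the point is that each parameter is a compile-time constant that a templated component reads directly from the structure.

```cpp
#include <cstddef>

// Hypothetical parameter structure: every value is a compile-time constant,
// grouped by the component that reads it.
struct Params {
    struct init {
        static constexpr std::size_t samples = 10; // initial random samples
    };
    struct stop {
        static constexpr std::size_t iterations = 40; // optimization steps
    };
    struct acqui_ucb {
        static constexpr double alpha = 0.5; // weight of the uncertainty term
    };
};

// A user can override one group and inherit the others.
struct MyParams : Params {
    struct stop {
        static constexpr std::size_t iterations = 100;
    };
};

// A component templated on the parameter structure reads the groups it needs.
template <typename P>
struct Budget {
    static constexpr std::size_t total_evaluations()
    {
        return P::init::samples + P::stop::iterations;
    }
};

static_assert(Budget<Params>::total_evaluations() == 50, "resolved at compile time");
```

Because the values are constants known to the compiler, no parameter lookup happens at run time.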
While default functors are provided, most of the components of BOptimizer can be replaced to allow researchers to investigate new variants. For example, changing the kernel function from the Squared Exponential kernel (the default) to another type of kernel (here the Matern-5/2) and using the UCB acquisition function is achieved as follows:
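The kernel part of such a substitution can be illustrated with a dependency-free sketch. The two kernel formulas below are the standard Squared Exponential and Matern-5/2 definitions; the class names, the ToyModel stub, and the fixed hyper-parameter values are assumptions of this example and do not reproduce the library's interface.

```cpp
#include <cassert>
#include <cmath>

// Squared Exponential kernel as a function of the distance r between two points.
// sigma_sq is the signal variance and l the length scale.
struct SquaredExp {
    double operator()(double r, double sigma_sq, double l) const
    {
        assert(l > 0.0);
        return sigma_sq * std::exp(-0.5 * (r * r) / (l * l));
    }
};

// Matern kernel with smoothness parameter 5/2.
struct MaternFiveHalves {
    double operator()(double r, double sigma_sq, double l) const
    {
        assert(l > 0.0);
        const double s = std::sqrt(5.0) * r / l;
        return sigma_sq * (1.0 + s + s * s / 3.0) * std::exp(-s);
    }
};

// Hypothetical model stub: the kernel is a template argument with a default,
// here evaluated with unit variance and unit length scale.
template <typename Kernel = SquaredExp>
struct ToyModel {
    double covariance(double x1, double x2) const
    {
        return Kernel()(std::abs(x1 - x2), 1.0, 1.0);
    }
};
```

In this sketch, `ToyModel<>` uses the default kernel and `ToyModel<MaternFiveHalves>` selects the alternative one.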
In addition to the many kernel, mean, and acquisition functions that are implemented, Limbo provides several tools for the internal optimization of the acquisition function and the hyper-parameters. For example, a wrapper around the NLopt library (which provides many local, global, gradient-based, and gradient-free algorithms) allows Limbo to be used with a large variety of internal optimization algorithms. Moreover, several “restarts” of these internal optimization processes can be performed in parallel to avoid local optima with a minimal computational cost, and several internal optimizations can be chained in order to take advantage of the global aspects of some algorithms and the local properties of others.
This project is funded by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Project: ResiBots, grant agreement No 637972).