
Deep Gaussian Processes for Multifidelity Modeling
Multifidelity methods are prominently used when cheaply obtained, but possibly biased and noisy, observations must be effectively combined with limited or expensive true data in order to construct reliable models. This arises in both fundamental machine learning procedures such as Bayesian optimization, as well as more practical science and engineering applications. In this paper we develop a novel multifidelity model which treats layers of a deep Gaussian process as fidelity levels, and uses a variational inference scheme to propagate uncertainty across them. This allows for capturing nonlinear correlations between fidelities with lower risk of overfitting than existing methods exploiting compositional structure, which are conversely burdened by structural assumptions and constraints. We show that the proposed approach makes substantial improvements in quantifying and propagating uncertainty in multifidelity setups, which in turn improves their effectiveness in decision-making pipelines.
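The layered composition of fidelities can be illustrated with a minimal two-level sketch, using exact scikit-learn GPs in place of the paper's deep GP and variational inference scheme; the toy functions `f_low`/`f_high`, the kernels, and all sizes below are illustrative assumptions, not from the paper:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy fidelities (illustrative): the expensive response is a nonlinear
# transformation of the cheap one.
def f_low(x):
    return np.sin(8 * np.pi * x)

def f_high(x):
    return (x - np.sqrt(2)) * f_low(x) ** 2

x_lo = np.linspace(0, 1, 40)[:, None]   # plentiful cheap observations
x_hi = np.linspace(0, 1, 8)[:, None]    # scarce expensive observations

gp_lo = GaussianProcessRegressor(RBF(0.1), alpha=1e-6).fit(x_lo, f_low(x_lo).ravel())

# The high-fidelity GP conditions on the low-fidelity prediction as an extra
# input, so nonlinear cross-fidelity correlations can be captured -- loosely
# analogous to treating layers as fidelity levels.
z_hi = gp_lo.predict(x_hi).reshape(-1, 1)
gp_hi = GaussianProcessRegressor(RBF(0.1), alpha=1e-6).fit(
    np.hstack([x_hi, z_hi]), f_high(x_hi).ravel())

x_test = np.linspace(0, 1, 100)[:, None]
z_test = gp_lo.predict(x_test).reshape(-1, 1)
pred = gp_hi.predict(np.hstack([x_test, z_test]))
rmse = np.sqrt(np.mean((pred - f_high(x_test).ravel()) ** 2))
```

Unlike this point-estimate sketch, the paper propagates the low-fidelity *distribution* through the upper layer, which is what gives calibrated uncertainty.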
03/18/2019 ∙ by Kurt Cutajar, et al.

Preferential Bayesian Optimization
Bayesian optimization (BO) has emerged during the last few years as an effective approach to optimizing black-box functions where direct queries of the objective are expensive. In this paper we consider the case where direct access to the function is not possible, but information about user preferences is. Such scenarios arise in problems where human preferences are modeled, such as A/B tests or recommender systems. We present a new framework for this scenario that we call Preferential Bayesian Optimization (PBO), which allows us to find the optimum of a latent function that can only be queried through pairwise comparisons, the so-called duels. PBO extends the applicability of standard BO ideas and generalizes previous discrete dueling approaches by modeling the probability of the winner of each duel by means of a Gaussian process model with a Bernoulli likelihood. The latent preference function is used to define a family of acquisition functions that extend usual policies used in BO. We illustrate the benefits of PBO in a variety of experiments, showing that PBO needs drastically fewer comparisons for finding the optimum. According to our experiments, the way of modeling correlations in PBO is key in obtaining this advantage.
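The core modeling idea, a Bernoulli likelihood on the winner of each duel driven by a latent preference function, can be sketched in a few lines; the logistic link and toy function below are illustrative stand-ins for the paper's GP model:

```python
import numpy as np

def duel_win_prob(f, x, x_prime):
    """Bernoulli probability that x wins a duel against x_prime, given a
    latent preference function f (logistic link assumed here)."""
    return 1.0 / (1.0 + np.exp(-(f(x) - f(x_prime))))

# Toy latent preference: users prefer points near 0.3 (illustrative choice).
def f(x):
    return -(x - 0.3) ** 2

p = duel_win_prob(f, 0.3, 0.9)  # 0.3 is closer to the preferred point, so p > 0.5
```

In the actual framework `f` is a GP posterior inferred from observed duel outcomes, and acquisition functions are defined over this latent preference surface.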
04/12/2017 ∙ by Javier Gonzalez, et al.

Correcting boundary over-exploration deficiencies in Bayesian optimization with virtual derivative sign observations
Bayesian optimization (BO) is a global optimization strategy designed to find the minimum of an expensive black-box function, typically defined on a continuous subset of R^d, by using a Gaussian process (GP) as a surrogate model for the objective. Although currently available acquisition functions address this goal with different degrees of success, an over-exploration effect at the boundaries of the search space is typically observed. However, in problems like the configuration of machine learning algorithms, the function domain is conservatively large, and with high probability the global minimum does not sit on the boundary. We propose a method to incorporate this knowledge into the search process by adding virtual derivative observations to the GP at the borders of the search space. We use the properties of GPs to impose conditions on the partial derivatives of the objective. The method is applicable with any acquisition function, is easy to use, and consistently reduces the number of evaluations required to optimize the objective irrespective of the acquisition used. We illustrate the benefits of our approach in an extensive experimental comparison.
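A minimal sketch of where such virtual observations would be placed, assuming a box-shaped domain; the helper name and layout are illustrative, and the actual GP conditioning on derivative signs is not shown:

```python
import numpy as np

def virtual_derivative_signs(bounds):
    """For each face of the search box, return a (location, dim, sign) triple:
    at a lower border the partial derivative along that dimension is taken to
    be negative (the objective still descends into the box), at an upper
    border positive. Helper name and placement are illustrative."""
    centre = np.mean(bounds, axis=1)
    obs = []
    for d, (lo, hi) in enumerate(bounds):
        for edge, sign in ((lo, -1), (hi, +1)):
            loc = centre.copy()
            loc[d] = edge
            obs.append((loc, d, sign))
    return obs

# A 2-d box yields 2 dimensions x 2 borders = 4 virtual observations.
obs = virtual_derivative_signs(np.array([[0.0, 1.0], [-2.0, 2.0]]))
```

The paper then links these sign constraints to the GP through a probit-style likelihood on the partial derivatives, discouraging acquisition mass at the borders.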
04/04/2017 ∙ by Eero Siivola, et al.

Variational Autoencoded Deep Gaussian Processes
We develop a scalable deep nonparametric generative model by augmenting deep Gaussian processes with a recognition model. Inference is performed in a novel scalable variational framework where the variational posterior distributions are reparametrized through a multi-layer perceptron. The key aspect of this reformulation is that it prevents the proliferation of variational parameters, which otherwise grow linearly in proportion to the sample size. We derive a new formulation of the variational lower bound that allows us to distribute most of the computation in a way that enables us to handle datasets of the size of mainstream deep learning tasks. We show the efficacy of the method on a variety of challenges including deep unsupervised learning and deep Bayesian optimization.
11/19/2015 ∙ by Zhenwen Dai, et al.

GLASSES: Relieving The Myopia Of Bayesian Optimisation
We present GLASSES: Global optimisation with Look-Ahead through Stochastic Simulation and Expected-loss Search. The majority of global optimisation approaches in use are myopic, in only considering the impact of the next function value; the non-myopic approaches that do exist are able to consider only a handful of future evaluations. Our novel algorithm, GLASSES, permits the consideration of dozens of evaluations into the future. This is done by approximating the ideal look-ahead loss function, which is expensive to evaluate, by a cheaper alternative in which the future steps of the algorithm are simulated beforehand. An Expectation Propagation algorithm is used to compute the expected value of the loss. We show that the far-horizon planning thus enabled leads to substantive performance gains in empirical tests.
10/21/2015 ∙ by Javier Gonzalez, et al.

Batch Bayesian Optimization via Local Penalization
The popularity of Bayesian optimization methods for efficient exploration of parameter spaces has led to a series of papers applying Gaussian processes as surrogates in the optimization of functions. However, most proposed approaches only allow the exploration of the parameter space to occur sequentially. Often, it is desirable to simultaneously propose batches of parameter values to explore. This is particularly the case when large parallel processing facilities are available. These facilities could be computational or physical facets of the process being optimized. For example, in biological experiments many experimental set-ups allow several samples to be processed simultaneously. Batch methods, however, require modeling of the interaction between the evaluations in the batch, which can be expensive in complex scenarios. We investigate a simple heuristic based on an estimate of the Lipschitz constant that captures the most important aspect of this interaction (i.e. local repulsion) at negligible computational overhead. The resulting algorithm compares well, in running time, with much more elaborate alternatives. The approach assumes that the function of interest, f, is a Lipschitz continuous function. A wrapper loop around the acquisition function is used to collect batches of points of a certain size, minimizing the non-parallelizable computational effort. The speedup of our method with respect to previous approaches is significant in a set of computationally expensive experiments.
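The local-repulsion idea can be sketched as follows: each pending evaluation x_j contributes a multiplicative factor that downweights the acquisition near x_j, using the Lipschitz estimate L, the current best value M, and the GP posterior mean and variance at x_j. This is a hedged sketch of such a penalizer, not the authors' exact implementation:

```python
import numpy as np
from scipy.special import erfc

def local_penalizer(x, xj, mu_j, sigma2_j, L, M):
    """Soft exclusion factor around a pending evaluation xj: roughly, the
    probability that x lies outside the ball around xj that could still
    contain the minimum, under Lipschitz constant L and incumbent value M."""
    z = (L * np.linalg.norm(x - xj) - M + mu_j) / np.sqrt(2.0 * sigma2_j)
    return 0.5 * erfc(-z)

def penalized_acquisition(acq, x, batch, mu, sigma2, L, M):
    """Acquisition value at x multiplied by one penalizer per pending point;
    batch points are then collected greedily by maximizing this product."""
    val = acq(x)
    for xj in batch:
        val *= local_penalizer(x, xj, mu(xj), sigma2(xj), L, M)
    return val
```

Because the penalizers only require GP posterior evaluations and a norm, the repulsion costs almost nothing compared with joint batch modeling.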
05/29/2015 ∙ by Javier Gonzalez, et al.

Bayesian Optimization for Synthetic Gene Design
We address the problem of synthetic gene design using Bayesian optimization. The main issue when designing a gene is that the design space is defined in terms of long strings of characters of different lengths, which renders the optimization intractable. We propose a three-step approach to deal with this issue. First, we use a Gaussian process model to emulate the behavior of the cell. As inputs of the model, we use a set of biologically meaningful gene features, which allows us to define optimal gene design rules. Based on the model outputs we define a multi-task acquisition function to optimize simultaneously several aspects of interest. Finally, we define an evaluation function, which allows us to rank sets of candidate gene sequences that are coherent with the optimal design strategy. We illustrate the performance of this approach in a real gene design experiment with mammalian cells.
05/07/2015 ∙ by Javier Gonzalez, et al.

Reproducing kernel Hilbert space based estimation of systems of ordinary differential equations
Nonlinear systems of differential equations have attracted interest in fields like systems biology, ecology, and biochemistry, due to their flexibility and their ability to describe dynamical systems. Despite the importance of such models in many branches of science, they have not been the focus of systematic statistical analysis until recently. In this work we propose a general approach to estimate the parameters of systems of differential equations measured with noise. Our methodology is based on the maximization of the penalized likelihood, where the system of differential equations is used as a penalty. To do so, we use a Reproducing Kernel Hilbert Space approach that allows us to formulate the estimation problem as an unconstrained numerical maximization problem that is easy to solve. The proposed method is tested with synthetically simulated data and is used to estimate the unobserved transcription factor CdaR in Streptomyces coelicolor using gene expression data of the genes it regulates.
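The penalized-likelihood objective can be sketched on a toy one-dimensional system x' = -theta*x, replacing the RKHS representation with a plain finite-difference discretization; all weights and sizes below are illustrative stand-ins for the paper's estimator:

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: recover theta in x' = -theta * x from noisy observations by
# minimizing  sum (y - x)^2  +  lam * sum (x' + theta * x)^2 ,
# i.e. data fit plus the ODE used as a penalty.
rng = np.random.default_rng(0)
t = np.linspace(0, 2, 41)
true_theta = 1.5
y = np.exp(-true_theta * t) + 0.01 * rng.standard_normal(t.size)

lam = 10.0  # penalty weight on the ODE residual (illustrative)

def objective(p):
    x, theta = p[:-1], p[-1]
    fit = np.sum((y - x) ** 2)
    xdot = np.gradient(x, t)          # finite differences in place of RKHS derivatives
    pen = np.sum((xdot + theta * x) ** 2)
    return fit + lam * pen

p0 = np.concatenate([y, [1.0]])       # initialize trajectory at the data
res = minimize(objective, p0, method="L-BFGS-B")
theta_hat = res.x[-1]
```

The RKHS formulation plays the role of the trajectory representation here, turning the same penalized objective into an unconstrained finite-dimensional problem with smooth derivatives.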
11/14/2013 ∙ by Javier Gonzalez, et al.

Active Multi-Information Source Bayesian Quadrature
Bayesian quadrature (BQ) is a sample-efficient probabilistic numerical method to solve integrals of expensive-to-evaluate black-box functions, yet so far, active BQ learning schemes focus merely on the integrand itself as an information source, and do not allow for information transfer from cheaper, related functions. Here, we set the scene for active learning in BQ when multiple related information sources of variable cost (in input and source) are accessible. This setting arises for example when evaluating the integrand requires a complex simulation to be run that can be approximated by simulating at lower levels of sophistication and at lesser expense. We construct meaningful cost-sensitive multi-source acquisition rates as an extension to common utility functions from vanilla BQ (VBQ), and discuss pitfalls that arise from blindly generalizing. Furthermore, we show that the VBQ acquisition policy is a corner case of all considered cost-sensitive acquisition schemes, which collapse onto one single degenerate policy in the case of one source and constant cost. In proof-of-concept experiments we scrutinize the behavior of our generalized acquisition functions. On an epidemiological model, we demonstrate that active multi-source BQ (AMSBQ) allocates budget more efficiently than VBQ for learning the integral to good accuracy.
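The corner-case observation can be illustrated with the simplest cost-sensitive rate, expected utility divided by evaluation cost; the source names and numbers below are purely illustrative:

```python
def cost_sensitive_rate(utility, cost):
    """Acquisition *rate*: expected utility per unit cost. With a single
    source and constant cost, dividing by cost does not change the ranking
    of candidates, so this reduces to the plain (vanilla BQ) utility."""
    return utility / cost

# Candidate (source, expected utility, cost) triples -- illustrative only:
candidates = [("expensive_sim", 0.9, 10.0), ("cheap_sim", 0.4, 1.0)]
best = max(candidates, key=lambda c: cost_sensitive_rate(c[1], c[2]))
```

Here the cheaper source wins despite lower raw utility, which is exactly the budget-allocation behavior the multi-source setting is designed to exploit.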
03/27/2019 ∙ by Alexandra Gessner, et al.

Automatic Discovery of Privacy-Utility Pareto Fronts
Differential privacy is a mathematical framework for privacy-preserving data analysis. Changing the hyperparameters of a differentially private algorithm allows one to trade off privacy and utility in a principled way. Quantifying this trade-off in advance is essential to decision-makers tasked with deciding how much privacy can be provided in a particular application while keeping acceptable utility. For more complex tasks, such as training neural networks under differential privacy, the utility achieved by a given algorithm can only be measured empirically. This paper presents a Bayesian optimization methodology for efficiently characterizing the privacy-utility trade-off of any differentially private algorithm using only empirical measurements of its utility. The versatility of our method is illustrated on a number of machine learning tasks involving multiple models, optimizers, and datasets.
05/26/2019 ∙ by Brendan Avent, et al.

Meta-Surrogate Benchmarking for Hyperparameter Optimization
Despite the recent progress in hyperparameter optimization (HPO), available benchmarks that resemble real-world scenarios consist of only a few, very large problem instances that are expensive to solve. This blocks researchers and practitioners not only from systematically running large-scale comparisons that are needed to draw statistically significant results, but also from reproducing experiments that were conducted before. This work proposes a method to alleviate these issues by means of a meta-surrogate model for HPO tasks trained on offline-generated data. The model combines a probabilistic encoder with a multi-task model such that it can generate inexpensive and realistic tasks of the class of problems of interest. We demonstrate that benchmarking HPO methods on samples of the generative model allows us to draw more coherent and statistically significant conclusions that can be reached orders of magnitude faster than using the original tasks. We provide evidence of our findings for various HPO methods on a wide class of problems.
05/30/2019 ∙ by Aaron Klein, et al.