Javier Gonzalez

  • Deep Gaussian Processes for Multi-fidelity Modeling

    Multi-fidelity methods are prominently used when cheaply-obtained, but possibly biased and noisy, observations must be effectively combined with limited or expensive true data in order to construct reliable models. This arises both in fundamental machine learning procedures, such as Bayesian optimization, and in more practical science and engineering applications. In this paper we develop a novel multi-fidelity model which treats layers of a deep Gaussian process as fidelity levels, and uses a variational inference scheme to propagate uncertainty across them. This allows for capturing nonlinear correlations between fidelities with lower risk of overfitting than existing methods exploiting compositional structure, which are conversely burdened by structural assumptions and constraints. We show that the proposed approach makes substantial improvements in quantifying and propagating uncertainty in multi-fidelity set-ups, which in turn improves their effectiveness in decision-making pipelines.

    03/18/2019 ∙ by Kurt Cutajar, et al.
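
The layered construction described in the abstract can be sketched in a few lines: fit one GP to cheap low-fidelity data, then feed its predictions as an extra input to a second GP trained on the scarce high-fidelity data. This is a plain two-step caricature of the compositional idea, not the paper's joint variational DGP inference; the toy fidelity functions, kernel, and hyperparameters are illustrative.

```python
import numpy as np

def rbf(A, B, ls=0.3, var=1.0):
    """Squared-exponential kernel between row-stacked inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls**2)

def gp_predict(Xtr, ytr, Xte, noise=1e-4):
    """Exact GP regression posterior mean at test inputs."""
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    return rbf(Xte, Xtr) @ np.linalg.solve(K, ytr)

# Toy fidelities: the high-fidelity function is a nonlinear
# transformation of the low-fidelity one (made up for illustration).
f_low = lambda x: np.sin(8 * np.pi * x)
f_high = lambda x: (x - np.sqrt(2)) * f_low(x) ** 2

X_low = np.linspace(0, 1, 40)[:, None]   # cheap, plentiful observations
X_high = np.linspace(0, 1, 8)[:, None]   # expensive, scarce observations

# Layer 1: GP on low-fidelity data, predicted at the high-fidelity inputs.
mu_low_at_high = gp_predict(X_low, f_low(X_low[:, 0]), X_high)

# Layer 2: GP whose input is (x, predicted low-fidelity value),
# mirroring the compositional structure across fidelity levels.
Z_high = np.hstack([X_high, mu_low_at_high[:, None]])
Xte = np.linspace(0, 1, 100)[:, None]
Z_te = np.hstack([Xte, gp_predict(X_low, f_low(X_low[:, 0]), Xte)[:, None]])
mu_high = gp_predict(Z_high, f_high(X_high[:, 0]), Z_te)
```

Because layer 2 conditions on layer 1's output, nonlinear cross-fidelity correlations can be captured even with only eight expensive observations; the paper's variational scheme additionally propagates layer-1 uncertainty rather than a point prediction.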

  • Preferential Bayesian Optimization

    Bayesian optimization (BO) has emerged during the last few years as an effective approach to optimizing black-box functions where direct queries of the objective are expensive. In this paper we consider the case where direct access to the function is not possible, but information about user preferences is. Such scenarios arise in problems where human preferences are modeled, such as A/B tests or recommender systems. We present a new framework for this scenario that we call Preferential Bayesian Optimization (PBO), which allows us to find the optimum of a latent function that can only be queried through pairwise comparisons, the so-called duels. PBO extends the applicability of standard BO ideas and generalizes previous discrete dueling approaches by modeling the probability of the winner of each duel by means of a Gaussian process model with a Bernoulli likelihood. The latent preference function is used to define a family of acquisition functions that extend usual policies used in BO. We illustrate the benefits of PBO in a variety of experiments, showing that PBO needs drastically fewer comparisons to find the optimum. According to our experiments, the way PBO models correlations is key to obtaining this advantage.

    04/12/2017 ∙ by Javier Gonzalez, et al.
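
A minimal numpy sketch of the duel model: latent preference values are fitted by MAP under a Bernoulli (logistic) likelihood of each duel's winner, with an isotropic Gaussian prior standing in for the paper's GP prior. The duel data, prior weight, and step size are invented for illustration.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Duels over 5 candidate points: (i, j) means point i beat point j.
duels = [(0, 1), (0, 2), (1, 2), (3, 2), (0, 3), (0, 4), (3, 4)]
n = 5
f = np.zeros(n)  # latent preference values at the candidates

# MAP fit under the Bernoulli likelihood: gradient ascent on the
# log-posterior, with a Gaussian prior (weight 0.1) replacing the GP prior.
for _ in range(2000):
    grad = -0.1 * f  # prior term
    for i, j in duels:
        p = sigmoid(f[i] - f[j])  # probability that i beats j
        grad[i] += 1 - p
        grad[j] -= 1 - p
    f += 0.05 * grad

best = int(np.argmax(f))  # candidate most preferred by the duels
```

The fitted latent function is what PBO's acquisition functions operate on; the paper's GP prior additionally shares statistical strength between nearby inputs, which is what reduces the number of duels needed.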

  • Correcting boundary over-exploration deficiencies in Bayesian optimization with virtual derivative sign observations

    Bayesian optimization (BO) is a global optimization strategy designed to find the minimum of an expensive black-box function, typically defined on a continuous subset of R^d, by using a Gaussian process (GP) as a surrogate model for the objective. Although currently available acquisition functions address this goal with different degrees of success, an over-exploration effect of the boundary of the search space is typically observed. However, in problems like the configuration of machine learning algorithms, the function domain is conservatively large, and with high probability the global minimum does not sit on the boundary. We propose a method to incorporate this knowledge into the search process by adding virtual derivative observations to the GP at the borders of the search space. We use the properties of GPs to impose conditions on the partial derivatives of the objective. The method is applicable with any acquisition function, it is easy to use, and it consistently reduces the number of evaluations required to optimize the objective irrespective of the acquisition function used. We illustrate the benefits of our approach in an extensive experimental comparison.

    04/04/2017 ∙ by Eero Siivola, et al.
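
The virtual-observation idea can be sketched with a one-dimensional GP conditioned jointly on function values and on derivative values at the two borders of [0, 1], using the closed-form derivative cross-covariances of the RBF kernel. Hard derivative values (-1 at the left edge, +1 at the right) stand in for the paper's probabilistic sign observations, which are handled with expectation propagation; all numbers here are illustrative.

```python
import numpy as np

ls, var = 0.4, 1.0

def k_ff(a, b):
    """cov(f(a_i), f(b_j)) for an RBF kernel."""
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def k_fd(a, b):
    """cov(f(a_i), f'(b_j)): derivative of the RBF kernel in its second argument."""
    return k_ff(a, b) * (a[:, None] - b[None, :]) / ls**2

def k_dd(a, b):
    """cov(f'(a_i), f'(b_j))."""
    r = a[:, None] - b[None, :]
    return k_ff(a, b) * (1.0 / ls**2 - r**2 / ls**4)

# Function observations in the interior of [0, 1], plus virtual derivative
# observations at the borders: slope -1 at the left edge and +1 at the
# right edge, so the posterior mean rises toward both borders and the
# acquisition is discouraged from over-exploring them.
Xf, yf = np.array([0.3, 0.5, 0.7]), np.array([0.2, -0.1, 0.3])
Xd, yd = np.array([0.0, 1.0]), np.array([-1.0, 1.0])

K = np.block([[k_ff(Xf, Xf), k_fd(Xf, Xd)],
              [k_fd(Xf, Xd).T, k_dd(Xd, Xd)]]) + 1e-6 * np.eye(5)
y = np.concatenate([yf, yd])

Xs = np.linspace(0, 1, 50)
mu = np.hstack([k_ff(Xs, Xf), k_fd(Xs, Xd)]) @ np.linalg.solve(K, y)
```

The posterior mean now slopes away from both borders, so any acquisition built on this GP sees boundary points as unpromising, which is exactly the effect the paper exploits.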

  • Variational Auto-encoded Deep Gaussian Processes

    We develop a scalable deep non-parametric generative model by augmenting deep Gaussian processes with a recognition model. Inference is performed in a novel scalable variational framework where the variational posterior distributions are reparametrized through a multilayer perceptron. The key aspect of this reformulation is that it prevents the proliferation of variational parameters, which would otherwise grow linearly with the sample size. We derive a new formulation of the variational lower bound that allows us to distribute most of the computation in a way that enables us to handle datasets of the size of mainstream deep learning tasks. We show the efficacy of the method on a variety of challenges including deep unsupervised learning and deep Bayesian optimization.

    11/19/2015 ∙ by Zhenwen Dai, et al.
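
The recognition model's key property, that the variational parameter count is fixed by the network rather than growing with the sample size, can be sketched directly. The widths and dimensions below are arbitrary, and the tanh MLP is a stand-in for the paper's multilayer perceptron reparametrization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Recognition MLP: maps each observation to the (mean, log-variance) of
# its variational posterior, so the number of trainable parameters is
# fixed by the layer widths, not by the dataset size.
D, H, Q = 10, 32, 2                       # data dim, hidden units, latent dim
W1, b1 = rng.normal(size=(D, H)) * 0.1, np.zeros(H)
W2, b2 = rng.normal(size=(H, 2 * Q)) * 0.1, np.zeros(2 * Q)

def recognition(X):
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2
    return out[:, :Q], out[:, Q:]         # per-point mean and log-variance

n_params = W1.size + b1.size + W2.size + b2.size

# The same fixed parameter count serves any N, unlike per-point
# variational parameters which would grow linearly with N.
for N in (100, 100_000):
    mu, logvar = recognition(rng.normal(size=(N, D)))
    assert mu.shape == (N, Q)
```

Without amortization, a dataset of 100,000 points would need 100,000 × 2Q variational parameters; here the count stays at `n_params` regardless of N.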

  • GLASSES: Relieving The Myopia Of Bayesian Optimisation

    We present GLASSES: Global optimisation with Look-Ahead through Stochastic Simulation and Expected-loss Search. The majority of global optimisation approaches in use are myopic, in only considering the impact of the next function value; the non-myopic approaches that do exist are able to consider only a handful of future evaluations. Our novel algorithm, GLASSES, permits the consideration of dozens of evaluations into the future. This is done by approximating the ideal look-ahead loss function, which is expensive to evaluate, by a cheaper alternative in which the future steps of the algorithm are simulated beforehand. An Expectation Propagation algorithm is used to compute the expected value of the loss. We show that the far-horizon planning thus enabled leads to substantive performance gains in empirical tests.

    10/21/2015 ∙ by Javier Gonzalez, et al.
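
The look-ahead idea can be caricatured with plain Monte Carlo in place of the paper's Expectation Propagation machinery: after committing to a candidate x0, simulate a horizon of future evaluation sites, score the best value found, and average. The toy objective, horizon, and uniform sampling of future steps are made up; a real implementation would simulate future steps from the GP posterior.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x) + 0.5 * x        # toy surrogate mean (illustrative)
X_seen = np.array([0.2, 1.5])
y_best = f(X_seen).min()                     # incumbent best value

def lookahead_loss(x0, horizon=10, n_sim=50):
    """Monte Carlo stand-in for the expected look-ahead loss: simulate
    `horizon` future evaluation sites after committing to x0 and score
    the expected best value eventually found."""
    losses = []
    for _ in range(n_sim):
        future = rng.uniform(0, 2, size=horizon)   # simulated future steps
        sims = f(np.concatenate(([x0], future)))   # surrogate values
        losses.append(min(y_best, sims.min()))
    return np.mean(losses)

candidates = np.linspace(0, 2, 21)
x_next = candidates[np.argmin([lookahead_loss(c) for c in candidates])]
```

The point of the construction is that the loss of a candidate is judged by where the whole remaining budget can end up, not just by the next evaluation; GLASSES makes this tractable for dozens of steps by simulating them beforehand.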

  • Batch Bayesian Optimization via Local Penalization

    The popularity of Bayesian optimization methods for efficient exploration of parameter spaces has led to a series of papers applying Gaussian processes as surrogates in the optimization of functions. However, most proposed approaches only allow the exploration of the parameter space to occur sequentially. Often, it is desirable to propose batches of parameter values to explore simultaneously. This is particularly the case when large parallel processing facilities are available; these facilities could be computational or physical facets of the process being optimized. For example, in biological experiments many experimental setups allow several samples to be processed at once. Batch methods, however, require modeling the interaction between the evaluations in the batch, which can be expensive in complex scenarios. We investigate a simple heuristic based on an estimate of the Lipschitz constant that captures the most important aspect of this interaction (i.e. local repulsion) at negligible computational overhead. The resulting algorithm compares well, in running time, with much more elaborate alternatives. The approach assumes that the function of interest, f, is Lipschitz continuous. A wrapper loop around the acquisition function is used to collect batches of points of a given size while minimizing the non-parallelizable computational effort. The speed-up of our method with respect to previous approaches is significant in a set of computationally expensive experiments.

    05/29/2015 ∙ by Javier Gonzalez, et al.
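
The local-penalization heuristic is easy to sketch: greedily pick batch points by maximizing the acquisition multiplied by a penalizer that zeroes out a ball around each already-chosen point, with the ball's radius derived from a Lipschitz-constant estimate. The acquisition surface, surrogate mean, and the simplified linear penalizer below are illustrative; the paper's penalizer is built from Gaussian error functions.

```python
import numpy as np

# Toy acquisition surface and a crude Lipschitz estimate L from finite
# differences of a toy surrogate mean (both invented for the sketch).
xs = np.linspace(0, 1, 200)
acq = np.exp(-40 * (xs - 0.3) ** 2) + 0.9 * np.exp(-40 * (xs - 0.8) ** 2)
mu = np.sin(6 * xs)
L = np.max(np.abs(np.gradient(mu, xs)))      # Lipschitz-constant estimate

def penalizer(x, xj, L, y_best=-1.0, mu_j=0.0):
    """Soft exclusion ball around pending point xj; its radius grows with
    the gap between the surrogate mean at xj and the incumbent, and
    shrinks as L grows (simplified linear version of the paper's penalizer)."""
    r = max(mu_j - y_best, 1e-6) / L
    return np.clip(np.abs(x - xj) / r, 0.0, 1.0)

batch = []
pen_acq = acq.copy()
for _ in range(3):                           # collect a batch of 3 points
    x_new = xs[np.argmax(pen_acq)]
    batch.append(x_new)
    pen_acq = pen_acq * penalizer(xs, x_new, L)  # locally repel the next pick
```

Each multiplication by the penalizer implements the "local repulsion" the abstract mentions: the next batch point is pushed away from pending evaluations at the cost of one cheap element-wise product.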

  • Bayesian Optimization for Synthetic Gene Design

    We address the problem of synthetic gene design using Bayesian optimization. The main issue when designing a gene is that the design space is defined in terms of long strings of characters of different lengths, which renders the optimization intractable. We propose a three-step approach to deal with this issue. First, we use a Gaussian process model to emulate the behavior of the cell. As inputs to the model, we use a set of biologically meaningful gene features, which allows us to define optimal gene design rules. Based on the model outputs we define a multi-task acquisition function to optimize several aspects of interest simultaneously. Finally, we define an evaluation function, which allows us to rank sets of candidate gene sequences that are coherent with the optimal design strategy. We illustrate the performance of this approach in a real gene design experiment with mammalian cells.

    05/07/2015 ∙ by Javier Gonzalez, et al.
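
The three-step pipeline can be caricatured with made-up numbers: emulator predictions for two design objectives, a multi-task acquisition that combines them with an exploration bonus, and a ranking of candidate sequences. All names, values, and weights below are hypothetical placeholders, not outputs of the paper's cell emulator.

```python
import numpy as np

# Hypothetical candidate designs and emulator outputs for two tasks
# (e.g. expression level and stability), plus emulator uncertainty.
candidates = ["seq_A", "seq_B", "seq_C", "seq_D"]
yield_mean  = np.array([0.8, 0.6, 0.9, 0.4])   # task 1 predictions
stab_mean   = np.array([0.5, 0.9, 0.4, 0.7])   # task 2 predictions
uncertainty = np.array([0.1, 0.3, 0.2, 0.1])   # emulator std devs

# Multi-task acquisition: weighted sum of the task predictions plus an
# exploration bonus (weights are illustrative).
acq = 0.6 * yield_mean + 0.4 * stab_mean + 0.5 * uncertainty

# Evaluation step: rank candidate sequences by the acquisition.
ranking = [candidates[i] for i in np.argsort(-acq)]
```

The ranking plays the role of the paper's evaluation function: it orders candidate sequences by coherence with the optimal design strategy implied by the emulator.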

  • Reproducing kernel Hilbert space based estimation of systems of ordinary differential equations

    Non-linear systems of differential equations have attracted interest in fields like systems biology, ecology and biochemistry, due to their flexibility and their ability to describe dynamical systems. Despite the importance of such models in many branches of science, they have not been the focus of systematic statistical analysis until recently. In this work we propose a general approach to estimate the parameters of systems of differential equations measured with noise. Our methodology is based on the maximization of a penalized likelihood where the system of differential equations is used as a penalty. To do so, we use a Reproducing Kernel Hilbert Space approach that allows us to formulate the estimation problem as an unconstrained numerical maximization problem that is easy to solve. The proposed method is tested with synthetically simulated data and is used to estimate the unobserved transcription factor CdaR in Streptomyces coelicolor using gene expression data of the genes it regulates.

    11/14/2013 ∙ by Javier Gonzalez, et al.
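
The ODE-as-penalty idea can be sketched on the simplest possible system. With the trajectory pinned to the observed data, minimizing the penalty sum((x' + theta * x)^2) for the toy model x' = -theta * x has a closed-form least-squares solution in theta. The paper additionally optimizes the trajectory itself, represented in an RKHS; finite differences and the toy system below are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy observations of x(t) = exp(-theta * t) with theta = 1.5.
t = np.linspace(0, 2, 25)
theta_true = 1.5
y = np.exp(-theta_true * t) + 0.02 * rng.normal(size=t.size)

# ODE penalty: residual of x' = -theta * x evaluated at the data.
# Minimizing sum((dy + theta * y)^2) over theta is ordinary least squares.
dy = np.gradient(y, t)                      # crude derivative estimate
theta_hat = -np.sum(dy * y) / np.sum(y * y)
```

Even this crude version recovers the decay rate from noisy data; the RKHS formulation replaces the finite-difference derivative with the derivative of a smooth kernel expansion, which is what makes the full problem an unconstrained numerical optimization.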

  • Active Multi-Information Source Bayesian Quadrature

    Bayesian quadrature (BQ) is a sample-efficient probabilistic numerical method to solve integrals of expensive-to-evaluate black-box functions, yet so far, active BQ learning schemes focus merely on the integrand itself as the information source, and do not allow for information transfer from cheaper, related functions. Here, we set the scene for active learning in BQ when multiple related information sources of variable cost (in input and source) are accessible. This setting arises for example when evaluating the integrand requires a complex simulation to be run that can be approximated by simulating at lower levels of sophistication and at lesser expense. We construct meaningful cost-sensitive multi-source acquisition rates as an extension to common utility functions from vanilla BQ (VBQ), and discuss pitfalls that arise from blindly generalizing. Furthermore, we show that the VBQ acquisition policy is a corner case of all considered cost-sensitive acquisition schemes, which collapse onto one single degenerate policy in the case of one source and constant cost. In proof-of-concept experiments we scrutinize the behavior of our generalized acquisition functions. On an epidemiological model, we demonstrate that active multi-source BQ (AMS-BQ) allocates budget more efficiently than VBQ for learning the integral to good accuracy.

    03/27/2019 ∙ by Alexandra Gessner, et al.
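
The cost-sensitive acquisition rate reduces, in caricature, to ranking information sources by expected posterior-variance reduction per unit evaluation cost. The source names, variance reductions, and costs below are invented for the sketch; in the paper these quantities come from the multi-source BQ model itself.

```python
# Illustrative per-source variance reductions and evaluation costs for
# three information sources: the true integrand plus two cheaper
# approximations (all numbers made up).
candidates = {
    "high_fidelity": {"var_reduction": 1.00, "cost": 10.0},
    "medium":        {"var_reduction": 0.60, "cost": 2.0},
    "cheap":         {"var_reduction": 0.15, "cost": 0.6},
}

def cost_sensitive_score(info):
    """Acquisition rate: expected reduction of the integral's posterior
    variance per unit evaluation cost."""
    return info["var_reduction"] / info["cost"]

best_source = max(candidates, key=lambda s: cost_sensitive_score(candidates[s]))
# With a single source and constant cost this ranking collapses onto the
# vanilla-BQ policy, matching the corner case discussed in the abstract.
```

Here the medium source wins: it removes less variance per evaluation than the true integrand but far more per unit of budget, which is exactly the trade-off AMS-BQ exploits.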

  • Automatic Discovery of Privacy-Utility Pareto Fronts

    Differential privacy is a mathematical framework for privacy-preserving data analysis. Changing the hyperparameters of a differentially private algorithm allows one to trade off privacy and utility in a principled way. Quantifying this trade-off in advance is essential for decision-makers who must determine how much privacy can be provided in a particular application while keeping utility acceptable. For more complex tasks, such as training neural networks under differential privacy, the utility achieved by a given algorithm can only be measured empirically. This paper presents a Bayesian optimization methodology for efficiently characterizing the privacy-utility trade-off of any differentially private algorithm using only empirical measurements of its utility. The versatility of our method is illustrated on a number of machine learning tasks involving multiple models, optimizers, and datasets.

    05/26/2019 ∙ by Brendan Avent, et al.
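
Once empirical (epsilon, utility) measurements are available, the Pareto front the paper characterizes is the set of non-dominated points: lower epsilon (stronger privacy) and higher utility are both preferred. The measurements below are hypothetical stand-ins for empirical runs of a differentially private algorithm.

```python
def pareto_front(points):
    """Non-dominated (epsilon, utility) pairs: a point is dominated if
    some other point has privacy at least as strong (epsilon no larger)
    and utility at least as high, and differs in at least one of them."""
    front = []
    for eps, util in points:
        dominated = any(e <= eps and u >= util and (e, u) != (eps, util)
                        for e, u in points)
        if not dominated:
            front.append((eps, util))
    return sorted(front)

# Hypothetical empirical measurements: (epsilon, utility) per run.
measurements = [(0.5, 0.61), (1.0, 0.70), (1.0, 0.66), (2.0, 0.78),
                (4.0, 0.77), (8.0, 0.85)]
front = pareto_front(measurements)
```

In the paper, Bayesian optimization chooses which hyperparameter settings to measure so that this front is mapped out with as few expensive empirical runs as possible.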

  • Meta-Surrogate Benchmarking for Hyperparameter Optimization

    Despite recent progress in hyperparameter optimization (HPO), the available benchmarks that resemble real-world scenarios consist of only a few, very large problem instances that are expensive to solve. This blocks researchers and practitioners not only from systematically running the large-scale comparisons needed to draw statistically significant results, but also from reproducing experiments that were conducted before. This work proposes a method to alleviate these issues by means of a meta-surrogate model for HPO tasks trained on offline-generated data. The model combines a probabilistic encoder with a multi-task model such that it can generate inexpensive and realistic tasks of the class of problems of interest. We demonstrate that benchmarking HPO methods on samples of the generative model allows us to draw more coherent and statistically significant conclusions that can be reached orders of magnitude faster than using the original tasks. We provide evidence of our findings for various HPO methods on a wide class of problems.

    05/30/2019 ∙ by Aaron Klein, et al.
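
The generative use of a meta-surrogate can be caricatured as: sample a latent task code, and let it parametrize a cheap synthetic objective, so arbitrarily many benchmark tasks can be drawn offline. The latent dimension, objective form, and hyperparameter names below are invented for the sketch; the paper's model learns this mapping from real HPO data with a probabilistic encoder.

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_task():
    """Draw one synthetic HPO benchmark task: a latent code h
    parametrizes a cheap objective over hypothetical hyperparameters."""
    h = rng.normal(size=3)                      # latent task code
    def objective(lr, width):
        # Cheap stand-in for an expensive training run.
        return (np.log10(lr) + 3 + h[0]) ** 2 + h[1] * np.tanh(width / 64) + h[2]
    return objective

# 100 inexpensive benchmark tasks, generated offline.
tasks = [sample_task() for _ in range(100)]
scores = [task(1e-3, 128) for task in tasks]    # evaluate one configuration
```

Because each sampled objective is milliseconds to evaluate, an HPO method can be benchmarked over hundreds of tasks, which is what enables the statistically significant comparisons the abstract describes.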