Neil D. Lawrence

is this you? claim profile


Director of Machine Learning, Amazon Research Cambridge and Professor of Machine Learning, University of Sheffield

  • Transferring Knowledge across Learning Processes

    In complex transfer learning scenarios new tasks might not be tightly linked to previous tasks. Approaches that transfer information contained only in the final parameters of a source model will therefore struggle. Instead, transfer learning at a higher level of abstraction is needed. We propose Leap, a framework that achieves this by transferring knowledge across learning processes. We associate each task with a manifold on which the training process travels from initialization to final parameters and construct a meta learning objective that minimizes the expected length of this path. Our framework leverages only information obtained during training and can be computed on the fly at negligible cost. We demonstrate that our framework outperforms competing methods, both in meta learning and transfer learning, on a set of computer vision tasks. Finally, we demonstrate that Leap can transfer knowledge across learning processes in demanding Reinforcement Learning environments (Atari) that involve millions of gradient steps.

    12/03/2018 ∙ by Sebastian Flennerhag, et al. ∙ 126 share

    read it

  • Data Science and Digital Systems: The 3Ds of Machine Learning Systems Design

    Machine learning solutions, in particular those based on deep learning methods, form an underpinning of the current revolution in "artificial intelligence" that has dominated popular press headlines and is having a significant influence on the wider tech agenda. Here we give an overview of the 3Ds of ML systems design: Data, Design and Deployment. By considering the 3Ds we can move towards data first design.

    03/26/2019 ∙ by Neil D. Lawrence, et al. ∙ 20 share

    read it

  • Variational Information Distillation for Knowledge Transfer

    Transferring knowledge from a teacher neural network pretrained on the same or a similar task to a student neural network can significantly improve the performance of the student neural network. Existing knowledge transfer approaches match the activations or the corresponding hand-crafted features of the teacher and the student networks. We propose an information-theoretic framework for knowledge transfer which formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks. We compare our method with existing knowledge transfer methods on both knowledge distillation and transfer learning tasks and show that our method consistently outperforms existing methods. We further demonstrate the strength of our method on knowledge transfer across heterogeneous network architectures by transferring knowledge from a convolutional neural network (CNN) to a multi-layer perceptron (MLP) on CIFAR-10. The resulting MLP significantly outperforms the-state-of-the-art methods and it achieves similar performance to the CNN with a single convolutional layer.

    04/11/2019 ∙ by Sungsoo Ahn, et al. ∙ 6 share

    read it

  • Gaussian Process Latent Force Models for Learning and Stochastic Control of Physical Systems

    This paper is concerned with estimation and stochastic control in physical systems which contain unknown input signals or forces. These unknown signals are modeled as Gaussian processes (GP) in the sense that GP models are used in machine learning. The resulting latent force models (LFMs) can be seen as hybrid models that contain a first-principles physical model part and a non-parametric GP model part. The aim of this paper is to collect and extend the statistical inference and learning methods for this kind of models, provide new theoretical results for the models, and to extend the methodology and theory to stochastic control of LFMs.

    09/15/2017 ∙ by Simo Särkkä, et al. ∙ 0 share

    read it

  • Living Together: Mind and Machine Intelligence

    In this paper we consider the nature of the machine intelligences we have created in the context of our human intelligence. We suggest that the fundamental difference between human and machine intelligence comes down to embodiment factors. We define embodiment factors as the ratio between an entity's ability to communicate information vs compute information. We speculate on the role of embodiment factors in driving our own intelligence and consciousness. We briefly review dual process models of cognition and cast machine intelligence within that framework, characterising it as a dominant System Zero, which can drive behaviour through interfacing with us subconsciously. Driven by concerns about the consequence of such a system we suggest prophylactic courses of action that could be considered. Our main conclusion is that it is not sentient intelligence we should fear but non-sentient intelligence.

    05/22/2017 ∙ by Neil D. Lawrence, et al. ∙ 0 share

    read it

  • Data Readiness Levels

    Application of models to data is fraught. Data-generating collaborators often only have a very basic understanding of the complications of collating, processing and curating data. Challenges include: poor data collection practices, missing values, inconvenient storage mechanisms, intellectual property, security and privacy. All these aspects obstruct the sharing and interconnection of data, and the eventual interpretation of data through machine learning or other approaches. In project reporting, a major challenge is in encapsulating these problems and enabling goals to be built around the processing of data. Project overruns can occur due to failure to account for the amount of time required to curate and collate. But to understand these failures we need to have a common language for assessing the readiness of a particular data set. This position paper proposes the use of data readiness levels: it gives a rough outline of three stages of data preparedness and speculates on how formalisation of these levels into a common language for data readiness could facilitate project management.

    05/05/2017 ∙ by Neil D. Lawrence, et al. ∙ 0 share

    read it

  • Efficient Modeling of Latent Information in Supervised Learning using Gaussian Processes

    Often in machine learning, data are collected as a combination of multiple conditions, e.g., the voice recordings of multiple persons, each labeled with an ID. How could we build a model that captures the latent information related to these conditions and generalize to a new one with few data? We present a new model called Latent Variable Multiple Output Gaussian Processes (LVMOGP) and that allows to jointly model multiple conditions for regression and generalize to a new condition with a few data points at test time. LVMOGP infers the posteriors of Gaussian processes together with a latent space representing the information about different conditions. We derive an efficient variational inference method for LVMOGP, of which the computational complexity is as low as sparse Gaussian processes. We show that LVMOGP significantly outperforms related Gaussian process methods on various tasks with both synthetic and real data.

    05/27/2017 ∙ by Zhenwen Dai, et al. ∙ 0 share

    read it

  • Preferential Bayesian Optimization

    Bayesian optimization (BO) has emerged during the last few years as an effective approach to optimizing black-box functions where direct queries of the objective are expensive. In this paper we consider the case where direct access to the function is not possible, but information about user preferences is. Such scenarios arise in problems where human preferences are modeled, such as A/B tests or recommender systems. We present a new framework for this scenario that we call Preferential Bayesian Optimization (PBO) which allows us to find the optimum of a latent function that can only be queried through pairwise comparisons, the so-called duels. PBO extends the applicability of standard BO ideas and generalizes previous discrete dueling approaches by modeling the probability of the winner of each duel by means of a Gaussian process model with a Bernoulli likelihood. The latent preference function is used to define a family of acquisition functions that extend usual policies used in BO. We illustrate the benefits of PBO in a variety of experiments, showing that PBO needs drastically fewer comparisons for finding the optimum. According to our experiments, the way of modeling correlations in PBO is key in obtaining this advantage.

    04/12/2017 ∙ by Javier Gonzalez, et al. ∙ 0 share

    read it

  • Manifold Alignment Determination: finding correspondences across different data views

    We present Manifold Alignment Determination (MAD), an algorithm for learning alignments between data points from multiple views or modalities. The approach is capable of learning correspondences between views as well as correspondences between individual data-points. The proposed method requires only a few aligned examples from which it is capable to recover a global alignment through a probabilistic model. The strong, yet flexible regularization provided by the generative model is sufficient to align the views. We provide experiments on both synthetic and real data to highlight the benefit of the proposed approach.

    01/12/2017 ∙ by Andreas Damianou, et al. ∙ 0 share

    read it

  • Differentially Private Gaussian Processes

    A major challenge for machine learning is increasing the availability of data while respecting the privacy of individuals. Here we combine the provable privacy guarantees of the Differential Privacy framework with the flexibility of Gaussian processes (GPs). We propose a method using GPs to provide Differentially Private (DP) regression. We then improve this method by crafting the DP noise covariance structure to efficiently protect the training data, while minimising the scale of the added noise. We find that, for the dataset used, this cloaking method achieves the greatest accuracy, while still providing privacy guarantees, and offers practical DP for regression over multi-dimensional inputs. Together these methods provide a starter toolkit for combining differential privacy and GPs.

    06/02/2016 ∙ by Michael Thomas Smith, et al. ∙ 0 share

    read it

  • Chained Gaussian Processes

    Gaussian process models are flexible, Bayesian non-parametric approaches to regression. Properties of multivariate Gaussians mean that they can be combined linearly in the manner of additive models and via a link function (like in generalized linear models) to handle non-Gaussian data. However, the link function formalism is restrictive, link functions are always invertible and must convert a parameter of interest to a linear combination of the underlying processes. There are many likelihoods and models where a non-linear combination is more appropriate. We term these more general models Chained Gaussian Processes: the transformation of the GPs to the likelihood parameters will not generally be invertible, and that implies that linearisation would only be possible with multiple (localized) links, i.e. a chain. We develop an approximate inference procedure for Chained GPs that is scalable and applicable to any factorized likelihood. We demonstrate the approximation on a range of likelihood functions.

    04/18/2016 ∙ by Alan D. Saul, et al. ∙ 0 share

    read it