
Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model
We present a novel framework that enables efficient probabilistic inference in large-scale scientific models by allowing the execution of existing domain-specific simulators as probabilistic programs, resulting in highly interpretable posterior inference. Our framework is general purpose and scalable, and is based on a cross-platform probabilistic execution protocol through which an inference engine can control simulators in a language-agnostic way. We demonstrate the technique in particle physics, on a scientifically accurate simulation of the tau lepton decay, which is a key ingredient in establishing the properties of the Higgs boson. High-energy physics has a rich set of simulators based on quantum field theory and the interaction of particles in matter. We show how to use probabilistic programming to perform Bayesian inference in these existing simulator codebases directly, in particular conditioning on observable outputs from a simulated particle detector to directly produce an interpretable posterior distribution over decay pathways. Inference efficiency is achieved via inference compilation, where a deep recurrent neural network is trained to parameterize proposal distributions and control the stochastic simulator in a sequential importance sampling scheme, at a fraction of the computational cost of Markov chain Monte Carlo sampling.
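The probabilistic execution protocol described above can be illustrated with a minimal sketch: the simulator routes every random choice through a callback, so an inference engine can either let it run under the prior or substitute its own (e.g. proposal) values. All names and distributions here are hypothetical, not the actual protocol used with SHERPA:

```python
import random

def simulator(sample):
    """Toy 'decay' simulator: every random choice goes through `sample`,
    so an inference engine can take control of execution. The choice
    names and distributions here are made up for illustration."""
    channel = sample("channel", lambda: random.choice([0, 1]))
    energy = sample("energy", lambda: random.gauss(10.0 if channel else 5.0, 1.0))
    return channel, energy

def prior_sample(name, draw):
    # Default behavior: run the simulator unchanged under its prior.
    return draw()

def controlled_run(proposals):
    """Replay the simulator, substituting proposed values for named
    choices (the role an inference engine plays over the protocol)."""
    def sample(name, draw):
        return proposals.get(name, draw())
    return simulator(sample)
```

Under this scheme `simulator(prior_sample)` draws an ordinary prior trace, while `controlled_run({"channel": 1, "energy": 9.5})` forces the named choices, which is what lets an external engine drive importance sampling without modifying the simulator's code.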
07/20/2018 ∙ by Atilim Gunes Baydin, et al.

A semantic network-based evolutionary algorithm for computational creativity
We introduce a novel evolutionary algorithm (EA) with a semantic network-based representation. To enable this, we establish new formulations of the EA variation operators, crossover and mutation, adapted to work on semantic networks. The algorithm employs commonsense reasoning to ensure that all operations preserve the meaningfulness of the networks, using the ConceptNet and WordNet knowledge bases. The algorithm can be interpreted as a novel memetic algorithm (MA), given that (1) individuals represent pieces of information that undergo evolution, as in the original sense of memetics as introduced by Dawkins; and (2) this differs from existing MAs, where the word "memetic" has been used as a synonym for local refinement after global optimization. To evaluate the approach, we introduce an analogical similarity-based fitness measure computed through structure mapping. This setup enables the open-ended generation of networks analogous to a given base network.
04/30/2014 ∙ by Atilim Gunes Baydin, et al.

Online Learning Rate Adaptation with Hypergradient Descent
We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We analyze the effectiveness of the method by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it improves upon these commonly used algorithms on a range of optimization problems; in particular the kinds of objective functions that arise frequently in deep neural network training. Our method works by dynamically updating the learning rate during optimization, using the gradient with respect to the learning rate of the update rule itself. Computing this "hypergradient" needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation.
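The update rule can be sketched for plain SGD: the hypergradient of the loss with respect to the learning rate is (minus) the dot product of the current and previous gradients, so the learning rate itself is updated by gradient descent. A minimal one-dimensional sketch of this SGD-HD-style rule (`beta`, the hypergradient step size, is chosen arbitrarily here):

```python
def hypergradient_sgd(grad, theta, alpha=0.01, beta=1e-4, steps=100):
    """SGD with hypergradient learning-rate adaptation (1-D sketch):
    alpha is nudged by the product of the current and previous
    gradients before each ordinary SGD step."""
    prev_g = 0.0
    for _ in range(steps):
        g = grad(theta)
        alpha = alpha + beta * g * prev_g   # hypergradient update of alpha
        theta = theta - alpha * g           # ordinary SGD step
        prev_g = g
    return theta, alpha

# Minimize f(theta) = theta^2, whose gradient is 2*theta.
theta, alpha = hypergradient_sgd(lambda t: 2.0 * t, theta=5.0)
```

While successive gradients point the same way, `g * prev_g` is positive and the learning rate grows; if the iterates start to oscillate, the product turns negative and the rate shrinks, which is what makes the scheme self-stabilizing.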
03/14/2017 ∙ by Atilim Gunes Baydin, et al.

Automated Generation of Cross-Domain Analogies via Evolutionary Computation
Analogy plays an important role in creativity, and is extensively used in science as well as art. In this paper we introduce a technique for the automated generation of cross-domain analogies based on a novel evolutionary algorithm (EA). Unlike existing work in computational analogy-making restricted to creating analogies between two given cases, our approach, for a given case, is capable of creating an analogy along with the novel analogous case itself. Our algorithm is based on the concept of "memes", which are units of culture, or knowledge, undergoing variation and selection under a fitness measure, and represents evolving pieces of knowledge as semantic networks. Using a fitness function based on Gentner's structure mapping theory of analogies, we demonstrate the feasibility of spontaneously generating semantic networks that are analogous to a given base network.
04/11/2012 ∙ by Atilim Gunes Baydin, et al.

Evolution of Ideas: A Novel Memetic Algorithm Based on Semantic Networks
This paper presents a new type of evolutionary algorithm (EA) based on the concept of "meme", where the individuals forming the population are represented by semantic networks and the fitness measure is defined as a function of the represented knowledge. Our work can be classified as a novel memetic algorithm (MA), given that (1) it is the units of culture, or information, that are undergoing variation, transmission, and selection, very close to the original sense of memetics as it was introduced by Dawkins; and (2) this is different from existing MA, where the idea of memetics has been utilized as a means of local refinement by individual learning after classical global sampling of EA. The individual pieces of information are represented as simple semantic networks that are directed graphs of concepts and binary relations, going through variation by memetic versions of operators such as crossover and mutation, which utilize knowledge from commonsense knowledge bases. In evaluating this introductory work, as an interesting fitness measure, we focus on using the structure mapping theory of analogical reasoning from psychology to evolve pieces of information that are analogous to a given base information. Considering other possible fitness measures, the proposed representation and algorithm can serve as a computational tool for modeling memetic theories of knowledge, such as evolutionary epistemology and cultural selection theory.
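A toy sketch of the idea, with a handful of hard-coded triples standing in for the ConceptNet/WordNet knowledge bases and a crude relation-overlap score standing in for structure mapping (both are illustrative stand-ins, not the actual system):

```python
import random

# Toy stand-in for a commonsense knowledge base; in the actual system
# ConceptNet and WordNet supply such (concept, relation, concept) triples.
KNOWLEDGE = [
    ("bird", "CapableOf", "fly"), ("bird", "HasA", "wing"),
    ("plane", "CapableOf", "fly"), ("plane", "HasA", "wing"),
    ("fish", "CapableOf", "swim"), ("fish", "AtLocation", "water"),
]

def mutate(network, rng):
    """Add a random commonsense triple; meaningfulness is preserved by
    construction because every candidate comes from the knowledge base."""
    return network | {rng.choice(KNOWLEDGE)}

def fitness(network, base):
    """Crude stand-in for structure mapping: reward triples sharing a
    relation with the base network, penalize literal copies of it."""
    base_relations = {r for (_, r, _) in base}
    shared = sum(1 for (_, r, _) in network if r in base_relations)
    literal = len(network & base)
    return shared - literal

def evolve(base, generations=50, seed=0):
    """Hill-climbing caricature of the memetic EA: keep a mutation
    whenever it does not decrease fitness."""
    rng = random.Random(seed)
    best = set()
    for _ in range(generations):
        child = mutate(best, rng)
        if fitness(child, base) >= fitness(best, base):
            best = child
    return best

base = {("bird", "CapableOf", "fly"), ("bird", "HasA", "wing")}
analog = evolve(base)
```

The scoring favors networks that mirror the base's relational structure with different concepts (e.g. plane/wing/fly echoing bird/wing/fly), which is the analogical pressure the structure-mapping fitness provides in the paper.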
01/12/2012 ∙ by Atilim Gunes Baydin, et al.

Tricks from Deep Learning
The deep learning community has devised a diverse set of methods to make practical the gradient-based optimization, on large datasets, of large and highly complex models with deeply cascaded nonlinearities. Taken as a whole, these methods constitute a breakthrough, allowing computational structures which are quite wide, very deep, and with an enormous number and variety of free parameters to be effectively optimized. The result now dominates much of practical machine learning, with applications in machine translation, computer vision, and speech recognition. Many of these methods, viewed through the lens of algorithmic differentiation (AD), can be seen as either addressing issues with the gradient itself, or finding ways of achieving increased efficiency using tricks that are AD-related, but not provided by current AD systems. The goal of this paper is to explain not just those methods of most relevance to AD, but also the technical constraints and mindset which led to their discovery. After explaining this context, we present a "laundry list" of methods developed by the deep learning community. Two of these are discussed in further mathematical detail: a way to dramatically reduce the size of the tape when performing reverse-mode AD on a (theoretically) time-reversible process like an ODE integrator; and a new mathematical insight that allows for the implementation of a stochastic Newton's method.
11/10/2016 ∙ by Atilim Gunes Baydin, et al.

Inference Compilation and Universal Probabilistic Programming
We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do "compilation of inference" because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.
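The scheme can be sketched in one dimension: simulate traces from the model, fit a proposal q(mu|x) to them (a linear regressor below, standing in for the paper's neural network), and use it in importance sampling at test time. This is an illustrative simplification, not the actual system:

```python
import random, math

random.seed(0)

# Generative model ("probabilistic program"): mu ~ N(0,1), x ~ N(mu,1).
def model():
    mu = random.gauss(0, 1)
    x = random.gauss(mu, 1)
    return mu, x

# "Compile" inference: fit mu ~ a*x + b by least squares on simulated
# traces -- a linear stand-in for training the neural network.
data = [model() for _ in range(20000)]
mx = sum(x for _, x in data) / len(data)
mmu = sum(mu for mu, _ in data) / len(data)
a = (sum((x - mx) * (mu - mmu) for mu, x in data)
     / sum((x - mx) ** 2 for _, x in data))
b = mmu - a * mx

def normal_logpdf(v, mean, sd):
    return -0.5 * ((v - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def posterior_mean(x_obs, n=5000, proposal_sd=1.0):
    """Importance sampling with the learned proposal q(mu|x) = N(a*x+b, sd)."""
    total_w = total_wmu = 0.0
    for _ in range(n):
        mu = random.gauss(a * x_obs + b, proposal_sd)
        logw = (normal_logpdf(mu, 0, 1)                 # prior
                + normal_logpdf(x_obs, mu, 1)           # likelihood
                - normal_logpdf(mu, a * x_obs + b, proposal_sd))  # proposal
        w = math.exp(logw)
        total_w += w
        total_wmu += w * mu
    return total_wmu / total_w
```

For this conjugate toy model the exact posterior mean given x is x/2, so the learned slope should come out near 0.5 and the importance-sampling estimate near the analytic answer; the amortization is that the same fitted proposal serves any observed x without re-training.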
10/31/2016 ∙ by Tuan Anh Le, et al.

Using Synthetic Data to Train Neural Networks is Model-Based Reasoning
We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data.
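The connection can be sketched minimally: draw labeled examples from a synthetic-data generative model and fit a classifier to them; the trained classifier then approximates the posterior over labels under that model. A nearest-centroid classifier stands in for the neural network here (an illustrative toy, not the Captcha architecture):

```python
import random

random.seed(1)

# Synthetic-data generative model: label ~ Uniform{0,1},
# x ~ N(0,1) if label == 0 else N(3,1).
def generate():
    label = random.randint(0, 1)
    x = random.gauss(3.0 * label, 1.0)
    return x, label

# "Train on synthetic data": fit a nearest-centroid classifier,
# a minimal stand-in for optimizing a neural network.
train = [generate() for _ in range(5000)]
centroids = {}
for lbl in (0, 1):
    xs = [x for x, l in train if l == lbl]
    centroids[lbl] = sum(xs) / len(xs)

def predict(x):
    # Maximum-a-posteriori label under the synthetic-data model
    # (equal priors, equal variances => nearest centroid).
    return min(centroids, key=lambda lbl: abs(x - centroids[lbl]))
```

The classifier's decision rule is exactly the Bayes-optimal rule for the generative model it was trained from, which is the paper's point: generalization to real data then hinges on how well that model matches reality.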
03/02/2017 ∙ by Tuan Anh Le, et al.

Automatic Differentiation of Algorithms for Machine Learning
Automatic differentiation, the mechanical transformation of numeric computer programs to calculate derivatives efficiently and accurately, dates to the origin of the computer age. Reverse mode automatic differentiation both antedates and generalizes the method of backwards propagation of errors used in machine learning. Despite this, practitioners in a variety of fields, including machine learning, have been little influenced by automatic differentiation, and make scant use of available tools. Here we review the technique of automatic differentiation, describe its two main modes, and explain how it can benefit machine learning practitioners. To reach the widest possible audience our treatment assumes only elementary differential calculus, and does not assume any knowledge of linear algebra.
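The forward mode of the two is easy to sketch: each value carries its derivative alongside it (a "dual number"), and every arithmetic operation propagates both via the chain rule. A minimal Python illustration, not modeled on any particular AD library:

```python
import math

class Dual:
    """Forward-mode AD via dual numbers: each object carries a value
    and its derivative, updated together by the chain rule."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    # Chain rule: d/dt sin(x(t)) = cos(x) * x'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def derivative(f, x):
    """Evaluate f at x with derivative seed 1 and read off f'(x)."""
    return f(Dual(x, 1.0)).dot
```

For example, `derivative(lambda x: x * sin(x) + x, 0.0)` computes d/dx [x sin x + x] = sin x + x cos x + 1 exactly (to machine precision), with no symbolic manipulation and no finite-difference truncation error.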
04/28/2014 ∙ by Atilim Gunes Baydin, et al.

CBR with Commonsense Reasoning and Structure Mapping: An Application to Mediation
Mediation is an important method in dispute resolution. We implement a case-based reasoning approach to mediation, integrating analogical and commonsense reasoning components that allow an artificial mediation agent to satisfy requirements expected from a human mediator, in particular: utilizing experience with cases in different domains; and structurally transforming the set of issues for a better solution. We utilize a case structure based on ontologies reflecting the perceptions of the parties in dispute. The analogical reasoning component, employing the Structure Mapping Theory from psychology, provides the flexibility to respond innovatively in unusual circumstances, in contrast with conventional approaches confined to specialized problem domains. We aim to build a mediation case base incorporating real-world instances ranging from interpersonal or intergroup disputes to international conflicts.
07/30/2011 ∙ by Atilim Gunes Baydin, et al.

Improvements to Inference Compilation for Probabilistic Programming in Large-Scale Scientific Simulators
We consider the problem of Bayesian inference in the family of probabilistic models implicitly defined by stochastic generative models of data. In scientific fields ranging from population biology to cosmology, low-level mechanistic components are composed to create complex generative models. These models lead to intractable likelihoods and are typically non-differentiable, which poses challenges for traditional approaches to inference. We extend previous work in "inference compilation", which combines universal probabilistic programming and deep learning methods, to large-scale scientific simulators, and introduce a C++-based probabilistic programming library called CPProb. We successfully use CPProb to interface with SHERPA, a large codebase used in particle physics. Here we describe the technical innovations realized and planned for this library.
12/21/2017 ∙ by Mario Lezcano Casado, et al.