
Meta-Learning surrogate models for sequential decision making
Meta-learning methods leverage past experience to learn data-driven inductive biases from related problems, increasing learning efficiency on new tasks. This ability renders them particularly suitable for sequential decision making with limited experience. Within this problem family, we argue for the use of such approaches in the study of model-based approaches to Bayesian Optimisation, contextual bandits and Reinforcement Learning. We approach the problem by learning distributions over functions using Neural Processes (NPs), a recently introduced probabilistic meta-learning method. This allows the treatment of model uncertainty to tackle the exploration/exploitation dilemma. We show that NPs are suitable for sequential decision making on a diverse set of domains, including adversarial task search, recommender systems and model-based reinforcement learning.
03/28/2019 ∙ by Alexandre Galashov, et al.
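The exploration/exploitation treatment described above can be sketched as Thompson sampling over an uncertainty-aware surrogate. The noisy quadratic fit below is a crude stand-in for a Neural Process posterior, and the objective, noise scale and candidate grid are illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    """Hidden black-box function to maximise (illustrative)."""
    return -(x - 0.3) ** 2

def thompson_step(xs, ys, candidates):
    """Sample one plausible surrogate, then act greedily under it."""
    coeffs = np.polyfit(xs, ys, deg=2)
    sampled = coeffs + rng.normal(0.0, 0.1, size=coeffs.shape)  # crude posterior sample
    return candidates[int(np.argmax(np.polyval(sampled, candidates)))]

candidates = np.linspace(0.0, 1.0, 101)
xs = [0.0, 0.5, 1.0]
ys = [objective(x) for x in xs]
for _ in range(20):
    x = float(thompson_step(xs, ys, candidates))
    xs.append(x)
    ys.append(objective(x))

best = xs[int(np.argmax(ys))]   # concentrates near the optimum at 0.3
```

With a trained NP, the sampled surrogate would come from the NP's latent distribution over functions rather than from perturbed polynomial coefficients.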

Structured agents for physical construction
Physical construction (the ability to compose objects, subject to physical dynamics, in order to serve some function) is fundamental to human intelligence. Here we introduce a suite of challenging physical construction tasks inspired by how children play with blocks, such as matching a target configuration, stacking and attaching blocks to connect objects together, and creating shelter-like structures over target objects. We then examine how a range of modern deep reinforcement learning agents fare on these challenges, and introduce several new approaches which provide superior performance. Our results show that agents which use structured representations (e.g., objects and scene graphs) and structured policies (e.g., object-centric actions) outperform those which use less structured representations, and generalize better beyond their training when asked to reason about larger scenes. Agents which use model-based planning via Monte Carlo Tree Search also outperform strictly model-free agents in our most challenging construction problems. We conclude that approaches which combine structured representations and reasoning with powerful learning are a key path toward agents that possess rich intuitive physics, scene understanding, and planning.
04/05/2019 ∙ by Victor Bapst, et al.

Learning to Follow Language Instructions with Adversarial Reward Induction
Recent work has shown that deep reinforcement learning agents can learn to follow language-like instructions from infrequent environment rewards. However, for many real-world natural language commands that involve a degree of underspecification or ambiguity, such as "tidy the room", it would be challenging or impossible to program an appropriate reward function. To overcome this, we present a method for learning to follow commands from a training set of instructions and corresponding example goal-states, rather than an explicit reward function. Importantly, the example goal-states are not seen at test time. The approach effectively separates the representation of what instructions require from how they can be executed. In a simple grid world, the method enables an agent to learn a range of commands requiring interaction with blocks and understanding of spatial relations and underspecified abstract arrangements. We further show the method allows our agent to adapt to changes in the environment without requiring new training examples.
06/05/2018 ∙ by Dzmitry Bahdanau, et al.
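The idea of inducing a reward from example goal-states can be sketched as a discriminator: a classifier trained to tell goal-states from arbitrary visited states, whose output probability serves as the reward. The one-dimensional world, the "red left of blue" command, and the logistic discriminator below are invented for illustration and are far simpler than the paper's setup:

```python
import math
import random

random.seed(0)

# Toy world: a state is (x_red, x_blue). The command "red left of blue"
# holds when x_red < x_blue, but we never hand-code that reward; we learn
# it from example goal-states alone.

def feature(state):
    x_red, x_blue = state
    return [1.0, float(x_red - x_blue)]   # bias + relative position

goal_examples = [
    s for s in ((random.randint(0, 9), random.randint(0, 9)) for _ in range(400))
    if s[0] < s[1]
]

w = [0.0, 0.0]

def reward(state):
    """Learned reward: discriminator probability that the state is a goal."""
    z = sum(wi * fi for wi, fi in zip(w, feature(state)))
    return 1.0 / (1.0 + math.exp(-z))

# Train the discriminator: goal examples as positives, arbitrary states as negatives.
for _ in range(2000):
    pos = random.choice(goal_examples)
    neg = (random.randint(0, 9), random.randint(0, 9))
    for s, y in ((pos, 1.0), (neg, 0.0)):
        p = reward(s)
        for i, fi in enumerate(feature(s)):
            w[i] += 0.1 * (y - p) * fi
```

After training, `reward` assigns high values to states satisfying the command and low values otherwise, and can be handed to any standard RL algorithm in place of a programmed reward.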

Degenerate Feedback Loops in Recommender Systems
Machine learning is used extensively in recommender systems deployed in products. The decisions made by these systems can influence user beliefs and preferences, which in turn affect the feedback the learning system receives, thus creating a feedback loop. This phenomenon can give rise to the so-called "echo chambers" or "filter bubbles" that have user and societal implications. In this paper, we provide a novel theoretical analysis that examines both the role of user dynamics and the behavior of recommender systems, disentangling the echo chamber from the filter bubble effect. In addition, we offer practical solutions to slow down system degeneracy. Our study contributes toward understanding and developing solutions to commonly cited issues in this complex temporal scenario, an area that is still largely unexplored.
02/27/2019 ∙ by Ray Jiang, et al.
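The feedback loop itself is easy to demonstrate in simulation. The greedy recommender and multiplicative interest drift below are illustrative assumptions, not the paper's model, but they show how consumption entropy collapses onto a single item:

```python
import math

N_ITEMS = 10
interest = [1.0] * N_ITEMS            # user starts with uniform interests

def entropy(weights):
    total = sum(weights)
    return -sum(w / total * math.log(w / total) for w in weights)

start_entropy = entropy(interest)
for _ in range(100):
    item = max(range(N_ITEMS), key=lambda i: interest[i])  # greedy recommendation
    interest[item] *= 1.1             # consuming an item reinforces interest in it

end_entropy = entropy(interest)       # interests have collapsed onto one item
```

Replacing the greedy choice with a stochastic or exploration-aware policy is one of the kinds of intervention the paper analyses for slowing down degeneracy.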

Verification of Non-Linear Specifications for Neural Networks
Prior work on neural network verification has focused on specifications that are linear functions of the output of the network, e.g., invariance of the classifier output under adversarial perturbations of the input. In this paper, we extend verification algorithms to be able to certify richer properties of neural networks. To do this we introduce the class of convex-relaxable specifications, which constitute nonlinear specifications that can be verified using a convex relaxation. We show that a number of important properties of interest can be modeled within this class, including conservation of energy in a learned dynamics model of a physical system; semantic consistency of a classifier's output labels under adversarial perturbations; and bounding errors in a system that predicts the summation of handwritten digits. Our experimental evaluation shows that our method is able to effectively verify these specifications. Moreover, our evaluation exposes the failure modes in models which cannot be verifieded to satisfy these specifications, emphasizing the importance of training models not just to fit training data but also to be consistent with specifications.
02/25/2019 ∙ by Chongli Qin, et al.
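A minimal flavour of certifying a nonlinear specification: propagate interval bounds through a small ReLU network, then bound an energy-style quadratic on the output. Interval arithmetic is a much looser relaxation than the paper's convex-relaxable class, and the weights and the 0.25 energy threshold are made up for illustration:

```python
# Propagate interval bounds through affine + ReLU layers of a tiny network.
def affine_bounds(lo, hi, W, b):
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row))
        h = bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row))
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

def square_upper(l, h):
    return max(l * l, h * h)           # interval upper bound on x^2

W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]   # toy "trained" weights
W2, b2 = [[1.0, 1.0]], [0.0]

lo, hi = [-0.1, -0.1], [0.1, 0.1]      # input perturbation ball
lo, hi = affine_bounds(lo, hi, W1, b1)
lo, hi = relu_bounds(lo, hi)
lo, hi = affine_bounds(lo, hi, W2, b2)

energy_bound = square_upper(lo[0], hi[0])
verified = energy_bound <= 0.25        # nonlinear spec: output energy below 0.25
```

If `verified` is true, the spec holds for every input in the ball; a tighter convex relaxation would shrink `energy_bound` and certify more models.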

Analysing Mathematical Reasoning Abilities of Neural Models
Mathematical reasoning, a core ability within human intelligence, presents some unique challenges as a domain: we do not come to understand and solve mathematical problems primarily on the back of experience and evidence, but on the basis of inferring, learning, and exploiting laws, axioms, and symbol manipulation rules. In this paper, we present a new challenge for the evaluation (and eventually the design) of neural architectures and similar systems, developing a task suite of mathematics problems involving sequential questions and answers in a free-form textual input/output format. The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure modes of different architectures, as well as evaluate their ability to compose and relate knowledge and learned processes. Having described the data generation process and its potential future expansions, we conduct a comprehensive analysis of models from two broad classes of the most powerful sequence-to-sequence architectures and find notable differences in their ability to resolve mathematical problems and generalize their knowledge.
04/02/2019 ∙ by David Saxton, et al.
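The free-form question/answer format and extrapolation-oriented splits can be mimicked in a few lines. The templates and operand ranges below are illustrative, not the paper's actual generation code:

```python
import random

random.seed(0)

def make_question(max_operand):
    """One free-form textual question/answer pair."""
    a = random.randint(1, max_operand)
    b = random.randint(1, max_operand)
    op, value = random.choice([("+", a + b), ("-", a - b), ("*", a * b)])
    return f"What is {a} {op} {b}?", str(value)

# Train on small operands, test on larger ones, so the split probes
# extrapolation rather than memorisation.
train_set = [make_question(10) for _ in range(100)]
test_set = [make_question(100) for _ in range(100)]
```

A sequence-to-sequence model consumes the question string character by character and must emit the answer string, with no access to the generator's structure.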

REGAL: Transfer Learning For Fast Optimization of Computation Graphs
We present a deep reinforcement learning approach to optimizing the execution cost of computation graphs in a static compiler. The key idea is to combine a neural network policy with a genetic algorithm, the Biased Random-Key Genetic Algorithm (BRKGA). The policy is trained to predict, given an input graph to be optimized, the node-level probability distributions for sampling mutations and crossovers in BRKGA. Our approach, "REINFORCE-based Genetic Algorithm Learning" (REGAL), uses the policy's ability to transfer to new graphs to significantly improve the solution quality of the genetic algorithm for the same objective evaluation budget. As a concrete application, we show results for minimizing peak memory in TensorFlow graphs by jointly optimizing device placement and scheduling. REGAL achieves on average 3.56% lower peak memory than BRKGA on previously unseen graphs, outperforming all the algorithms we compare to, and giving a 4.4x bigger improvement than the next best algorithm. We also evaluate REGAL on a production compiler team's performance benchmark of XLA graphs and achieve on average 3.74% lower peak memory than the other approaches. Our approach and analysis are made possible by collecting a dataset of 372 unique real-world TensorFlow graphs, more than an order of magnitude more data than previous work.
05/07/2019 ∙ by Aditya Paliwal, et al.
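The policy-guided random-key idea can be sketched on a toy scheduling problem: each node gets a key, sorting the keys yields an order, and a per-node distribution biases how new individuals are sampled. Here a fixed heuristic vector stands in for the trained network policy, and the cost model and GA hyperparameters are invented for illustration:

```python
import random

random.seed(0)

COSTS = [5, 1, 4, 2, 3]               # cost of scheduling node i early (toy)

def fitness(keys):
    """Sort keys to get a node order; earlier slots are weighted more."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    return sum((len(order) - pos) * COSTS[node] for pos, node in enumerate(order))

def sample_individual(policy_means, noise=0.3):
    """Sample random keys around the per-node means the 'policy' proposes."""
    return [max(0.0, min(1.0, m + random.uniform(-noise, noise))) for m in policy_means]

# Heuristic stand-in for the learned policy: costly nodes get large keys (late).
policy_means = [c / max(COSTS) for c in COSTS]

population = [sample_individual(policy_means) for _ in range(30)]
for _ in range(40):
    population.sort(key=fitness)
    elites = population[:10]
    children = [
        [random.choice(pair)                      # biased crossover: elite vs. fresh sample
         for pair in zip(random.choice(elites), sample_individual(policy_means))]
        for _ in range(20)
    ]
    population = elites + children

best = min(population, key=fitness)
```

In REGAL the per-node distributions come from a graph neural network conditioned on the input graph, so the same trained policy transfers to unseen graphs.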

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model learns by simply looking at images and reading paired questions and answers. Our model builds an object-based scene representation and translates sentences into executable, symbolic programs. To bridge the learning of the two modules, we use a neuro-symbolic reasoning module that executes these programs on the latent scene representation. Analogous to human concept learning, the perception module learns visual concepts based on the language description of the object being referred to. Meanwhile, the learned visual concepts facilitate learning new words and parsing new sentences. We use curriculum learning to guide the search over the large compositional space of images and language. Extensive experiments demonstrate the accuracy and efficiency of our model on learning visual concepts, word representations, and semantic parsing of sentences. Further, our method allows easy generalization to new object attributes, compositions, language concepts, scenes and questions, and even new program domains. It also empowers applications including visual question answering and bidirectional image-text retrieval.
04/26/2019 ∙ by Jiayuan Mao, et al.
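The program-execution step can be illustrated with a hard, hand-coded executor over an object-based scene. In NS-CL the attributes are latent concept embeddings and the filtering is soft and differentiable, so treat this purely as a sketch of the program semantics:

```python
# Scene as object-level attribute dicts; program as a list of operations.
scene = [
    {"color": "red", "shape": "cube"},
    {"color": "blue", "shape": "sphere"},
    {"color": "red", "shape": "sphere"},
]

def execute(program, objects):
    """Run a filter/count/query program over the object-based scene."""
    result = objects
    for op, *args in program:
        if op == "filter":
            attr, value = args
            result = [o for o in result if o[attr] == value]
        elif op == "count":
            result = len(result)
        elif op == "query":
            (attr,) = args
            (obj,) = result           # expects exactly one object remaining
            result = obj[attr]
    return result

# "How many red objects are there?"
n_red = execute([("filter", "color", "red"), ("count",)], scene)
# "What is the shape of the blue object?"
blue_shape = execute([("filter", "color", "blue"), ("query", "shape")], scene)
```

The semantic parser's job is to map each question string to such a program; the perception module's job is to produce the object list from pixels.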

Graph Matching Networks for Learning the Similarity of Graph Structured Objects
This paper addresses the challenging problem of retrieval and matching of graph structured objects, and makes two key contributions. First, we demonstrate how Graph Neural Networks (GNN), which have emerged as an effective model for various supervised prediction problems defined on structured data, can be trained to produce embeddings of graphs in vector spaces that enable efficient similarity reasoning. Second, we propose a novel Graph Matching Network model that, given a pair of graphs as input, computes a similarity score between them by jointly reasoning on the pair through a new cross-graph attention-based matching mechanism. We demonstrate the effectiveness of our models on different domains including the challenging problem of control-flow-graph based function similarity search, which plays an important role in the detection of vulnerabilities in software systems. The experimental analysis demonstrates that our models are not only able to exploit structure in the context of similarity learning but also outperform domain-specific baseline systems that have been carefully hand-engineered for these problems.
04/29/2019 ∙ by Yujia Li, et al.
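One cross-graph attention matching step can be sketched with fixed node embeddings: each node attends over the nodes of the other graph, and the similarity penalises nodes that find no close counterpart. The real model interleaves this with within-graph message passing and learned encoders, so the two-node graphs and plain dot-product attention here are only illustrative:

```python
import numpy as np

def cross_attention(h1, h2):
    """For each node in one graph, a softmax-weighted counterpart in the other."""
    logits = h1 @ h2.T
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a = a / a.sum(axis=1, keepdims=True)
    return a @ h2

def similarity(h1, h2):
    """Higher (less negative) when every node finds a close match across graphs."""
    d = np.linalg.norm(h1 - cross_attention(h1, h2))
    d += np.linalg.norm(h2 - cross_attention(h2, h1))
    return -d

g = np.array([[1.0, 0.0], [0.0, 1.0]])         # node embeddings of graph A
g_close = np.array([[0.9, 0.1], [0.1, 0.9]])   # near-isomorphic graph
g_far = np.array([[-1.0, 0.0], [0.0, -1.0]])   # structurally different graph
```

Because the score is computed jointly on the pair, it can be sharper than comparing two independently-produced graph embeddings, at the cost of a per-pair computation.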

Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures
This paper addresses the problem of evaluating learning systems in safety-critical domains such as autonomous driving, where failures can have catastrophic consequences. We focus on two problems: searching for scenarios in which learned agents fail, and assessing their probability of failure. The standard method for agent evaluation in reinforcement learning, Vanilla Monte Carlo, can miss failures entirely, leading to the deployment of unsafe agents. We demonstrate this is an issue for current agents, where even matching the compute used for training is sometimes insufficient for evaluation. To address this shortcoming, we draw upon the rare event probability estimation literature and propose an adversarial evaluation approach. Our approach focuses evaluation on adversarially chosen situations, while still providing unbiased estimates of failure probabilities. The key difficulty is in identifying these adversarial situations: since failures are rare, there is little signal to drive optimization. To solve this we propose a continuation approach that learns failure modes in related but less robust agents. Our approach also allows reuse of data already collected for training the agent. We demonstrate the efficacy of adversarial evaluation on two standard domains: humanoid control and simulated driving. Experimental results show that our methods can find catastrophic failures and estimate failure rates of agents multiple orders of magnitude faster than standard evaluation schemes, in minutes to hours rather than days.
12/04/2018 ∙ by Jonathan Uesato, et al.
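The reweighting trick at the core of unbiased rare-event estimation is easy to show on a one-dimensional stand-in: sample "situations" from a proposal shifted toward the failure region, and importance-weight them back to the true distribution. The Gaussian threshold model below is invented for illustration:

```python
import math
import random

random.seed(0)

THRESHOLD = 3.5                       # failure: situation x exceeds this

def fails(x):
    return x > THRESHOLD

def gauss_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

N = 10_000

# Vanilla Monte Carlo: with this budget, failures are rarely observed at all.
vanilla = sum(fails(random.gauss(0.0, 1.0)) for _ in range(N)) / N

# Importance sampling from a proposal centred on the failure boundary,
# reweighted so the estimate stays unbiased under the true distribution.
total = 0.0
for _ in range(N):
    x = random.gauss(THRESHOLD, 1.0)
    if fails(x):
        total += gauss_pdf(x, 0.0) / gauss_pdf(x, THRESHOLD)
estimate = total / N                  # true value is about 2.3e-4
```

In the paper, the adversarially chosen proposal comes from failure modes learned on related, less robust agents rather than from a hand-placed Gaussian shift.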

Scaling shared model governance via model splitting
Currently the only techniques for sharing governance of a deep learning model are homomorphic encryption and secure multiparty computation. Unfortunately, neither of these techniques is applicable to the training of large neural networks due to their large computational and communication overheads. As a scalable technique for shared model governance, we propose splitting a deep learning model between multiple parties. This paper empirically investigates the security guarantee of this technique, which we introduce as the problem of model completion: given the entire training data set or an environment simulator, and a subset of the parameters of a trained deep learning model, how much training is required to recover the model's original performance? We define a metric for evaluating the hardness of the model completion problem and study it empirically in both supervised learning on ImageNet and reinforcement learning on Atari and DeepMind Lab. Our experiments show that (1) the model completion problem is harder in reinforcement learning than in supervised learning because of the unavailability of the trained agent's trajectories, and (2) its hardness depends not primarily on the number of parameters of the missing part, but more so on their type and location. Our results suggest that model splitting might be a feasible technique for shared model governance in some settings where training is very expensive.
12/14/2018 ∙ by Miljan Martic, et al.
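A tiny version of the model-completion measurement: train a linear model, withhold half of its parameters, and count the gradient steps needed to recover the original loss. The dimensions, learning rate and tolerance are arbitrary illustrative choices, far from the paper's ImageNet/Atari scale:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
y = X @ true_w

def loss(w):
    return float(np.mean((X @ w - y) ** 2))

# "Trained model": exact least-squares fit, baseline loss is ~0.
trained = np.linalg.lstsq(X, y, rcond=None)[0]
baseline = loss(trained)

# Model completion: one party holds only the first half of the weights;
# the withheld half is re-initialised and must be retrained from data.
w = trained.copy()
w[4:] = rng.normal(size=4)

steps = 0
while loss(w) > baseline + 1e-6 and steps < 5000:
    grad = 2.0 * X.T @ (X @ w - y) / len(X)
    w[4:] -= 0.05 * grad[4:]          # only the missing parameters are trained
    steps += 1
```

The step count (relative to the cost of training from scratch) plays the role of the paper's hardness metric; repeating the experiment for different withheld subsets shows how hardness depends on which parameters are missing.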