
The Hanabi Challenge: A New Frontier for AI Research
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains have driven research by providing sophisticated yet welldefined challenges for artificial intelligence practitioners. We continue this tradition by proposing the game of Hanabi as a new challenge domain with novel problems that arise from its combination of purely cooperative gameplay and imperfect information in a two to five player setting. In particular, we argue that Hanabi elevates reasoning about the beliefs and intentions of other agents to the foreground. We believe developing novel techniques capable of imbuing artificial agents with such theory of mind will not only be crucial for their success in Hanabi, but also in broader collaborative efforts, and especially those with human partners. To facilitate future research, we introduce the opensource Hanabi Learning Environment, propose an experimental framework for the research community to evaluate algorithmic advances, and assess the performance of current stateoftheart techniques.
02/01/2019 ∙ by Nolan Bard, et al. ∙ 54 ∙ shareread it

MetaDataset: A Dataset of Datasets for Learning to Learn from Few Examples
Fewshot classification refers to learning a classifier for new classes given only a few examples. While a plethora of models have emerged to tackle this recently, we find the current procedure and datasets that are used to systematically assess progress in this setting lacking. To address this, we propose MetaDataset: a new benchmark for training and evaluating fewshot classifiers that is largescale, consists of multiple datasets, and presents more natural and realistic tasks. The aim is to measure the ability of stateoftheart models to leverage diverse sources of data to achieve higher generalization, and to evaluate that generalization ability in a more challenging setting. We additionally measure robustness of current methods to variations in the number of available examples and the number of classes. Finally our extensive empirical evaluation leads us to identify weaknesses in Prototypical Networks and MAML, two popular fewshot classification methods, and to propose a new method, ProtoMAML, which achieves improved performance on our benchmark.
03/07/2019 ∙ by Eleni Triantafillou, et al. ∙ 42 ∙ shareread it

InfoBot: Transfer and Exploration via the Information Bottleneck
A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed. We postulate that in the absence of useful reward signals, an effective exploration strategy should seek out decision states. These states lie at critical junctions in the state space from where the agent can transition to new, potentially unexplored regions. We propose to learn about decision states from prior experience. By training a goalconditioned policy with an information bottleneck, we can identify decision states by examining where the model actually leverages the goal state. We find that this simple mechanism effectively identifies decision states, even in partially observed settings. In effect, the model learns the sensory cues that correlate with potential subgoals. In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.
01/30/2019 ∙ by Anirudh Goyal, et al. ∙ 18 ∙ shareread it

Blindfold Baselines for Embodied QA
We explore blindfold (questiononly) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question by intelligently navigating in a simulated environment, gathering necessary visual information only through firstperson vision before finally answering. Consequently, a blindfold baseline which ignores the environment and visual information is a degenerate solution, yet we show through our experiments on the EQAv1 dataset that a simple questiononly baseline achieves stateoftheart results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.
11/12/2018 ∙ by Ankesh Anand, et al. ∙ 6 ∙ shareread it

Hyperbolic Discounting and Learning over Multiple Horizons
Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process. The discount factor values future rewards by an exponential scheme that leads to theoretical convergence guarantees of the Bellman equation. However, evidence from psychology, economics and neuroscience suggests that humans and animals instead have hyperbolic timepreferences. In this work we revisit the fundamentals of discounting in RL and bridge this disconnect by implementing an RL agent that acts via hyperbolic discounting. We demonstrate that a simple approach approximates hyperbolic discount functions while still using familiar temporaldifference learning techniques in RL. Additionally, and independent of hyperbolic discounting, we make a surprising discovery that simultaneously learning value functions over multiple timehorizons is an effective auxiliary task which often improves over a strong valuebased RL agent, Rainbow.
02/19/2019 ∙ by William Fedus, et al. ∙ 4 ∙ shareread it

A RAD approach to deep mixture models
Flow based models such as Real NVP are an extremely powerful approach to density estimation. However, existing flow based models are restricted to transforming continuous densities over a continuous input space into similarly continuous distributions over continuous latent variables. This makes them poorly suited for modeling and representing discrete structures in data distributions, for example class membership or discrete symmetries. To address this difficulty, we present a normalizing flow architecture which relies on domain partitioning using locally invertible functions, and possesses both real and discrete valued latent variables. This Real and Discrete (RAD) approach retains the desirable normalizing flow properties of exact sampling, exact inference, and analytically computable probabilities, while at the same time allowing simultaneous modeling of both continuous and discrete structure in a data distribution.
03/18/2019 ∙ by Laurent Dinh, et al. ∙ 4 ∙ shareread it

Autotagging music with conditional restricted Boltzmann machines
This paper describes two applications of conditional restricted Boltzmann machines (CRBMs) to the task of autotagging music. The first consists of training a CRBM to predict tags that a user would apply to a clip of a song based on tags already applied by other users. By learning the relationships between tags, this model is able to preprocess training data to significantly improve the performance of a support vector machine (SVM) autotagging. The second is the use of a discriminative RBM, a type of CRBM, to autotag music. By simultaneously exploiting the relationships among tags and between tags and audiobased features, this model is able to significantly outperform SVMs, logistic regression, and multilayer perceptrons. In order to be applied to this problem, the discriminative RBM was generalized to the multilabel setting and four different learning algorithms for it were evaluated, the first such indepth analysis of which we are aware.
03/15/2011 ∙ by Michael Mandel, et al. ∙ 0 ∙ shareread it

An Infinite Restricted Boltzmann Machine
We present a mathematical construction for the restricted Boltzmann machine (RBM) that doesn't require specifying the number of hidden units. In fact, the hidden layer size is adaptive and can grow during training. This is obtained by first extending the RBM to be sensitive to the ordering of its hidden units. Then, thanks to a carefully chosen definition of the energy function, we show that the limit of infinitely many hidden units is well defined. As with RBM, approximate maximum likelihood training can be performed, resulting in an algorithm that naturally and adaptively adds trained hidden units during learning. We empirically study the behaviour of this infinite RBM, showing that its performance is competitive to that of the RBM, while not requiring the tuning of a hidden layer size.
02/09/2015 ∙ by MarcAlexandre Côté, et al. ∙ 0 ∙ shareread it

Deep learning trends for focal brain pathology segmentation in MRI
Segmentation of focal (localized) brain pathologies such as brain tumors and brain lesions caused by multiple sclerosis and ischemic strokes are necessary for medical diagnosis, surgical planning and disease development as well as other applications such as tractography. Over the years, attempts have been made to automate this process for both clinical and research reasons. In this regard, machine learning methods have long been a focus of attention. Over the past two years, the medical imaging field has seen a rise in the use of a particular branch of machine learning commonly known as deep learning. In the nonmedical computer vision world, deep learning based methods have obtained stateoftheart results on many datasets. Recent studies in computer aided diagnostics have shown deep learning methods (and especially convolutional neural networks  CNN) to yield promising results. In this chapter, we provide a survey of CNN methods applied to medical imaging with a focus on brain pathology segmentation. In particular, we discuss their characteristic peculiarities and their specific configuration and adjustments that are best suited to segment medical images. We also underline the intrinsic differences deep learning methods have with other machine learning methods.
07/18/2016 ∙ by Mohammad Havaei, et al. ∙ 0 ∙ shareread it

Neural Autoregressive Distribution Estimation
We present Neural Autoregressive Distribution Estimation (NADE) models, which are neural network architectures applied to the problem of unsupervised distribution and density estimation. They leverage the probability product rule and a weight sharing scheme inspired from restricted Boltzmann machines, to yield an estimator that is both tractable and has good generalization performance. We discuss how they achieve competitive performance in modeling both binary and realvalued observations. We also present how deep NADE models can be trained to be agnostic to the ordering of input dimensions used by the autoregressive product rule decomposition. Finally, we also show how to exploit the topological structure of pixels in images using a deep convolutional architecture for NADE.
05/07/2016 ∙ by Benigno Uria, et al. ∙ 0 ∙ shareread it

Practical Bayesian Optimization of Machine Learning Algorithms
Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a "black art" that requires expert experience, unwritten rules of thumb, or sometimes bruteforce search. Much more appealing is the idea of developing automatic approaches which can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expertlevel performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expertlevel optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
06/13/2012 ∙ by Jasper Snoek, et al. ∙ 0 ∙ shareread it
Hugo Larochelle
is this you? claim profile
Associate Director  Learning in Machines and Brains Program at Canadian Institute for Advanced Research, Adjunct Professor at Université de Sherbrooke, Adjunct Professor at Université de Montréal, Research Scientist at Google