
Towards Principled Unsupervised Learning
General unsupervised learning is a long-standing conceptual problem in machine learning. Supervised learning is successful because it can be solved by the minimization of the training error cost function. Unsupervised learning is not as successful, because the unsupervised objective may be unrelated to the supervised task of interest. For example, density modelling and reconstruction have often been used for unsupervised learning, but they did not produce the sought-after performance gains, because they have no knowledge of the supervised tasks. In this paper, we present an unsupervised cost function, which we name the Output Distribution Matching (ODM) cost, that measures a divergence between the distribution of predictions and the distribution of labels. The ODM cost is appealing because it is consistent with the supervised cost in the following sense: a perfect supervised classifier is also perfect according to the ODM cost. Therefore, by aggressively optimizing the ODM cost, we are almost guaranteed to improve our supervised performance whenever the space of possible predictions is exponentially large. We demonstrate that the ODM cost works well on a number of small and semi-artificial datasets using no (or almost no) labelled training cases. Finally, we show that the ODM cost can be used for one-shot domain adaptation, which allows the model to classify inputs that differ from the input distribution in significant ways without the need for prior exposure to the new domain.
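The core idea of matching the distribution of predictions to the distribution of labels can be sketched in a few lines. The function below is an illustrative assumption, not the paper's exact formulation: it uses a KL divergence between the batch-averaged prediction distribution and a known label marginal, and the names `odm_cost`, `pred_probs`, and `label_marginal` are invented for this sketch.

```python
import numpy as np

def odm_cost(pred_probs, label_marginal):
    """ODM-style cost sketch: KL divergence between the model's implied
    output distribution (averaged over a batch) and the known marginal
    distribution of labels. Illustrative, not the paper's definition."""
    # Average per-example class probabilities to get the model's
    # implied output distribution over classes.
    pred_marginal = pred_probs.mean(axis=0)
    # KL(label_marginal || pred_marginal)
    return float(np.sum(label_marginal * np.log(label_marginal / pred_marginal)))

# A classifier whose prediction marginal matches the label marginal incurs
# zero cost, mirroring the consistency property: a perfect supervised
# classifier is also perfect according to the ODM cost.
preds = np.array([[0.9, 0.1], [0.1, 0.9]])
labels = np.array([0.5, 0.5])
print(odm_cost(preds, labels))  # 0.0
```

Note that the cost never inspects per-example labels, which is what makes it usable with no (or almost no) labelled training cases.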
11/19/2015 ∙ by Ilya Sutskever, et al.

Interaction Networks for Learning about Objects, Relations and Physics
Reasoning about objects, relations, and physics is central to human intelligence, and a key goal of artificial intelligence. Here we introduce the interaction network, a model which can reason about how objects in complex systems interact, supporting dynamical predictions, as well as inferences about the abstract properties of the system. Our model takes graphs as input, performs object- and relation-centric reasoning in a way that is analogous to a simulation, and is implemented using deep neural networks. We evaluate its ability to reason about several challenging physical domains: n-body problems, rigid-body collision, and non-rigid dynamics. Our results show it can be trained to accurately simulate the physical trajectories of dozens of objects over thousands of time steps, estimate abstract quantities such as energy, and generalize automatically to systems with different numbers and configurations of objects and relations. Our interaction network implementation is the first general-purpose, learnable physics engine, and a powerful general framework for reasoning about objects and relations in a wide variety of complex real-world domains.
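The object- and relation-centric computation can be sketched as two phases of message passing over a graph. The toy linear functions below stand in for the paper's deep networks, and the names `interaction_step`, `f_rel`, and `f_obj` are assumptions made for this sketch:

```python
import numpy as np

def interaction_step(states, edges, f_rel, f_obj):
    """One simulation step of an interaction-network-style model.
    states: (n_objects, d) array; edges: list of (sender, receiver) pairs."""
    effects = np.zeros_like(states)
    # Relation-centric phase: compute an effect for every interaction
    # and accumulate effects at the receiving object.
    for s, r in edges:
        effects[r] += f_rel(states[s], states[r])
    # Object-centric phase: update each object from its own state
    # plus its aggregated incoming effects.
    return np.stack([f_obj(states[i], effects[i]) for i in range(len(states))])

# Toy dynamics: each interaction pulls the receiver toward the sender.
f_rel = lambda xs, xr: 0.1 * (xs - xr)
f_obj = lambda x, e: x + e
states = np.array([[0.0, 0.0], [1.0, 1.0]])
next_states = interaction_step(states, [(0, 1), (1, 0)], f_rel, f_obj)
print(next_states)  # [[0.1, 0.1], [0.9, 0.9]]
```

Because the same relation function is applied to every edge, the step generalizes unchanged to graphs with different numbers of objects and relations, which is the source of the generalization behaviour described above.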
12/01/2016 ∙ by Peter W. Battaglia, et al.

Normalizing Flows on Riemannian Manifolds
We consider the problem of density estimation on Riemannian manifolds. Density estimation on manifolds has many applications in fluid mechanics, optics and plasma physics, and it appears often when dealing with angular variables (such as those used in protein folding, robot limbs, gene expression) and in general directional statistics. In spite of the multitude of algorithms available for density estimation in Euclidean spaces R^n that scale to large n (e.g. normalizing flows, kernel methods and variational approximations), most of these methods are not immediately suitable for density estimation on more general Riemannian manifolds. We revisit techniques related to homeomorphisms from differential geometry for projecting densities to submanifolds and use them to generalize the idea of normalizing flows to more general Riemannian manifolds. The resulting algorithm is scalable, simple to implement and suitable for use with automatic differentiation. We demonstrate concrete examples of this method on the n-sphere S^n.
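The mechanism normalizing flows rely on, change of variables with a Jacobian correction, carries over to the simplest manifold case, the circle S^1. The sketch below is illustrative and is not the paper's construction; the map `flow` and its names are assumptions chosen so the bijection and its derivative are easy to verify:

```python
import numpy as np

def flow(theta):
    # A smooth bijection of the circle onto itself (angles mod 2*pi).
    return (theta + 0.3 * np.sin(theta)) % (2 * np.pi)

def flow_jac(theta):
    # Derivative of the map; strictly positive, so the map is invertible.
    return 1.0 + 0.3 * np.cos(theta)

def log_density(theta_old):
    # Push a uniform base density on S^1 through the flow:
    # p_new(f(theta)) = p_base(theta) / |f'(theta)|.
    base_log_p = -np.log(2 * np.pi)
    return base_log_p - np.log(flow_jac(theta_old))

# Numerical check: the transformed density still integrates to 1 on the
# circle (integrating over theta_new via d(theta_new) = f'(theta) d(theta)).
thetas = np.linspace(0, 2 * np.pi, 10000, endpoint=False)
p_new = np.exp(log_density(thetas))
total = np.sum(p_new * flow_jac(thetas)) * (2 * np.pi / 10000)
print(total)  # ≈ 1.0
```

Every operation here is differentiable, which is what makes the general construction suitable for use with automatic differentiation.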
11/07/2016 ∙ by Mevlana C. Gemici, et al.

Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions
Deep autoregressive models have shown state-of-the-art performance in density estimation for natural images on large-scale datasets such as ImageNet. However, such models require many thousands of gradient-based weight updates and unique image examples for training. Ideally, the models would rapidly learn visual concepts from only a handful of examples, similar to the manner in which humans learn across many vision tasks. In this paper, we show how 1) neural attention and 2) meta-learning techniques can be used in combination with autoregressive models to enable effective few-shot density estimation. Our proposed modifications to PixelCNN result in state-of-the-art few-shot density estimation on the Omniglot dataset. Furthermore, we visualize the learned attention policy and find that it learns intuitive algorithms for simple tasks such as image mirroring on ImageNet and handwriting on Omniglot without supervision. Finally, we extend the model to natural images and demonstrate few-shot image generation on the Stanford Online Products dataset.
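The attention ingredient can be sketched generically: condition generation on a small support set by softmax-attending from a query to support keys. The dot-product scoring and the names below are generic assumptions, not the paper's exact PixelCNN modification:

```python
import numpy as np

def attend(query, keys, values):
    """Soft attention over a support set: return a convex combination of
    support values, weighted by how well each key matches the query."""
    scores = keys @ query                  # one score per support example
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over support examples
    return weights @ values

# Two support examples; the query matches the first one strongly,
# so the read-out is dominated by its value.
keys = np.array([[1.0, 0.0], [0.0, 1.0]])
values = np.array([[10.0], [20.0]])
out = attend(np.array([5.0, 0.0]), keys, values)
print(out)  # close to [10.]
```

Because the support set enters only through this weighted read, the same trained model can be conditioned on a new handful of examples at test time, which is the few-shot setting described above.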
10/27/2017 ∙ by Scott Reed, et al.

Learning and Querying Fast Generative Models for Reinforcement Learning
A key challenge in model-based reinforcement learning (RL) is to synthesize computationally efficient and accurate environment models. We show that carefully designed generative models that learn and operate on compact state representations, so-called state-space models, substantially reduce the computational costs for predicting outcomes of sequences of actions. Extensive experiments establish that state-space models accurately capture the dynamics of Atari games from the Arcade Learning Environment from raw pixels. The computational speed-up of state-space models, while maintaining high accuracy, makes their application in RL feasible: we demonstrate that agents which query these models for decision making outperform strong model-free baselines on the game MS PACMAN, demonstrating the potential of using learned environment models for planning.
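The computational argument can be sketched directly: rollouts happen in a compact latent state, and pixels never need to be decoded inside the planning loop. The linear transition below is a placeholder assumption for the paper's learned networks, as are the names `rollout` and `transition`:

```python
import numpy as np

def rollout(z0, actions, transition):
    """Predict a trajectory of latent states for a sequence of actions.
    No pixel decoding happens here, which is what makes planning cheap."""
    z, traj = z0, []
    for a in actions:
        z = transition(z, a)
        traj.append(z)
    return traj

# Toy latent dynamics: a small linear state-space model.
dim = 8
A = np.eye(dim) * 0.95          # latent transition matrix
B = np.ones(dim)                # action-input vector
transition = lambda z, a: A @ z + B * a

traj = rollout(np.zeros(dim), [1.0, 0.0, -1.0], transition)
print(len(traj))  # 3 latent states, one per queried action
```

An agent can score many candidate action sequences this way and decode observations (or predict rewards) only for the states it actually needs, rather than simulating full frames at every step.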
02/08/2018 ∙ by Lars Buesing, et al.

Unsupervised Predictive Memory in a Goal-Directed Agent
Animals execute goal-directed behaviours despite the limited range and scope of their sensors. To cope, they explore environments and store memories that maintain estimates of important information that is not presently available. Recently, progress has been made with artificial intelligence (AI) agents that learn to perform tasks from sensory input, even at a human level, by merging reinforcement learning (RL) algorithms with deep neural networks, and the excitement surrounding these results has led to the pursuit of related ideas as explanations of non-human animal learning. However, we demonstrate that contemporary RL algorithms struggle to solve simple tasks when enough information is concealed from the sensors of the agent, a property called "partial observability". An obvious requirement for handling partially observed tasks is access to extensive memory, but we show memory is not enough; it is critical that the right information be stored in the right format. We develop a model, the Memory, RL, and Inference Network (MERLIN), in which memory formation is guided by a process of predictive modeling. MERLIN facilitates the solution of tasks in 3D virtual reality environments for which partial observability is severe and memories must be maintained over long durations. Our model demonstrates a single learning agent architecture that can solve canonical behavioural tasks in psychology and neurobiology without strong simplifying assumptions about the dimensionality of sensory input or the duration of experiences.
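The "right information in the right format" requirement can be illustrated with a minimal external memory: write compact latent states as they are formed, and read them back by similarity when the needed information is no longer in view. The ring buffer and dot-product read below are generic choices for this sketch, not MERLIN's architecture:

```python
import numpy as np

class EpisodicMemory:
    """Toy slot memory: write latent states, read by soft attention."""

    def __init__(self, capacity, dim):
        self.slots = np.zeros((capacity, dim))
        self.ptr = 0

    def write(self, state):
        # Overwrite the oldest slot once the buffer is full.
        self.slots[self.ptr % len(self.slots)] = state
        self.ptr += 1

    def read(self, query):
        # Attention-weighted recall over all stored slots.
        scores = self.slots @ query
        w = np.exp(scores - scores.max())
        return (w / w.sum()) @ self.slots

mem = EpisodicMemory(capacity=4, dim=2)
mem.write(np.array([1.0, 0.0]))
mem.write(np.array([0.0, 1.0]))
recalled = mem.read(np.array([4.0, 0.0]))
print(recalled)  # dominated by the first stored state
```

The point of guiding memory formation with predictive modeling is that the stored states are chosen to reconstruct future observations, so reads like this recover information an RL signal alone would not have preserved.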
03/28/2018 ∙ by Greg Wayne, et al.

Towards a Definition of Disentangled Representations
How can intelligent agents solve a diverse set of tasks in a data-efficient manner? The disentangled representation learning approach posits that such an agent would benefit from separating out (disentangling) the underlying structure of the world into disjoint parts of its representation. However, there is no generally agreed-upon definition of disentangling, not least because it is unclear how to formalise the notion of world structure beyond toy datasets with a known ground-truth generative process. Here we propose that a principled solution to characterising disentangled representations can be found by focusing on the transformation properties of the world. In particular, we suggest that those transformations that change only some properties of the underlying world state, while leaving all other properties invariant, are what give exploitable structure to any kind of data. Similar ideas have already been applied successfully in physics, where the study of symmetry transformations has revolutionised the understanding of world structure. By connecting symmetry transformations to vector representations using the formalism of group and representation theory, we arrive at the first formal definition of disentangled representations. Our new definition is in agreement with many of the current intuitions about disentangling, while also providing principled resolutions to a number of previous points of contention. While this work focuses on formally defining disentangling (as opposed to solving the learning problem), we believe that the shift in perspective to studying data transformations can stimulate the development of better representation learning algorithms.
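The symmetry-based intuition can be made concrete with a toy example: a representation is disentangled if it splits into subspaces on which independent subgroups of world transformations act separately. Here two hypothetical factors, "position" and "hue", each act by rotation on their own 2-D block; the factor names and block structure are assumptions made for this sketch:

```python
import numpy as np

def rot(angle):
    """2-D rotation matrix, the action of a one-parameter subgroup."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def act(rep, pos_shift=0.0, hue_shift=0.0):
    """Apply the group action to a 4-D representation: the position
    subgroup acts only on the first block, the hue subgroup only on
    the second, leaving the other block invariant."""
    out = rep.copy()
    out[:2] = rot(pos_shift) @ rep[:2]
    out[2:] = rot(hue_shift) @ rep[2:]
    return out

rep = np.array([1.0, 0.0, 1.0, 0.0])
moved = act(rep, pos_shift=np.pi / 2)
print(moved)  # position block rotated, hue block unchanged
```

A transformation that changed position but perturbed the hue block as a side effect would violate this decomposition, which is exactly the failure mode the definition rules out.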
12/05/2018 ∙ by Irina Higgins, et al.
Danilo Rezende