
Improved Conditional VRNNs for Video Prediction
Predicting future frames for a video sequence is a challenging generative modeling task. Promising approaches include probabilistic latent variable models such as the Variational Autoencoder (VAE). While VAEs can handle uncertainty and model multiple possible future outcomes, they tend to produce blurry predictions. In this work, we argue that this is a sign of underfitting. To address this issue, we propose to increase the expressiveness of the latent distributions and to use higher-capacity likelihood models. Our approach relies on a hierarchy of latent variables, which defines a family of flexible prior and posterior distributions that better model the probability of future sequences. We validate our proposal through a series of ablation experiments and compare our approach to current state-of-the-art latent variable models. Our method performs favorably under several metrics on three different datasets.
04/27/2019 ∙ by Lluis Castrejon, et al.
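One generic way to write the per-frame training objective of a conditional VAE with a hierarchy of L latent levels is sketched below. This is an illustrative assumption, not necessarily the exact objective used in the paper: x_t is the frame at time t, z_t^l is the latent at level l, and both the prior and the approximate posterior are assumed to factorize top-down in the same order, so the KL term splits into one term per level by the chain rule for KL divergence.

```latex
\log p(x_t \mid x_{<t}) \ge
\mathbb{E}_{q}\left[\log p\left(x_t \mid z_t^{1:L}, x_{<t}\right)\right]
- \sum_{l=1}^{L} \mathbb{E}_{q}\left[
D_{\mathrm{KL}}\left(
q\left(z_t^{l} \mid z_t^{<l}, x_{\le t}\right)
\,\middle\|\,
p\left(z_t^{l} \mid z_t^{<l}, x_{<t}\right)
\right)\right]
```

The outer expectation in each KL term is over q(z_t^{<l} | x_{≤t}). Making each conditional prior and posterior more expressive, and the likelihood p(x_t | ·) higher capacity, are the two levers the abstract proposes against underfitting.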

Learning Distributed Representations from Reviews for Collaborative Filtering
Recent work has shown that collaborative filtering-based recommender systems can be improved by incorporating side information, such as natural language reviews, as a way of regularizing the derived product representations. Motivated by the success of this approach, we introduce two different models of reviews and study their effect on collaborative filtering performance. While the previous state-of-the-art approach is based on a latent Dirichlet allocation (LDA) model of reviews, the models we explore are neural-network-based: a bag-of-words product-of-experts model and a recurrent neural network. We demonstrate that the increased flexibility offered by the product-of-experts model allows it to achieve state-of-the-art performance on the Amazon review dataset, outperforming the LDA-based approach. Interestingly, however, the greater modeling power offered by the recurrent neural network appears to undermine the model's ability to act as a regularizer of the product representations.
06/18/2018 ∙ by Amjad Almahairi, et al.

Blindfold Baselines for Embodied QA
We explore blindfold (question-only) baselines for Embodied Question Answering. The EmbodiedQA task requires an agent to answer a question by intelligently navigating in a simulated environment, gathering the necessary visual information only through first-person vision before finally answering. Consequently, a blindfold baseline that ignores the environment and visual information is a degenerate solution, yet our experiments on the EQA-v1 dataset show that a simple question-only baseline achieves state-of-the-art results on the EmbodiedQA task in all cases except when the agent is spawned extremely close to the object.
11/12/2018 ∙ by Ankesh Anand, et al.

Systematic Generalization: What Is Required and Can It Be Learned?
Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task with little adaptation and (ii) intuitively appealing modular models that require background knowledge to be instantiated. We compare both types of models in how much they lend themselves to a particular form of systematic generalization. Using a synthetic VQA test, we evaluate which models are capable of reasoning about all possible object pairs after training on only a small subset of them. Our findings show that the generalization of modular models is much more systematic and that it is highly sensitive to the module layout, i.e., to how exactly the modules are connected. We furthermore investigate whether modular models that generalize well could be made more end-to-end by learning their layout and parametrization. We find that end-to-end methods from prior work often learn a wrong layout and a spurious parametrization that do not facilitate systematic generalization. Our results suggest that, in addition to modularity, systematic generalization in language understanding may require explicit regularizers or priors.
11/30/2018 ∙ by Dzmitry Bahdanau, et al.

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance
In this work, we propose a novel constituency parsing scheme. The model predicts a vector of real-valued scalars, named syntactic distances, one for each split position in the input sentence. The syntactic distances specify the order in which the split points are selected, recursively partitioning the input in a top-down fashion. Compared to traditional shift-reduce parsing schemes, our approach is free from the potential problem of compounding errors, while being faster and easier to parallelize. Our model achieves competitive performance among single-model discriminative parsers on the PTB dataset and outperforms previous models on the CTB dataset.
06/11/2018 ∙ by Yikang Shen, et al.
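The top-down decoding step can be illustrated with a short Python sketch. It assumes that distances[i] scores the gap between words[i] and words[i+1] and that a larger distance means the split is selected earlier; the function name distance_to_tree is hypothetical, and the sketch builds only an unlabeled binary tree (constituent labels and the network that predicts the distances are omitted).

```python
from typing import List, Union

# A tree is either a single word or a [left, right] pair of subtrees.
Tree = Union[str, list]


def distance_to_tree(words: List[str], distances: List[float]) -> Tree:
    """Recursively split the span at its largest syntactic distance.

    distances[i] scores the split point between words[i] and words[i + 1],
    so len(distances) == len(words) - 1.
    """
    assert len(distances) == len(words) - 1
    if len(words) == 1:
        return words[0]
    # Pick the split point with the largest distance (first one on ties).
    i = max(range(len(distances)), key=lambda k: distances[k])
    # Distances strictly left / right of the chosen split go to each side.
    left = distance_to_tree(words[: i + 1], distances[:i])
    right = distance_to_tree(words[i + 1 :], distances[i + 1 :])
    return [left, right]
```

For example, distance_to_tree(["the", "cat", "sat"], [1.0, 2.0]) splits first between "cat" and "sat" and returns [["the", "cat"], "sat"]. Because every split is an argmax over scores predicted in one pass, there is no chain of dependent shift/reduce actions through which an early mistake could propagate.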

Approximate Exploration through State Abstraction
Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impractical. In this paper, we study the interplay between exploration and approximation, which we call approximate exploration. We first provide results when the approximation is explicit, quantifying the performance of an exploration algorithm, MBIE-EB (strehl2008analysis), when combined with state aggregation. In particular, we show that this allows the agent to trade off between learning speed and the quality of the learned policy. We then turn to an exploration scheme that is successful in practice, pseudo-count-based exploration bonuses (bellemare2016unifying). We show that choosing a density model implicitly defines an abstraction and that the pseudo-count bonus incentivizes the agent to explore using this abstraction. We find, however, that implicit exploration may result in a mismatch between the approximated value function and the exploration bonus, leading to either under- or over-exploration.
08/29/2018 ∙ by Adrien Ali Taïga, et al.
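The pseudo-count of bellemare2016unifying turns a density model's probability of a state before (rho) and after (rho_prime) one more observation of it into an effective visit count. The Python sketch below pairs that formula with a hypothetical density model that is simply the empirical frequency of an abstraction phi(s); under this assumption the pseudo-count reduces to the number of visits to the abstract cell of s, which illustrates how a density model implicitly defines an abstraction. The bonus form beta / sqrt(n_hat + 0.01), the default beta, and all function names are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Callable, Hashable, Sequence


def pseudo_count(rho: float, rho_prime: float) -> float:
    """Pseudo-count from the probability of x before (rho) and after
    (rho_prime) the density model is updated on one more observation of x."""
    if rho_prime <= rho:
        # Assumed convention: if the model does not become more confident
        # about x, treat x as fully known (infinite count, zero bonus).
        return math.inf
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def exploration_bonus(n_hat: float, beta: float = 0.05) -> float:
    """Count-based bonus beta / sqrt(n_hat + 0.01); beta is an assumed scale."""
    return beta / math.sqrt(n_hat + 0.01)


def aggregated_density(history: Sequence[Hashable],
                       phi: Callable[[Hashable], Hashable],
                       x: Hashable) -> float:
    """Hypothetical density model: empirical frequency of the cell phi(x)."""
    counts = Counter(phi(s) for s in history)
    return counts[phi(x)] / len(history)
```

With history = [0, 1, 2, 10, 11] and phi = lambda s: s // 10, state 3 was never visited, yet pseudo_count(aggregated_density(history, phi, 3), aggregated_density(history + [3], phi, 3)) is 3 (up to rounding), because the three visited states 0, 1 and 2 share its cell. A coarser phi makes unvisited states look familiar sooner and so shrinks their bonus earlier.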

Maximum Entropy Generators for Energy-Based Models
Unsupervised learning is about capturing dependencies between variables and is driven by the contrast between the probable vs. improbable configurations of these variables, often either via a generative model that only samples probable ones or with an energy function (unnormalized log-density) that is low for probable ones and high for improbable ones. Here, we consider learning both an energy function and an efficient approximate sampling mechanism. Whereas the discriminator in generative adversarial networks (GANs) learns to separate data and generator samples, introducing an entropy maximization regularizer on the generator can turn the interpretation of the critic into an energy function, which separates the training distribution from everything else and thus can be used for tasks like anomaly or novelty detection. Then, we show how Markov chain Monte Carlo can be done in the generator latent space, whose samples can be mapped to data space, producing better samples. These samples are used for the negative-phase gradient required to estimate the log-likelihood gradient of the data-space energy function. To maximize entropy at the output of the generator, we take advantage of recently introduced neural estimators of mutual information. We find that, in addition to producing a useful scoring function for anomaly detection, the resulting approach produces sharp samples while covering the modes well, leading to high Inception and Fréchet scores.
01/24/2019 ∙ by Rithesh Kumar, et al.

Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide, to a greater or lesser extent, the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.
06/24/2012 ∙ by Yoshua Bengio, et al.

Generative Adversarial Networks
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
06/10/2014 ∙ by Ian J. Goodfellow, et al.
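The claim that D equals 1/2 everywhere at the solution follows from the value function of the minimax game. For a fixed G and each x, maximizing a log y + b log(1 − y) over y in (0, 1), with a = p_data(x) and b = p_g(x), gives y = a / (a + b); this is the optimal discriminator below, and it equals 1/2 once p_g = p_data. Here p_z denotes the noise prior fed to G and p_g the distribution of G's samples.

```latex
\min_G \max_D V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)},
\qquad
p_g = p_{\text{data}} \;\Rightarrow\; D^{*}_{G}(x) = \tfrac{1}{2}
```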

Bayesian Hypernetworks
We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, h, is a neural network that learns to transform a simple noise distribution, p(ϵ) = N(0, I), into a distribution q(θ) = q(h(ϵ)) over the parameters θ of another neural network (the "primary network"). We train q with variational inference, using an invertible h to enable efficient estimation of the variational lower bound on the posterior p(θ | D) via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap i.i.d. sampling of q(θ). We demonstrate these qualitative advantages of Bayesian hypernets, which also achieve competitive performance on a suite of tasks that demonstrate the advantage of estimating model uncertainty, including active learning and anomaly detection.
10/13/2017 ∙ by David Krueger, et al.
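The reason h must be invertible is the change-of-variables formula, log q(θ) = log p(ϵ) − log |det ∂h/∂ϵ|, which makes the log q(θ) term of the variational lower bound computable from samples. The Python sketch below uses the simplest invertible h, an elementwise affine map θ = μ + exp(s) ⊙ ϵ, chosen only for illustration; it yields a diagonal Gaussian, so it cannot express the multimodal, correlated posteriors the abstract highlights, which require a richer invertible network. The names sample_params and elbo_estimate, and the log_joint callable standing for log p(D | θ) + log p(θ), are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def std_normal_logpdf(v: List[float]) -> float:
    """log N(v; 0, I) for a list of floats."""
    return sum(-0.5 * (x * x + math.log(2.0 * math.pi)) for x in v)


def sample_params(mu: List[float], log_scale: List[float],
                  rng: random.Random) -> Tuple[List[float], float]:
    """Draw theta = h(eps) with an elementwise affine h; return (theta, log q(theta))."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    theta = [m + math.exp(s) * e for m, s, e in zip(mu, log_scale, eps)]
    # Change of variables: the Jacobian of h is diagonal with entries exp(s_i),
    # so log|det dh/deps| = sum(s_i).
    log_q = std_normal_logpdf(eps) - sum(log_scale)
    return theta, log_q


def elbo_estimate(log_joint: Callable[[List[float]], float],
                  mu: List[float], log_scale: List[float],
                  n_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of E_q[log p(D, theta) - log q(theta)]."""
    total = 0.0
    for _ in range(n_samples):
        theta, log_q = sample_params(mu, log_scale, rng)
        total += log_joint(theta) - log_q
    return total / n_samples
```

If log_joint is exactly the log-density of the same Gaussian as q, every sample contributes 0 and the estimate is 0, the largest value the bound can take for a normalized target. In actual training, μ and s (or the weights of a richer h) would be updated by gradient ascent on this estimate, which needs an autodiff framework rather than plain Python.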

Learnable Explicit Density for Continuous Latent Space and Variational Inference
In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF) and show that, with further improvement, inverse AF could be used as a universal approximator of any complicated posterior. Our analysis results in a unified approach to parameterizing a VAE, without the need to restrict ourselves to factorial Gaussians in the latent real space.
10/06/2017 ∙ by Chin-Wei Huang, et al.
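For reference, one step of a standard inverse autoregressive flow, in its commonly used gated form, maps z to σ ⊙ z + (1 − σ) ⊙ m, where m and σ depend only on earlier coordinates of z. The Jacobian is therefore triangular with diagonal σ, so log q(z_new) = log q(z) − Σ log σ_i. The Python sketch below replaces the autoregressive neural network with a hypothetical strictly lower-triangular linear map (w_m, w_s, b_s are illustrative parameters); it shows the baseline inverse AF family the abstract analyzes, not the paper's improved version.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def iaf_step(z: Sequence[float],
             w_m: Sequence[Sequence[float]],
             w_s: Sequence[Sequence[float]],
             b_s: Sequence[float]) -> Tuple[List[float], float]:
    """One gated IAF step; returns (new_z, log|det Jacobian|).

    Only entries below the diagonal of w_m and w_s are read, so m_i and s_i
    depend on z[0] .. z[i-1] alone (the autoregressive property).
    """
    new_z: List[float] = []
    log_det = 0.0
    for i in range(len(z)):
        m_i = sum(w_m[i][j] * z[j] for j in range(i))
        s_i = b_s[i] + sum(w_s[i][j] * z[j] for j in range(i))
        sigma_i = sigmoid(s_i)
        new_z.append(sigma_i * z[i] + (1.0 - sigma_i) * m_i)
        # The diagonal Jacobian entry d new_z[i] / d z[i] is sigma_i.
        log_det += math.log(sigma_i)
    return new_z, log_det
```

Stacking several steps and summing their log_det values gives log q(z_T) = log q(z_0) − Σ log_det. Since m and σ are computed from the input z, which is already known, all coordinates of one step can be produced in a single parallel pass in a vectorized implementation, which keeps sampling cheap.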