
Meta-Amortized Variational Inference and Learning
How can we learn to do probabilistic inference in a way that generalizes between models? Amortized variational inference learns an inference network for a single model, sharing statistical strength across observations. This benefits scalability and model learning, but does not help with generalization to new models. We propose meta-amortized variational inference, a framework that amortizes the cost of inference over a family of generative models. We apply this approach to deep generative models by introducing the MetaVAE: a variational autoencoder that learns to generalize to new distributions and rapidly solve new unsupervised learning problems using only a small number of target examples. Empirically, we validate the approach by showing that the MetaVAE can: (1) capture relevant sufficient statistics for inference, (2) learn useful representations of data for downstream tasks such as clustering, and (3) perform meta-density estimation on unseen synthetic distributions and out-of-sample Omniglot alphabets.
02/05/2019 ∙ by Kristy Choi, et al.

Training Variational Autoencoders with Buffered Stochastic Variational Inference
The recognition network in deep latent variable models such as variational autoencoders (VAEs) relies on amortized inference for efficient posterior approximation that can scale up to large datasets. However, this technique has also been demonstrated to select suboptimal variational parameters, often resulting in considerable additional error called the amortization gap. To close the amortization gap and improve the training of the generative model, recent works have introduced an additional refinement step that applies stochastic variational inference (SVI) to improve upon the variational parameters returned by the amortized inference model. In this paper, we propose Buffered Stochastic Variational Inference (BSVI), a new refinement procedure that makes use of SVI's sequence of intermediate variational proposal distributions and their corresponding importance weights to construct a new generalized importance-weighted lower bound. We demonstrate empirically that training variational autoencoders with BSVI consistently outperforms SVI, yielding an improved training procedure for VAEs.
02/27/2019 ∙ by Rui Shu, et al.
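The key mechanism can be illustrated on a toy model. The sketch below is not the paper's VAE setup: it uses a 1-D conjugate Gaussian model (p(z) = N(0,1), p(x|z) = N(z,1)) where the ELBO gradient has closed form, refines a proposal mean with SVI-style steps, buffers every intermediate proposal, and averages importance weights across the whole sequence to form a multi-proposal lower bound on log p(x).

```python
import numpy as np

def log_joint(x, z):
    # toy model: p(z) = N(0, 1), p(x|z) = N(z, 1)
    return -0.5*(z**2 + (x - z)**2) - np.log(2*np.pi)

def log_q(z, mu, sigma):
    return -0.5*((z - mu)/sigma)**2 - np.log(sigma*np.sqrt(2*np.pi))

def bsvi_style_bound(x, mu0=0.0, sigma=1.0, steps=5, lr=0.3, n_mc=5000, seed=0):
    """Buffer the proposals produced by SVI refinement and average
    importance weights across the sequence (an IWAE-style bound)."""
    rng = np.random.default_rng(seed)
    mus, mu = [mu0], mu0
    for _ in range(steps):
        mu = mu + lr*(x - 2*mu)        # closed-form ELBO gradient in this conjugate model
        mus.append(mu)
    mus = np.array(mus)                                 # (steps+1,) buffered proposals
    z = rng.normal(mus, sigma, size=(n_mc, len(mus)))   # one draw per proposal
    log_w = log_joint(x, z) - log_q(z, mus, sigma)      # importance weights
    m = log_w.max(axis=1, keepdims=True)                # stable log-mean-exp over proposals
    return float((m.squeeze() + np.log(np.exp(log_w - m).mean(axis=1))).mean())

bound = bsvi_style_bound(1.0)
exact = -0.25 - 0.5*np.log(4*np.pi)    # log p(x=1): the marginal is N(0, 2)
```

By Jensen's inequality the averaged-weight estimator stays a valid lower bound on log p(x), while the later, refined proposals tighten it relative to the initial amortized proposal alone.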

Mapping Missing Population in Rural India: A Deep Learning Approach with Satellite Imagery
Millions of people worldwide are absent from their country's census. Accurate, current, and granular population metrics are critical to improving government allocation of resources, to measuring disease control, to responding to natural disasters, and to studying any aspect of human life in these communities. Satellite imagery can provide sufficient information to build a population map without the cost and time of a government census. We present two Convolutional Neural Network (CNN) architectures which efficiently and effectively combine satellite imagery inputs from multiple sources to accurately predict the population density of a region. In this paper, we use satellite imagery from rural villages in India and population labels from the 2011 SECC census. Our best model achieves better performance than prior work as well as LandScan, a community standard for global population distribution.
05/04/2019 ∙ by Wenjie Hu, et al.

Learning Neural PDE Solvers with Convergence Guarantees
Partial differential equations (PDEs) are widely used across the physical and computational sciences. Decades of research and engineering went into designing fast iterative solution methods. Existing solvers are general purpose, but may be suboptimal for specific classes of problems. In contrast to existing hand-crafted solutions, we propose an approach to learn a fast iterative solver tailored to a specific domain. We achieve this goal by learning to modify the updates of an existing solver using a deep neural network. Crucially, our approach is proven to preserve strong correctness and convergence guarantees. After training on a single geometry, our model generalizes to a wide variety of geometries and boundary conditions, and achieves a 2-3 times speedup compared to state-of-the-art solvers.
06/04/2019 ∙ by JunTing Hsieh, et al.
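The correctness-preserving idea can be sketched on a 1-D Poisson problem. Below, a small fixed convolution kernel stands in for the trained network (the paper learns these weights); because the correction acts on the update direction Psi(u) - u, which vanishes at the solution, any fixed point of the base solver is preserved, and the corrected iteration converges faster than the base damped-Jacobi sweep.

```python
import numpy as np

def damped_jacobi(u, f, h, omega=2/3):
    """One damped-Jacobi sweep for the 1-D Poisson problem u'' = f, u(0)=u(1)=0."""
    v = u.copy()
    v[1:-1] = 0.5*(u[:-2] + u[2:] - h*h*f[1:-1])
    return u + omega*(v - u)

def learned_step(u, f, h, kernel, omega=2/3):
    """Base solver plus a linear correction of the update direction.
    The correction is applied to Psi(u) - u, which is zero at the true
    solution, so the base solver's fixed point (correctness) is preserved."""
    v = damped_jacobi(u, f, h, omega)
    corr = np.convolve(v - u, kernel, mode="same")
    corr[0] = corr[-1] = 0.0              # keep the boundary values fixed
    return v + corr

n = 33
x = np.linspace(0.0, 1.0, n); h = x[1] - x[0]
f = np.sin(np.pi*x)                       # u'' = f has solution u = -sin(pi*x)/pi^2
u_true = -np.sin(np.pi*x)/np.pi**2
kernel = np.array([0.2, 0.4, 0.2])        # stand-in for the trained network's weights

u_plain = np.zeros(n); u_learned = np.zeros(n)
for _ in range(1000):
    u_plain = damped_jacobi(u_plain, f, h)
    u_learned = learned_step(u_learned, f, h, kernel)

err_plain = np.abs(u_plain - u_true).max()
err_learned = np.abs(u_learned - u_true).max()
```

After the same number of sweeps the corrected iteration is markedly more accurate than plain damped Jacobi, illustrating the speedup without sacrificing the convergence guarantee.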

Amortized Inference Regularization
The variational autoencoder (VAE) is a popular model for density estimation and representation learning. Canonically, the variational principle suggests preferring an expressive inference model so that the variational approximation is accurate. However, it is often overlooked that an overly expressive inference model can be detrimental to the test set performance of both the amortized posterior approximator and, more importantly, the generative density estimator. In this paper, we leverage the fact that VAEs rely on amortized inference and propose techniques for amortized inference regularization (AIR) that control the smoothness of the inference model. We demonstrate that, by applying AIR, it is possible to improve VAE generalization on both inference and generative performance. Our paper challenges the belief that amortized inference is simply a mechanism for approximating maximum likelihood training and illustrates that regularization of the amortization family provides a new direction for understanding and improving generalization in VAEs.
05/23/2018 ∙ by Rui Shu, et al.
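A minimal caricature of "controlling the smoothness of the inference model": for a linear encoder z = W x, smoothness is the largest singular value of W (its Lipschitz constant), and a weight penalty shrinks it. This is only one simple smoothness control; the paper's AIR techniques (e.g. denoising-style regularizers) act on real VAE encoders, not this toy regression.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                     # observations
Z = X @ rng.normal(size=(5, 2)) + 0.1*rng.normal(size=(200, 2))   # target latent codes

def fit_encoder(X, Z, lam):
    # ridge regression: W = Z^T X (X^T X + lam I)^{-1}
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam*np.eye(d), X.T @ Z).T

def lipschitz(W):
    # for a linear map, the Lipschitz constant is the top singular value
    return np.linalg.svd(W, compute_uv=False)[0]

W_plain = fit_encoder(X, Z, lam=0.0)    # unregularized encoder
W_reg = fit_encoder(X, Z, lam=50.0)     # regularized, hence smoother, encoder
```

The regularized encoder has a strictly smaller Lipschitz constant, i.e. nearby inputs are mapped to nearby approximate posteriors, which is the property AIR exploits.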

Learning Controllable Fair Representations
Learning data representations that are transferable and fair with respect to certain protected attributes is crucial to reducing unfair decisions made downstream, while preserving the utility of the data. We propose an information-theoretically motivated objective for learning maximally expressive representations subject to fairness constraints. We demonstrate that a range of existing approaches optimize approximations to the Lagrangian dual of our objective. In contrast to these existing approaches, our objective provides the user control over the fairness of representations by specifying limits on unfairness. We introduce a dual optimization method that optimizes the model as well as the expressiveness-fairness trade-off. Empirical evidence suggests that our proposed method can account for multiple notions of fairness and achieves higher expressiveness at a lower computational cost.
12/11/2018 ∙ by Jiaming Song, et al.

Uncertainty Autoencoders: Learning Compressed Representations via Variational Information Maximization
The goal of statistical compressive sensing is to efficiently acquire and reconstruct high-dimensional signals with far fewer measurements than the data dimensionality, given access to a finite set of training signals. Current approaches do not learn the acquisition and recovery procedures end-to-end and are typically hand-crafted for sparsity-based priors. We propose Uncertainty Autoencoders, a framework that jointly learns the acquisition (i.e., encoding) and recovery (i.e., decoding) procedures while implicitly modeling domain structure. Our learning objective optimizes for a variational lower bound to the mutual information between the signal and the measurements. We show how our framework provides a unified treatment to several lines of research in dimensionality reduction, compressive sensing, and generative modeling. Empirically, we demonstrate improvements of 32% on average over competing approaches for statistical compressive sensing of high-dimensional datasets.
12/26/2018 ∙ by Aditya Grover, et al.
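Why learning the acquisition matters can be seen in the linear, noise-free caricature of this setup, where maximizing the information retained by linear measurements reduces to finding the principal directions of the data (the full UAE learns nonlinear decoders and models measurement noise; PCA is only its linear special case). The sketch below compares a PCA-style learned acquisition matrix against random measurements, each paired with its best least-squares decoder.

```python
import numpy as np

rng = np.random.default_rng(1)
# anisotropic signals in R^10: most variance lives in two directions
stds = np.array([3.0, 2.0] + [0.4]*8)
X = rng.normal(size=(1000, 10)) * stds

def recon_error(X, W):
    """Mean-squared error of the best linear decoder for measurements y = W^T x."""
    Y = X @ W                                   # m linear measurements per signal
    D, *_ = np.linalg.lstsq(Y, X, rcond=None)   # least-squares recovery map
    return np.mean((Y @ D - X)**2)

# "learned" acquisition: top-2 principal directions of the training signals
W_learned = np.linalg.svd(X, full_matrices=False)[2][:2].T   # (10, 2)
# untrained baseline: random measurement directions
W_random = rng.normal(size=(10, 2))

err_learned = recon_error(X, W_learned)
err_random = recon_error(X, W_random)
```

With only 2 of 10 dimensions measured, the learned acquisition leaves roughly only the tail variance as error, while random measurements mix in the low-variance directions and recover the signal less accurately.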

Generative Adversarial Examples
Adversarial examples are typically constructed by perturbing an existing data point, and current defense methods are focused on guarding against this type of attack. In this paper, we propose a new class of adversarial examples that are synthesized entirely from scratch using a conditional generative model. We first train an Auxiliary Classifier Generative Adversarial Network (AC-GAN) to model the class-conditional distribution over inputs. Then, conditioned on a desired class, we search over the AC-GAN latent space to find images that are likely under the generative model and are misclassified by a target classifier. We demonstrate through human evaluation that these new adversarial inputs, which we call generative adversarial examples, are legitimate and belong to the desired class. Our empirical results on the MNIST, SVHN, and CelebA datasets show that generative adversarial examples can easily bypass strong adversarial training and certified defense methods that foil existing adversarial attacks.
05/21/2018 ∙ by Yang Song, et al.

Stochastic Optimization of Sorting Networks via Continuous Relaxations
Sorting input objects is an important step in many machine learning pipelines. However, the sorting operator is non-differentiable with respect to its inputs, which prohibits end-to-end gradient-based optimization. In this work, we propose NeuralSort, a general-purpose continuous relaxation of the output of the sorting operator from permutation matrices to the set of unimodal row-stochastic matrices, where every row sums to one and has a distinct argmax. This relaxation permits straight-through optimization of any computational graph involving a sorting operation. Further, we use this relaxation to enable gradient-based stochastic optimization over the combinatorially large space of permutations by deriving a reparameterized gradient estimator for the Plackett-Luce family of distributions over permutations. We demonstrate the usefulness of our framework on three tasks that require learning semantic orderings of high-dimensional objects, including a fully differentiable, parameterized extension of the k-nearest neighbors algorithm.
03/21/2019 ∙ by Aditya Grover, et al.
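The relaxation itself is a one-line formula: given scores s, row i of the relaxed sorting matrix is a softmax of (n + 1 - 2i) s minus the row sums of the pairwise absolute-difference matrix, scaled by a temperature. A numpy sketch of that formula (following the paper's construction; as the temperature shrinks, the rows approach the permutation matrix that sorts s in descending order):

```python
import numpy as np

def neural_sort(s, tau=1.0):
    """NeuralSort relaxation: returns a unimodal row-stochastic matrix whose
    rows approach the descending-sort permutation matrix as tau -> 0."""
    s = np.asarray(s, dtype=float).reshape(-1, 1)               # scores, shape (n, 1)
    n = s.shape[0]
    A = np.abs(s - s.T)                                         # pairwise |s_i - s_j|
    B = A.sum(axis=1)                                           # row sums of A, shape (n,)
    scaling = (n + 1 - 2*np.arange(1, n + 1)).reshape(-1, 1)    # n + 1 - 2i for row i
    logits = (scaling * s.T - B[None, :]) / tau
    logits -= logits.max(axis=1, keepdims=True)                 # stable softmax per row
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

s = [0.1, 2.0, -0.5, 1.0]
P = neural_sort(s, tau=1.0)
```

Each row's argmax already picks out the descending order (here indices 1, 3, 0, 2), which is what makes straight-through optimization possible, and at small tau the product P @ s is numerically the sorted vector.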

Learning to Interpret Satellite Images in Global Scale Using Wikipedia
Despite recent progress in computer vision, fine-grained interpretation of satellite images remains challenging because of a lack of labeled training data. To overcome this limitation, we construct a novel dataset called WikiSatNet by pairing georeferenced Wikipedia articles with satellite imagery of their corresponding locations. We then propose two strategies to learn representations of satellite images by predicting properties of the corresponding articles from the images. Leveraging this new multimodal dataset, we can drastically reduce the quantity of human-annotated labels and time required for downstream tasks. On the recently released fMoW dataset, our pretraining strategies can boost the performance of a model pretrained on ImageNet by up to 4.5%.
05/07/2019 ∙ by Burak Uzkent, et al.

Differentiable Antithetic Sampling for Variance Reduction in Stochastic Variational Inference
Stochastic optimization techniques are standard in variational inference algorithms. These methods estimate gradients by approximating expectations with independent Monte Carlo samples. In this paper, we explore a technique that uses correlated, but more representative, samples to reduce estimator variance. Specifically, we show how to generate antithetic samples that match sample moments with the true moments of an underlying importance distribution. Combining a differentiable antithetic sampler with modern stochastic variational inference, we showcase the effectiveness of this approach for learning a deep generative model.
10/05/2018 ∙ by Mike Wu, et al.
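The variance-reduction intuition is easy to see with classic first-moment antithetic pairs under the Gaussian reparameterization trick (the paper's sampler is more general: it matches higher sample moments and is itself differentiable). Each pair shares the noise eps, using both mu + sigma*eps and its mirror mu - sigma*eps:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 0.5, 1.0, 20000
f = np.exp                                   # a monotone test function

eps = rng.normal(size=n)                     # shared reparameterization noise
iid = f(mu + sigma*eps)                      # ordinary Monte Carlo terms
anti = 0.5*(f(mu + sigma*eps) + f(mu - sigma*eps))   # antithetic pairs

# both estimate E[f(z)] for z ~ N(mu, sigma^2), which is exp(mu + sigma^2/2)
# here, but the antithetic terms fluctuate far less around that value
```

For monotone f, the two halves of each pair are negatively correlated, so the per-term variance drops substantially while the estimator stays unbiased.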
Stefano Ermon
Assistant Professor, Department of Computer Science; Fellow, Woods Institute for the Environment, Stanford University