
Meta-Amortized Variational Inference and Learning
How can we learn to do probabilistic inference in a way that generalizes between models? Amortized variational inference learns an inference procedure for a single model, sharing statistical strength across observations. This benefits scalability and model learning, but does not help with generalization to new models. We propose meta-amortized variational inference, a framework that amortizes the cost of inference over a family of generative models. We apply this approach to deep generative models by introducing the MetaVAE: a variational autoencoder that learns to generalize to new distributions and rapidly solve new unsupervised learning problems using only a small number of target examples. Empirically, we validate the approach by showing that the MetaVAE can: (1) capture relevant sufficient statistics for inference, (2) learn useful representations of data for downstream tasks such as clustering, and (3) perform meta-density estimation on unseen synthetic distributions and out-of-sample Omniglot alphabets.
02/05/2019 ∙ by Kristy Choi, et al.
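For readers new to amortization, a minimal NumPy sketch of the core idea (the linear encoder, its dimensions, and the weights below are invented for illustration, not the MetaVAE architecture): one shared set of inference-network parameters serves every observation, so posterior inference for a new data point is a single forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "encoder": one parameter set is shared (amortized)
# across all observations instead of fitting per-observation variational
# parameters.
W_mu, b_mu = rng.normal(size=(2, 4)), np.zeros(2)
W_lv, b_lv = rng.normal(size=(2, 4)), np.zeros(2)

def encode(x):
    """Map an observation x to the parameters of q(z | x)."""
    return W_mu @ x + b_mu, W_lv @ x + b_lv

def sample_posterior(x):
    """Reparameterized sample z = mu + sigma * eps, eps ~ N(0, I)."""
    mu, log_var = encode(x)
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

xs = rng.normal(size=(3, 4))                       # three observations
zs = np.stack([sample_posterior(x) for x in xs])   # one latent code each
```

Meta-amortization pushes this one step further: the same inference network must also serve observations drawn from different generative models in the family.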

Variational Estimators for Bayesian Optimal Experimental Design
Bayesian optimal experimental design (BOED) is a principled framework for making efficient use of limited experimental resources. Unfortunately, its applicability is hampered by the difficulty of obtaining accurate estimates of the expected information gain (EIG) of an experiment. To address this, we introduce several classes of fast EIG estimators suited to the experiment design context by building on ideas from variational inference and mutual information estimation. We show theoretically and empirically that these estimators can provide significant gains in speed and accuracy over previous approaches. We demonstrate the practicality of our approach via a number of experiments, including an adaptive experiment with human participants.
03/13/2019 ∙ by Adam Foster, et al.
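As context, the nested Monte Carlo (NMC) baseline that such estimators aim to improve on can be sketched for a toy linear-Gaussian model (the model and sample sizes are illustrative assumptions, not the paper's estimators): EIG(d) is estimated as the average of log p(y | theta, d) minus log p(y | d), with the marginal approximated by an inner Monte Carlo average.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_lik(y, theta, d, sigma=1.0):
    """log N(y; d * theta, sigma^2) for the toy model y = d * theta + noise."""
    return -0.5 * ((y - d * theta) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def nmc_eig(d, n_outer=2000, n_inner=2000):
    """Nested Monte Carlo estimate of expected information gain at design d."""
    theta = rng.normal(size=n_outer)               # theta ~ N(0, 1) prior
    y = d * theta + rng.normal(size=n_outer)       # simulated outcomes
    lp_cond = log_lik(y, theta, d)                 # log p(y | theta, d)
    theta_inner = rng.normal(size=(n_inner, 1))    # fresh prior draws
    lp_marg = np.log(np.exp(log_lik(y[None, :], theta_inner, d)).mean(axis=0))
    return (lp_cond - lp_marg).mean()

# A more informative design (larger |d|) should yield a larger EIG;
# for this toy model the exact value is 0.5 * log(1 + d^2).
eig_small, eig_big = nmc_eig(0.1), nmc_eig(2.0)
```

The inner loop over fresh prior draws is exactly the cost the abstract's variational estimators are designed to avoid.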

Differentiable Antithetic Sampling for Variance Reduction in Stochastic Variational Inference
Stochastic optimization techniques are standard in variational inference algorithms. These methods estimate gradients by approximating expectations with independent Monte Carlo samples. In this paper, we explore a technique that uses correlated, but more representative, samples to reduce estimator variance. Specifically, we show how to generate antithetic samples that match sample moments with the true moments of an underlying importance distribution. Combining a differentiable antithetic sampler with modern stochastic variational inference, we showcase the effectiveness of this approach for learning a deep generative model.
10/05/2018 ∙ by Mike Wu, et al.
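A minimal illustration of the antithetic idea for plain Monte Carlo (the paper's contribution is a differentiable, moment-matching sampler inside SVI; this sketch only shows why sign-flipped pairs reduce variance for a monotone test function):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate(f, n_pairs, antithetic):
    """Monte Carlo estimate of E[f(Z)], Z ~ N(0, 1), from 2 * n_pairs draws."""
    if antithetic:
        eps = rng.normal(size=n_pairs)
        z = np.concatenate([eps, -eps])   # pairs share magnitude, flip sign
    else:
        z = rng.normal(size=2 * n_pairs)
    return f(z).mean()

# Repeat each estimator many times to compare their variances empirically;
# the true mean of exp(Z) is e^{1/2} ~ 1.6487.
iid = np.array([estimate(np.exp, 50, antithetic=False) for _ in range(1000)])
anti = np.array([estimate(np.exp, 50, antithetic=True) for _ in range(1000)])
```

Because exp is monotone, f(eps) and f(-eps) are negatively correlated, so at equal sample count the paired estimator is unbiased but has lower variance.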

Bias and Generalization in Deep Generative Models: An Empirical Study
In high dimensional settings, density estimation algorithms rely crucially on their inductive bias. Despite recent empirical success, the inductive bias of deep generative models is not well understood. In this paper we propose a framework to systematically investigate bias and generalization in deep generative models of images. Inspired by experimental methods from cognitive psychology, we probe each learning algorithm with carefully designed training datasets to characterize when and how existing models generate novel attributes and their combinations. We identify similarities to human psychology and verify that these patterns are consistent across commonly used models and architectures.
11/08/2018 ∙ by Shengjia Zhao, et al.

Church: a language for generative models
We introduce Church, a universal language for describing stochastic generative processes. Church is based on the Lisp model of lambda calculus, containing a pure Lisp as its deterministic subset. The semantics of Church is defined in terms of evaluation histories and conditional distributions on such histories. Church also includes a novel language construct, the stochastic memoizer, which enables simple description of many complex nonparametric models. We illustrate language features through several examples, including: a generalized Bayes net in which parameters cluster over trials, infinite PCFGs, planning by inference, and various nonparametric clustering models. Finally, we show how to implement query on any Church program, exactly and approximately, using Monte Carlo techniques.
06/13/2012 ∙ by Noah Goodman, et al.
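The stochastic memoizer can be imitated in Python (a sketch only, not Church semantics: the real `mem` interacts with conditioning and the evaluation-history semantics described above): a random procedure is evaluated once per distinct argument and the draw is cached, so the sampled "random world" stays consistent across calls.

```python
import random

random.seed(0)

def mem(proc):
    """Python analogue of Church's stochastic memoizer: cache one random
    draw per distinct argument tuple and reuse it on later calls."""
    cache = {}
    def memoized(*args):
        if args not in cache:
            cache[args] = proc(*args)
        return cache[args]
    return memoized

# A memoized random attribute: each person's eye color is random but fixed.
eye_color = mem(lambda person: random.choice(["blue", "brown", "green"]))
first = eye_color("bob")
```

Calling `eye_color("bob")` again returns the same value, which is what makes constructs like trial-level parameter clustering easy to express.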

The Infinite Latent Events Model
We present the Infinite Latent Events Model, a nonparametric hierarchical Bayesian distribution over infinite-dimensional Dynamic Bayesian Networks with binary state representations and noisy-OR-like transitions. The distribution can be used to learn structure in discrete time-series data by simultaneously inferring a set of latent events, which events fired at each timestep, and how those events are causally linked. We illustrate the model on a sound factorization task, a network topology identification task, and a video game task.
05/09/2012 ∙ by David Wingate, et al.
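For reference, a noisy-OR transition of the kind the model's dynamics resemble has the closed form p(effect) = 1 − (1 − leak) · ∏ over active causes of (1 − w_i); the weights and leak below are illustrative values, not the paper's.

```python
import numpy as np

def noisy_or(parent_states, weights, leak=0.01):
    """P(effect = 1) = 1 - (1 - leak) * prod over active parents of (1 - w_i)."""
    parent_states = np.asarray(parent_states, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return 1.0 - (1.0 - leak) * np.prod((1.0 - weights) ** parent_states)

# Two strong active causes (illustrative weights) make the effect near-certain:
# 1 - 0.99 * 0.2 * 0.1 = 0.9802.
p = noisy_or([1, 0, 1], [0.8, 0.5, 0.9])
```

With no active parents, only the leak probability remains, which is what lets the model explain events that fire without an inferred cause.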

Multimodal Generative Models for Scalable Weakly-Supervised Learning
Multiple modalities often co-occur when describing natural phenomena. Learning a joint representation of these modalities should yield deeper and more useful representations. Previous work has proposed generative models to handle multimodal input. However, these models either do not learn a joint distribution or require complex additional computations to handle missing data. Here, we introduce a multimodal variational autoencoder that uses a product-of-experts inference network and a sub-sampled training paradigm to solve the multimodal inference problem. Notably, our model shares parameters to efficiently learn under any combination of missing modalities, thereby enabling weakly-supervised learning. We apply our method on four datasets and show that we match state-of-the-art performance using many fewer parameters. In each case our approach yields strong weakly-supervised results. We then consider a case study of learning image transformations (edge detection, colorization, facial landmark segmentation, etc.) as a set of modalities. We find appealing results across this range of tasks.
02/14/2018 ∙ by Mike Wu, et al.
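The product-of-experts step has a closed form for Gaussian experts, which is what makes handling missing modalities cheap: drop an expert and the formula still applies. A sketch (the 1-D latents and expert parameters are illustrative; the full MVAE also includes a prior expert):

```python
import numpy as np

def poe(mus, log_vars):
    """Product of Gaussian experts: precisions add, and the combined mean is
    the precision-weighted average, so any subset of experts can be fused."""
    mus, log_vars = np.asarray(mus), np.asarray(log_vars)
    precisions = np.exp(-log_vars)
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (precisions * mus).sum(axis=0)
    return mu, np.log(var)

# Two unit-variance experts with means 0 and 2: the product sits between
# their means and is tighter than either expert alone.
mu, log_var = poe([[0.0], [2.0]], [[0.0], [0.0]])
```

A modality that is missing at test time is simply omitted from the lists, with no extra inference machinery.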

Pragmatically Informative Image Captioning with Character-Level Reference
We combine a neural image captioner with a Rational Speech Acts (RSA) model to make a system that is pragmatically informative: its objective is to produce captions that are not merely true but also distinguish their inputs from similar images. Previous attempts to combine RSA with neural image captioning require an inference which normalizes over the entire set of possible utterances. This poses a serious problem of efficiency, previously solved by sampling a small subset of possible utterances. We instead solve this problem by implementing a version of RSA which operates at the level of characters ("a", "b", "c", ...) during the unrolling of the caption. We find that the utterance-level effect of referential captions can be obtained with only character-level decisions. Finally, we introduce an automatic method for testing the performance of pragmatic speaker models, and show that our model outperforms a non-pragmatic baseline as well as a word-level RSA captioner.
04/15/2018 ∙ by Reuben Cohn-Gordon, et al.
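The RSA normalization being approximated can be shown with a toy word-level example (the literal-listener table and utterances are invented for illustration); the paper's point is that performing this computation per character sidesteps normalizing over the full utterance set.

```python
import numpy as np

# Hypothetical literal listener: rows are utterances, columns are images;
# entries are P_L0(image | utterance) under invented truth conditions.
L0 = np.array([
    [0.5, 0.5],   # "dog":       true of both images
    [1.0, 0.0],   # "small dog": true only of image 0
    [0.0, 1.0],   # "big dog":   true only of image 1
])

def pragmatic_speaker(target, alpha=1.0):
    """S1(u | target) proportional to L0(target | u)^alpha: prefer
    utterances that best pick out the target for a literal listener."""
    scores = L0[:, target] ** alpha
    return scores / scores.sum()

# For target image 0, the speaker favors the distinguishing caption.
probs = pragmatic_speaker(target=0)
```

Even in this toy case the normalization runs over every utterance; at the character level the same preference is expressed one symbol at a time.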

Zero Shot Learning for Code Education: Rubric Sampling with Deep Learning Inference
In modern computer science education, massive open online courses (MOOCs) log thousands of hours of data about how students solve coding challenges. Being so rich in data, these platforms have garnered the interest of the machine learning community, with many new algorithms attempting to autonomously provide feedback to help future students learn. But what about those first hundred thousand students? In most educational contexts (i.e., classrooms), assignments do not have enough historical data for supervised learning. In this paper, we introduce a human-in-the-loop "rubric sampling" approach to tackle the "zero shot" feedback challenge. We are able to provide autonomous feedback for the first students working on an introductory programming assignment with accuracy that substantially outperforms data-hungry algorithms and approaches human-level fidelity. Rubric sampling requires minimal teacher effort, can associate feedback with specific parts of a student's solution, and can articulate a student's misconceptions in the language of the instructor. Deep learning inference enables rubric sampling to further improve as more assignment-specific student data is acquired. We demonstrate our results on a novel dataset from Code.org, the world's largest programming education platform.
09/05/2018 ∙ by Mike Wu, et al.
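One way to picture rubric sampling (a heavily simplified sketch; the rubric slots, productions, and labels below are invented for illustration): an instructor-written probabilistic grammar is sampled to produce synthetic (program, feedback) pairs, giving labeled training data before any student has submitted.

```python
import random

random.seed(0)

# Invented rubric: each slot pairs productions with the feedback label
# they imply and a probability, so sampling yields (code, labels) data.
RUBRIC = {
    "loop": [("for i in range(4):", None, 0.7),
             ("for i in range(3):", "off-by-one loop bound", 0.3)],
    "body": [("    move()", None, 0.8),
             ("    turn()", "moves and turns confused", 0.2)],
}

def sample_solution():
    """Draw one synthetic student program plus its misconception labels."""
    lines, labels = [], []
    for slot in ("loop", "body"):
        r, acc = random.random(), 0.0
        for text, label, p in RUBRIC[slot]:
            acc += p
            if r <= acc:
                lines.append(text)
                if label is not None:
                    labels.append(label)
                break
    return "\n".join(lines), labels

code, labels = sample_solution()
```

Because each sampled line carries its label, feedback can point at the specific part of the solution that triggered it.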

Tensor Variable Elimination for Plated Factor Graphs
A wide class of machine learning algorithms can be reduced to variable elimination on factor graphs. While factor graphs provide a unifying notation for these algorithms, they do not provide a compact way to express repeated structure when compared to plate diagrams for directed graphical models. To exploit efficient tensor algebra in graphs with plates of variables, we generalize undirected factor graphs to plated factor graphs and variable elimination to a tensor variable elimination algorithm that operates directly on plated factor graphs. Moreover, we generalize complexity bounds based on treewidth and characterize the class of plated factor graphs for which inference is tractable. As an application, we integrate tensor variable elimination into the Pyro probabilistic programming language to enable exact inference in discrete latent variable models with repeated structure. We validate our methods with experiments on both directed and undirected graphical models, including applications to polyphonic music modeling, animal movement modeling, and latent sentiment analysis.
02/08/2019 ∙ by Fritz Obermeyer, et al.
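The core reduction, variable elimination as tensor contraction, is easy to see with NumPy's einsum on a small chain factor graph (the factor shapes are illustrative; Pyro's implementation generalizes this to plated graphs and tracks treewidth-style complexity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two factors of a chain-structured factor graph over discrete a - b - c.
f_ab = rng.random((3, 4))
g_bc = rng.random((4, 2))

# Variable elimination is a tensor contraction: summing out a and b to get
# the unnormalized marginal over c is a single einsum.
marg_c = np.einsum("ab,bc->c", f_ab, g_bc)

# Brute-force enumeration of all joint states gives the same answer.
brute = np.zeros(2)
for a in range(3):
    for b in range(4):
        for c in range(2):
            brute[c] += f_ab[a, b] * g_bc[b, c]

# A plate of size 5 is just one extra batched index in the contraction.
plated = np.einsum("pab,pbc->pc", np.stack([f_ab] * 5), np.stack([g_bc] * 5))
```

The plate index is carried through every factor rather than unrolled, which is what keeps repeated structure compact.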

Pragmatic inference and visual abstraction enable contextual flexibility during visual communication
Visual modes of communication are ubiquitous in modern life. Here we investigate drawing, the most basic form of visual communication. Communicative drawing poses a core challenge for theories of how vision and social cognition interact, requiring a detailed understanding of how sensory information and social context jointly determine what information is relevant to communicate. Participants (N=192) were paired in an online environment to play a sketching-based reference game. On each trial, both participants were shown the same four objects, but in different locations. The sketcher's goal was to draw one of these objects (the target) so that the viewer could select it from the array. There were two types of trials: close, where objects belonged to the same basic-level category, and far, where objects belonged to different categories. We found that people exploited information in common ground with their partner to efficiently communicate about the target: on far trials, sketchers achieved high recognition accuracy while applying fewer strokes, using less ink, and spending less time on their drawings than on close trials. We hypothesized that humans succeed in this task by recruiting two core competencies: (1) visual abstraction, the capacity to perceive the correspondence between an object and a drawing of it; and (2) pragmatic inference, the ability to infer what information would help a viewer distinguish the target from distractors. To evaluate this hypothesis, we developed a computational model of the sketcher that embodied both competencies, instantiated as a deep convolutional neural network nested within a probabilistic program. We found that this model fit human data well and outperformed lesioned variants, providing an algorithmically explicit theory of how perception and social cognition jointly support contextual flexibility in visual communication.
03/11/2019 ∙ by Judith Fan, et al.