Autoencoding Conditional Neural Processes for Representation Learning

by Victor Prokhorov, et al.

Conditional neural processes (CNPs) are a flexible and efficient family of models that learn to learn a stochastic process from observations. In the visual domain, they have seen particular application in contextual image completion - observing pixel values at some locations to predict a distribution over values at other, unobserved locations. However, the choice of pixels in learning such a CNP is typically either random or derived from a simple statistical measure (e.g. pixel variance). Here, we turn the problem on its head and ask: which pixels would a CNP like to observe? That is, which pixels allow fitting a CNP, and do such pixels tell us something about the underlying image? Viewing the context provided to the CNP as a fixed-size latent representation, we construct an amortised variational framework, the Partial Pixel Space Variational Autoencoder (PPS-VAE), for predicting this context simultaneously with learning a CNP. We evaluate PPS-VAE on a set of vision datasets, and find not only that it is possible to learn context points while also fitting CNPs, but that their spatial arrangement and values provide a strong signal for the information contained in the image - evaluated through the lens of classification. We believe the PPS-VAE provides a promising avenue for learning interpretable and effective visual representations.
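To make the context/target mechanism described above concrete, here is a minimal numpy sketch of a CNP forward pass for image completion: each observed (location, value) pair is encoded, the encodings are mean-pooled into a single permutation-invariant representation, and a decoder maps each query location plus that representation to a predictive Gaussian. The architecture sizes, function names, and toy data here are illustrative assumptions, not the paper's implementation, and no PPS-VAE context selection is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, x):
    # Two-layer MLP with a tanh nonlinearity.
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def init_mlp(d_in, d_hidden, d_out, rng):
    # Small random weights; biases at zero.
    return (rng.normal(0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0, 0.1, (d_hidden, d_out)), np.zeros(d_out))

def cnp_forward(enc, dec, xc, yc, xt):
    # Encode each (location, value) context pair, then mean-pool into a
    # single permutation-invariant representation r of the context set.
    r = mlp(enc, np.concatenate([xc, yc], axis=-1)).mean(axis=0)
    # Condition the decoder on r at every target location.
    rt = np.broadcast_to(r, (xt.shape[0], r.shape[0]))
    out = mlp(dec, np.concatenate([xt, rt], axis=-1))
    mu, log_sigma = out[:, :1], out[:, 1:]
    # Predictive mean and (positive) std for each queried pixel.
    return mu, np.exp(log_sigma)

# Toy usage: 2-D pixel coordinates with scalar intensities (hypothetical sizes).
enc = init_mlp(3, 32, 16, rng)   # (x, y, value) -> 16-d embedding
dec = init_mlp(18, 32, 2, rng)   # (x, y, r) -> (mu, log_sigma)
xc = rng.uniform(size=(10, 2))   # 10 observed pixel locations
yc = rng.uniform(size=(10, 1))   # their intensities
xt = rng.uniform(size=(50, 2))   # 50 unobserved query locations
mu, sigma = cnp_forward(enc, dec, xc, yc, xt)
```

Training would maximise the Gaussian log-likelihood of target values under `(mu, sigma)`; the PPS-VAE additionally learns *which* context locations `xc` to feed this model, rather than sampling them at random.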




