
Emerging Convolutions for Generative Normalizing Flows
Generative flows are attractive because they admit exact likelihood optimization and efficient image synthesis. Recently, Kingma & Dhariwal (2018) demonstrated with Glow that generative flows are capable of generating high-quality images. We generalize the 1 × 1 convolutions proposed in Glow to invertible d × d convolutions, which are more flexible since they operate on both channel and spatial axes. We propose two methods to produce invertible convolutions that have receptive fields identical to standard convolutions: emerging convolutions are obtained by chaining specific autoregressive convolutions, and periodic convolutions are decoupled in the frequency domain. Our experiments show that the flexibility of d × d convolutions significantly improves the performance of generative flow models on galaxy images, CIFAR10 and ImageNet.
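As an illustrative NumPy sketch (not the authors' code), the invertible 1 × 1 convolution from Glow that this work generalizes multiplies every spatial position by the same learned invertible channel-mixing matrix; all variable names below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
channels, height, width = 3, 4, 4

# A well-conditioned (hence invertible) C x C channel-mixing matrix.
mix = rng.normal(size=(channels, channels)) + 2.0 * np.eye(channels)

x = rng.normal(size=(channels, height, width))

# Forward pass: mix channels independently at each pixel.
y = np.einsum('ij,jhw->ihw', mix, x)

# Exact inverse: apply the inverse matrix the same way.
x_rec = np.einsum('ij,jhw->ihw', np.linalg.inv(mix), y)

# Jacobian log-determinant, needed for exact likelihood optimization:
# the same matrix acts at every one of the height * width positions.
log_det = height * width * np.log(abs(np.linalg.det(mix)))
```

The cheap log-determinant is what makes this layer attractive for flows; the paper's d × d convolutions keep invertibility while extending the receptive field over the spatial axes as well.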
01/30/2019 ∙ by Emiel Hoogeboom, et al.

Differentiable probabilistic models of scientific imaging with the Fourier slice theorem
Scientific imaging techniques such as optical and electron microscopy and computed tomography (CT) scanning are used to study the 3D structure of an object through 2D observations. These observations are related to the original 3D object through orthogonal integral projections. For common 3D reconstruction algorithms, computational efficiency requires the modeling of the 3D structures to take place in Fourier space by applying the Fourier slice theorem. At present, it is unclear how to differentiate through the projection operator, and hence current learning algorithms cannot rely on gradient-based methods to optimize 3D structure models. In this paper we show how backpropagation through the projection operator in Fourier space can be achieved. We demonstrate the validity of the approach with experiments on 3D reconstruction of proteins. We further extend our approach to learning probabilistic models of 3D objects. This allows us to predict regions of low sampling rates or estimate noise. A higher sample efficiency can be reached by utilizing the learned uncertainties of the 3D structure as an unsupervised estimate of the model fit. Finally, we demonstrate how the reconstruction algorithm can be extended with an amortized inference scheme on unknown attributes such as object pose. Through empirical studies we show that joint inference of the 3D structure and the object pose becomes more difficult when the ground truth object contains more symmetries. Due to the presence of, for instance, (approximate) rotational symmetries, the pose estimation can easily get stuck in local optima, inhibiting a fine-grained, high-quality estimate of the 3D structure.
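The Fourier slice theorem underlying this approach can be checked numerically in a few lines; this is an illustrative sketch, not the paper's implementation. The 2D Fourier transform of an integral projection equals a central slice of the volume's 3D Fourier transform:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
volume = rng.normal(size=(N, N, N))  # toy 3D object

# Real-space projection: integrate (here, sum) the volume along the z axis.
projection = volume.sum(axis=2)

# Fourier slice theorem: the 2D FFT of that projection equals the
# central (k_z = 0) slice of the volume's 3D FFT.
proj_fft = np.fft.fft2(projection)
central_slice = np.fft.fftn(volume)[:, :, 0]
```

Because the slicing operation is linear in the volume's Fourier coefficients, gradients can flow through it, which is the property the paper exploits for backpropagation.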
06/18/2019 ∙ by Karen Ullrich, et al.

Integer Discrete Flows and Lossless Compression
Lossless compression methods shorten the expected representation size of data without loss of information, using a statistical model. Flow-based models are attractive in this setting because they admit exact likelihood optimization, which is equivalent to minimizing the expected number of bits per message. However, conventional flows assume continuous data, which may lead to reconstruction errors when quantized for compression. For that reason, we introduce a generative flow for ordinal discrete data called Integer Discrete Flow (IDF): a bijective integer map that can learn rich transformations on high-dimensional data. As building blocks for IDFs, we introduce flexible transformation layers called integer discrete coupling and lower triangular coupling. Our experiments show that IDFs are competitive with other flow-based generative models. Furthermore, we demonstrate that IDF-based compression achieves state-of-the-art lossless compression rates on CIFAR10, ImageNet32, and ImageNet64.
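A minimal sketch of an integer discrete coupling layer, assuming a stand-in translation function `t` in place of a learned network (the function and names are hypothetical, not the paper's code): half the dimensions are shifted by a rounded function of the other half, which is exactly invertible on integers.

```python
import numpy as np

def t(x):
    # Stand-in for a learned translation network; any real-valued
    # function works because its output is rounded before use.
    return np.tanh(x) * 3.0

def coupling_forward(x):
    # Split the dimensions; shift the second half by a rounded
    # function of the first half (which passes through unchanged).
    x1, x2 = np.split(x, 2)
    return np.concatenate([x1, x2 + np.round(t(x1)).astype(np.int64)])

def coupling_inverse(y):
    # Recompute the same rounded shift from the untouched half and subtract.
    y1, y2 = np.split(y, 2)
    return np.concatenate([y1, y2 - np.round(t(y1)).astype(np.int64)])

rng = np.random.default_rng(2)
x = rng.integers(-10, 10, size=8)
x_rec = coupling_inverse(coupling_forward(x))
```

Since both input and output are integers and the map is bijective, no dequantization is needed, which is what makes these layers directly usable for lossless compression.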
05/17/2019 ∙ by Emiel Hoogeboom, et al.

Sinkhorn AutoEncoders
Optimal Transport offers an alternative to maximum likelihood for learning generative autoencoding models. We show how this principle dictates the minimization of the Wasserstein distance between the encoder aggregated posterior and the prior, plus a reconstruction error. We prove that in the nonparametric limit the autoencoder generates the data distribution if and only if the two distributions match exactly, and that the optimum can be obtained by deterministic autoencoders. We then introduce the Sinkhorn AutoEncoder (SAE), which casts the problem into Optimal Transport on the latent space. The resulting Wasserstein distance is minimized by backpropagating through the Sinkhorn algorithm. SAE models the aggregated posterior as an implicit distribution and therefore does not need a reparameterization trick for gradient estimation. Moreover, it requires virtually no adaptation to different prior distributions. We demonstrate its flexibility by considering models with hyperspherical and Dirichlet priors, as well as a simple case of probabilistic programming. SAE matches or outperforms other autoencoding models in visual quality and FID scores.
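The Sinkhorn algorithm that SAE backpropagates through is a short alternating-scaling loop; here is an illustrative NumPy sketch of the entropy-regularized version on two toy discrete distributions (hyperparameters chosen arbitrarily, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
a = np.full(n, 1.0 / n)        # source marginal (uniform)
b = np.full(n, 1.0 / n)        # target marginal (uniform)
cost = rng.random((n, n))      # pairwise transport cost matrix
eps = 0.1                      # entropic regularization strength

# Gibbs kernel of the cost; Sinkhorn alternately rescales its rows
# and columns so the resulting plan has the required marginals.
K = np.exp(-cost / eps)
u = np.ones(n)
for _ in range(200):
    v = b / (K.T @ u)
    u = a / (K @ v)

plan = u[:, None] * K * v[None, :]          # transport plan
sinkhorn_cost = np.sum(plan * cost)          # regularized OT objective term
```

Every step of the loop is differentiable in `cost`, which is why the Wasserstein-style objective can be minimized end-to-end by ordinary backpropagation.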
10/02/2018 ∙ by Giorgio Patrini, et al.

Modeling Relational Data with Graph Convolutional Networks
Knowledge graphs enable a wide variety of applications, including question answering and information retrieval. Despite the great effort invested in their creation and maintenance, even the largest (e.g., Yago, DBPedia or Wikidata) remain incomplete. We introduce Relational Graph Convolutional Networks (R-GCNs) and apply them to two standard knowledge base completion tasks: link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes). R-GCNs are related to a recent class of neural networks operating on graphs, and are developed specifically to deal with the highly multi-relational data characteristic of realistic knowledge bases. We demonstrate the effectiveness of R-GCNs as a standalone model for entity classification. We further show that factorization models for link prediction such as DistMult can be significantly improved by enriching them with an encoder model to accumulate evidence over multiple inference steps in the relational graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline.
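The core idea of relation-specific message passing can be sketched in dense NumPy as follows; this is a simplified illustration (no basis decomposition, hypothetical names), not the authors' implementation. Each relation gets its own weight matrix, and incoming messages are normalized per node and relation:

```python
import numpy as np

rng = np.random.default_rng(4)
num_nodes, dim, num_rels = 4, 3, 2

H = rng.normal(size=(num_nodes, dim))                    # node features
A = rng.integers(0, 2, size=(num_rels, num_nodes, num_nodes)).astype(float)
W_rel = rng.normal(size=(num_rels, dim, dim))            # one weight per relation
W_self = rng.normal(size=(dim, dim))                     # self-connection weight

def rgcn_layer(H, A, W_rel, W_self):
    out = H @ W_self                                     # self-loop term
    for r in range(A.shape[0]):
        deg = A[r].sum(axis=1, keepdims=True)            # neighbors under relation r
        norm = np.where(deg > 0, 1.0 / np.maximum(deg, 1.0), 0.0)
        # Aggregate neighbor features, transform per relation, normalize.
        out = out + norm * (A[r] @ H @ W_rel[r])
    return np.maximum(out, 0.0)                          # ReLU

H_next = rgcn_layer(H, A, W_rel, W_self)
```

Stacking such layers lets evidence accumulate over multiple inference steps in the relational graph, which is exactly the encoder role the abstract describes.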
03/17/2017 ∙ by Michael Schlichtkrull, et al.

Graph Convolutional Matrix Completion
We consider matrix completion for recommender systems from the point of view of link prediction on graphs. Interaction data such as movie ratings can be represented by a bipartite user-item graph with labeled edges denoting observed ratings. Building on recent progress in deep learning on graph-structured data, we propose a graph autoencoder framework based on differentiable message passing on the bipartite interaction graph. Our model shows competitive performance on standard collaborative filtering benchmarks. In settings where complementary feature information or structured data such as a social network is available, our framework outperforms recent state-of-the-art methods.
06/07/2017 ∙ by Rianne van den Berg, et al.

Sylvester Normalizing Flows for Variational Inference
Variational inference relies on flexible approximate posterior distributions. Normalizing flows provide a general recipe to construct flexible variational posteriors. We introduce Sylvester normalizing flows, which can be seen as a generalization of planar flows. Sylvester normalizing flows remove the well-known single-unit bottleneck from planar flows, making a single transformation much more flexible. We compare the performance of Sylvester normalizing flows against planar flows and inverse autoregressive flows and demonstrate that they compare favorably on several datasets.
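For context, the planar flow being generalized is a rank-one residual map whose Jacobian determinant has a closed form via the matrix determinant lemma; the sketch below is illustrative, with arbitrary toy parameters:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4
z = rng.normal(size=d)
u, w = rng.normal(size=d), rng.normal(size=d)
b = 0.5

# Planar flow: f(z) = z + u * tanh(w.z + b).
a = np.tanh(w @ z + b)
f_z = z + u * a

# By the matrix determinant lemma, det(J) = 1 + u . psi(z)
# with psi(z) = (1 - tanh^2(w.z + b)) * w, so the log-density
# change is a single scalar per sample.
psi = (1.0 - a ** 2) * w
log_det = np.log(abs(1.0 + u @ psi))
```

Because `u` and `w` are single vectors, each planar transformation acts through a one-dimensional bottleneck; Sylvester flows replace them with matrices and use Sylvester's determinant identity to keep the determinant cheap.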
03/15/2018 ∙ by Rianne van den Berg, et al.

Predictive Uncertainty through Quantization
High-risk domains require reliable confidence estimates from predictive models. Deep latent variable models provide these, but suffer from the rigid variational distributions used for tractable inference, which err on the side of overconfidence. We propose Stochastic Quantized Activation Distributions (SQUAD), which imposes a flexible yet tractable distribution over discretized latent variables. The proposed method is scalable, self-normalizing and sample efficient. We demonstrate that the model fully utilizes the flexible distribution, learns interesting nonlinearities, and provides predictive uncertainty of competitive quality.
10/12/2018 ∙ by Bastiaan S. Veeling, et al.
Rianne van den Berg