
Emerging Convolutions for Generative Normalizing Flows
Generative flows are attractive because they admit exact likelihood optimization and efficient image synthesis. Recently, Kingma & Dhariwal (2018) demonstrated with Glow that generative flows are capable of generating high quality images. We generalize the 1 x 1 convolutions proposed in Glow to invertible d x d convolutions, which are more flexible since they operate on both channel and spatial axes. We propose two methods to produce invertible convolutions that have receptive fields identical to standard convolutions: Emerging convolutions are obtained by chaining specific autoregressive convolutions, and periodic convolutions are decoupled in the frequency domain. Our experiments show that the flexibility of d x d convolutions significantly improves the performance of generative flow models on galaxy images, CIFAR10 and ImageNet.
01/30/2019 ∙ by Emiel Hoogeboom, et al. ∙ 22

Adversarial Variational Inference and Learning in Markov Random Fields
Markov random fields (MRFs) find applications in a variety of machine learning areas, while the inference and learning of such models are challenging in general. In this paper, we propose the Adversarial Variational Inference and Learning (AVIL) algorithm to solve the problems with a minimal assumption about the model structure of an MRF. AVIL employs two variational distributions to approximately infer the latent variables and estimate the partition function, respectively. The variational distributions, which are parameterized as neural networks, provide an estimate of the negative log likelihood of the MRF. On one hand, the estimate is in an intuitive form of approximate contrastive free energy. On the other hand, the estimate is a minimax optimization problem, which is solved by stochastic gradient descent in an alternating manner. We apply AVIL to various undirected generative models in a fully black-box manner and obtain better results than existing competitors on several real datasets.
01/24/2019 ∙ by Chongxuan Li, et al. ∙ 18

The Deep Weight Prior. Modeling a prior distribution for CNNs using generative models
Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models by carefully choosing a prior distribution. In this work, we propose a new type of prior distribution for convolutional neural networks, the deep weight prior, which, in contrast to previously published techniques, favors the empirically estimated structure of convolutional filters, e.g., spatial correlations of weights. We define the deep weight prior as an implicit distribution and propose a method for variational inference with this type of implicit prior. In experiments, we show that deep weight priors can improve the performance of Bayesian neural networks on several problems when training data is limited. We also found that initializing the weights of conventional networks with samples from the deep weight prior leads to faster training.
10/16/2018 ∙ by Andrei Atanov, et al. ∙ 12

DIVA: Domain Invariant Variational Autoencoders
We consider the problem of domain generalization, namely, how to learn representations given data from a set of domains that generalize to data from a previously unseen domain. We propose the Domain Invariant Variational Autoencoder (DIVA), a generative model that tackles this problem by learning three independent latent subspaces, one for the domain, one for the class, and one for any residual variations. We highlight that due to the generative nature of our model we can also incorporate unlabeled data from known or previously unseen domains. To the best of our knowledge, this has not been done before in a domain generalization setting. This property is highly desirable in fields like medical imaging where labeled data is scarce. We experimentally evaluate our model on the rotated MNIST benchmark and a malaria cell images dataset where we show that (i) the learned subspaces are indeed complementary to each other, (ii) we improve upon recent works on this task and (iii) incorporating unlabeled data can boost the performance even further.
05/24/2019 ∙ by Maximilian Ilse, et al. ∙ 11

Probabilistic Binary Neural Networks
Low bitwidth weights and activations are an effective way of combating the increasing need for both memory and compute power of Deep Neural Networks. In this work, we present a probabilistic training method for Neural Networks with both binary weights and activations, called BLRNet. By embracing stochasticity during training, we circumvent the need to approximate the gradient of non-differentiable functions such as sign(), while still obtaining a fully Binary Neural Network at test time. Moreover, it allows for anytime ensemble predictions for improved performance and uncertainty estimates by sampling from the weight distribution. Since all operations in a layer of the BLRNet operate on random variables, we introduce stochastic versions of Batch Normalization and max pooling, which transfer well to a deterministic network at test time. We evaluate the BLRNet on multiple standardized benchmarks.
09/10/2018 ∙ by Jorn W. T. Peters, et al. ∙ 10

Graph Refinement based Tree Extraction using Mean-Field Networks and Graph Neural Networks
Graph refinement, or the task of obtaining subgraphs of interest from overcomplete graphs, can have many varied applications. In this work, we extract tree structures from image data by first deriving a graph-based representation of the volumetric data and then posing tree extraction as a graph refinement task. We present two methods to perform graph refinement. First, we use mean-field approximation (MFA) to approximate the posterior density over the subgraphs from which the optimal subgraph of interest can be estimated. Mean field networks (MFNs) are used for inference based on the interpretation that iterations of MFA can be seen as feedforward operations in a neural network. This allows us to learn the model parameters using gradient descent. Second, we present a supervised learning approach using graph neural networks (GNNs), which can be seen as generalisations of MFNs. Subgraphs are obtained by jointly training a GNN-based encoder-decoder pair, wherein the encoder learns useful edge embeddings from which the edge probabilities are predicted using a simple decoder. We discuss connections between the two classes of methods and compare them for the task of extracting airways from 3D, low-dose, chest CT data. We show that both the MFN and GNN models show significant improvement in detecting more branches when compared to a baseline method that is similar to a top performing method in the EXACT'09 Challenge.
11/21/2018 ∙ by Raghavendra Selvan, et al. ∙ 10

Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement
The well-known Gumbel-Max trick for sampling from a categorical distribution can be extended to sample k elements without replacement. We show how to implicitly apply this 'Gumbel-Top-k' trick on a factorized distribution over sequences, allowing us to draw exact samples without replacement using a Stochastic Beam Search. Even for exponentially large domains, the number of model evaluations grows only linearly in k and the maximum sampled sequence length. The algorithm creates a theoretical connection between sampling and (deterministic) beam search and can be used as a principled intermediate alternative. In a translation task, the proposed method compares favourably against alternatives to obtain diverse yet good quality translations. We show that sequences sampled without replacement can be used to construct low-variance estimators for expected sentence-level BLEU score and model entropy.
03/14/2019 ∙ by Wouter Kool, et al. ∙ 10
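
The Gumbel-Top-k idea named in the abstract above can be illustrated on a plain categorical distribution: add independent Gumbel(0, 1) noise to each log-probability and keep the k largest perturbed values. The sketch below covers only this basic case, not the Stochastic Beam Search over sequences that the paper proposes; the function name `gumbel_top_k` and the example probabilities are assumptions made for illustration.

```python
import math
import random


def gumbel_top_k(log_probs, k, rng=random):
    """Sample k distinct indices without replacement via Gumbel perturbation.

    Illustrative helper (name and signature are assumptions, not from the paper).
    """
    perturbed = []
    for index, log_p in enumerate(log_probs):
        u = rng.random()
        # rng.random() can return exactly 0.0, for which log(u) is undefined.
        while u == 0.0:
            u = rng.random()
        # Standard Gumbel(0, 1) noise by inverse transform sampling.
        gumbel = -math.log(-math.log(u))
        perturbed.append((log_p + gumbel, index))
    # The indices of the k largest perturbed log-probabilities form the sample.
    perturbed.sort(reverse=True)
    return [index for _, index in perturbed[:k]]


# Hypothetical categorical distribution over four outcomes.
probs = [0.5, 0.3, 0.15, 0.05]
log_probs = [math.log(p) for p in probs]
sample = gumbel_top_k(log_probs, k=2, rng=random.Random(0))
print(sample)  # two distinct indices in sampled order
```

With k = 1 this reduces to the Gumbel-Max trick, so the first returned index is distributed according to `probs`.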

Gauge Equivariant Convolutional Networks and the Icosahedral CNN
The idea of equivariance to symmetry transformations provides one of the first theoretically grounded principles for neural network architecture design. Equivariant networks have shown excellent performance and data efficiency on vision and medical imaging problems that exhibit symmetries. Here we show how this principle can be extended beyond global symmetries to local gauge transformations, thereby enabling the development of equivariant convolutional networks on general manifolds. We implement gauge equivariant CNNs for signals defined on the icosahedron, which provides a reasonable approximation of spherical signals. By choosing to work with this very regular manifold, we are able to implement the gauge equivariant convolution using a single conv2d call, making it a highly scalable and practical alternative to Spherical CNNs. We evaluate the Icosahedral CNN on omnidirectional image segmentation and climate pattern segmentation, and find that it outperforms previous methods.
02/11/2019 ∙ by Taco S Cohen, et al. ∙ 10

Combining Generative and Discriminative Models for Hybrid Inference
A graphical model is a structured representation of the data generating process. The traditional method to reason over random variables is to perform inference in this graphical model. However, in many cases the generating process is only a poor approximation of the much more complex true data generating process, leading to suboptimal estimation. The subtleties of the generative process are however captured in the data itself and we can 'learn to infer', that is, learn a direct mapping from observations to explanatory latent variables. In this work we propose a hybrid model that combines graphical inference with a learned inverse model, which we structure as in a graph neural network, while the iterative algorithm as a whole is formulated as a recurrent neural network. By using cross-validation we can automatically balance the amount of work performed by graphical inference versus learned inference. We apply our ideas to the Kalman filter, a Gaussian hidden Markov model for time sequences, and show, among other things, that our model can estimate the trajectory of a noisy chaotic Lorenz Attractor much more accurately than either the learned or graphical inference run in isolation.
06/06/2019 ∙ by Victor Garcia Satorras, et al. ∙ 8
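
For readers unfamiliar with the Kalman filter mentioned in the abstract above, the following is a minimal sketch of one predict/update step for a scalar state. It shows only the classical graphical-inference side, not the hybrid model with a learned inverse; the random-walk dynamics, the function name `kalman_step`, and all numeric values are assumptions chosen for illustration.

```python
def kalman_step(mean, var, observation, process_var, obs_var):
    """One scalar Kalman filter step, assuming random-walk dynamics.

    Assumed model (illustrative only):
        x_t = x_{t-1} + process noise,  y_t = x_t + observation noise.
    """
    # Predict: the mean is unchanged and uncertainty grows by the process noise.
    pred_mean = mean
    pred_var = var + process_var
    # Update: blend the prediction and the observation using the Kalman gain.
    gain = pred_var / (pred_var + obs_var)
    new_mean = pred_mean + gain * (observation - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


# Filter a short, made-up sequence of noisy observations.
mean, var = 0.0, 1.0
for y in [0.9, 1.1, 1.0, 1.2]:
    mean, var = kalman_step(mean, var, y, process_var=0.01, obs_var=0.25)
print(mean, var)
```

The posterior variance shrinks as observations accumulate, which is the behaviour a learned inference component would have to match or improve on when the assumed dynamics are a poor fit.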

Integer Discrete Flows and Lossless Compression
Lossless compression methods shorten the expected representation size of data without loss of information, using a statistical model. Flow-based models are attractive in this setting because they admit exact likelihood optimization, which is equivalent to minimizing the expected number of bits per message. However, conventional flows assume continuous data, which may lead to reconstruction errors when quantized for compression. For that reason, we introduce a generative flow for ordinal discrete data called Integer Discrete Flow (IDF): a bijective integer map that can learn rich transformations on high-dimensional data. As building blocks for IDFs, we introduce flexible transformation layers called integer discrete coupling and lower triangular coupling. Our experiments show that IDFs are competitive with other flow-based generative models. Furthermore, we demonstrate that IDF-based compression achieves state-of-the-art lossless compression rates on CIFAR10, ImageNet32, and ImageNet64.
05/17/2019 ∙ by Emiel Hoogeboom, et al. ∙ 7
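
To make the idea of a bijective integer map concrete, here is a minimal sketch of an additive coupling layer on integers: one half of the input passes through unchanged and determines a rounded shift that is added to the other half. It assumes an additive form with rounding; the `translation` function is a hypothetical stand-in for a learned network, and none of the names come from the paper.

```python
def translation(x_a):
    # Hypothetical stand-in for a learned network: any real-valued function works.
    return [0.37 * value + 1.6 for value in x_a]


def coupling_forward(x_a, x_b):
    """Map integers (x_a, x_b) to integers (z_a, z_b) with z_a = x_a."""
    # Rounding the real-valued translation keeps the output on the integers.
    shift = [round(t) for t in translation(x_a)]
    return list(x_a), [b + s for b, s in zip(x_b, shift)]


def coupling_inverse(z_a, z_b):
    """Recover (x_a, x_b) exactly: z_a is unchanged, so the shift is recomputable."""
    shift = [round(t) for t in translation(z_a)]
    return list(z_a), [b - s for b, s in zip(z_b, shift)]


x_a, x_b = [3, -1, 7], [10, 20, 30]
z_a, z_b = coupling_forward(x_a, x_b)
assert coupling_inverse(z_a, z_b) == (x_a, x_b)
```

Because the shift depends only on the untouched half, the inverse subtracts exactly the same integers, so no reconstruction error can arise from quantization.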

Sinkhorn AutoEncoders
Optimal Transport offers an alternative to maximum likelihood for learning generative autoencoding models. We show how this principle dictates the minimization of the Wasserstein distance between the encoder aggregated posterior and the prior, plus a reconstruction error. We prove that in the non-parametric limit the autoencoder generates the data distribution if and only if the two distributions match exactly, and that the optimum can be obtained by deterministic autoencoders. We then introduce the Sinkhorn AutoEncoder (SAE), which casts the problem into Optimal Transport on the latent space. The resulting Wasserstein distance is minimized by backpropagating through the Sinkhorn algorithm. SAE models the aggregated posterior as an implicit distribution and therefore does not need a reparameterization trick for gradient estimation. Moreover, it requires virtually no adaptation to different prior distributions. We demonstrate its flexibility by considering models with hyperspherical and Dirichlet priors, as well as a simple case of probabilistic programming. SAE matches or outperforms other autoencoding models in visual quality and FID scores.
10/02/2018 ∙ by Giorgio Patrini, et al. ∙ 6
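
The Sinkhorn algorithm referred to in the abstract above alternately rescales the rows and columns of a kernel matrix until the resulting transport plan has the desired marginals. The sketch below shows only these forward iterations for entropy-regularized optimal transport between two small discrete distributions; it does not include the autoencoder or backpropagation, and the function name, regularization strength `eps`, and example data are assumptions made for illustration.

```python
import math


def sinkhorn_plan(cost, a, b, eps=0.5, n_iters=200):
    """Entropy-regularized transport plan between histograms a and b.

    Illustrative helper: cost[i][j] is the cost of moving mass from i to j.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the cost matrix.
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical two-point example with unit cost for moving between points.
cost = [[0.0, 1.0], [1.0, 0.0]]
a, b = [0.7, 0.3], [0.4, 0.6]
plan = sinkhorn_plan(cost, a, b)
transport_cost = sum(plan[i][j] * cost[i][j] for i in range(2) for j in range(2))
print(transport_cost)
```

Every operation in the loop is differentiable, which is what makes it possible to backpropagate through the iterations in a learning setting.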
Max Welling
Vice President Technologies at Qualcomm Technologies Netherlands, Senior Fellow Canadian Institute for Advanced Research, Cofounder and Chief Scientific Advisor Scyfer B.V.