Latent Transformations for Discrete-Data Normalising Flows
Normalising flows (NFs) for discrete data are challenging because parameterising bijective transformations of discrete variables requires predicting discrete/integer parameters. Having a neural network predict discrete parameters requires a non-differentiable activation function (e.g., the step function), which precludes gradient-based learning. To circumvent this non-differentiability, previous work has employed biased proxy gradients, such as the straight-through estimator. We present an unbiased alternative: rather than deterministically parameterising one transformation, we predict a distribution over latent transformations. With stochastic transformations, the marginal likelihood of the data is differentiable and gradient-based learning is possible via score function estimation. To test the viability of discrete-data NFs, we investigate performance on binary MNIST. We observe great challenges with both deterministic proxy gradients and unbiased score function estimation: whereas the former often fails to learn even a shallow transformation, the variance of the latter could not be sufficiently controlled to admit deeper NFs.
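The two gradient strategies the abstract contrasts can be illustrated on a minimal toy problem. The sketch below (an assumption for illustration, not code from the paper; the objective `f` and all variable names are made up) shows a biased straight-through gradient through a hard binary decision next to an unbiased score-function (REINFORCE) gradient through a sampled Bernoulli variable, using PyTorch autograd in both cases.

```python
import torch

def f(b):
    # Hypothetical toy objective: reward 1.0 when the binary choice b is 1.
    return b.float()

# --- Biased proxy: straight-through estimator (STE) ---
logits = torch.zeros(1, requires_grad=True)
probs = torch.sigmoid(logits)
b_hard = (probs > 0.5).float()          # non-differentiable step function
# Forward pass uses the hard value; backward pass pretends the step
# was the identity, routing the gradient through `probs` instead.
b_ste = b_hard + probs - probs.detach()
(-f(b_ste)).backward()
grad_ste = logits.grad.clone()          # biased gradient estimate

# --- Unbiased alternative: score-function (REINFORCE) estimator ---
logits2 = torch.zeros(1, requires_grad=True)
dist = torch.distributions.Bernoulli(logits=logits2)
b = dist.sample()                       # stochastic discrete choice
# Surrogate loss -f(b) * log p(b): its gradient is the score-function
# estimate; f(b) is detached because it is treated as a reward.
(-(f(b).detach() * dist.log_prob(b))).backward()
grad_sf = logits2.grad.clone()          # unbiased but high-variance estimate
```

The STE gradient is deterministic but biased (the backward pass ignores the step); the score-function gradient is unbiased in expectation yet varies from sample to sample, which is the variance problem the abstract reports for deeper flows.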