Efficient Learning of Discrete-Continuous Computation Graphs

07/26/2023
by David Friede, et al.

Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is to integrate discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph's execution paths. We analyze the behavior of more complex stochastic computation graphs with multiple sequential discrete components. We show that optimizing the parameters of these models is challenging, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the scale parameter of the Gumbel noise perturbations during training improves the learning behavior. Second, we propose dropout residual connections specifically tailored to stochastic, discrete-continuous computation graphs. With an extensive set of experiments, we show that we can train complex discrete-continuous models that cannot be trained with standard stochastic softmax tricks. We also show that complex discrete-stochastic models generalize better than their continuous counterparts on several benchmark datasets.
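To make the two proposed strategies concrete, here is a minimal PyTorch sketch. It is not the authors' implementation: the function `gumbel_softmax_sample`, the parameter `noise_scale` (standing in for the Gumbel scale parameter the abstract mentions), and the module `DropoutResidualBlock` with its drop probability `p_drop` are illustrative assumptions, and the exact placement of the residual path is one plausible reading of "dropout residual connections".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0, noise_scale=1.0, hard=True):
    """Stochastic softmax trick: perturb logits with Gumbel noise, then softmax.

    Setting noise_scale > 1 mirrors the paper's first strategy of increasing
    the scale of the Gumbel perturbations during training (the schedule here
    is assumed, not taken from the paper).
    """
    u = torch.rand_like(logits).clamp_min(1e-10)
    gumbels = -torch.log(-torch.log(u))  # Gumbel(0, 1) samples
    y_soft = F.softmax((logits + noise_scale * gumbels) / tau, dim=-1)
    if hard:
        # Straight-through estimator: discrete one-hot forward pass,
        # gradients flow through the soft relaxation on the backward pass.
        index = y_soft.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
        return y_hard - y_soft.detach() + y_soft
    return y_soft

class DropoutResidualBlock(nn.Module):
    """One plausible reading of the second strategy: during training, a
    randomly gated residual connection lets gradients occasionally bypass
    the discrete component; at evaluation time only the discrete path runs."""

    def __init__(self, discrete_component, p_drop=0.5):
        super().__init__()
        self.discrete_component = discrete_component
        self.p_drop = p_drop

    def forward(self, x):
        out = self.discrete_component(x)
        if self.training and torch.rand(()).item() >= self.p_drop:
            out = out + x  # residual path active for this forward pass
        return out
```

For example, `z = gumbel_softmax_sample(logits, tau=1.0, noise_scale=2.0)` yields a one-hot sample in the forward pass while keeping a differentiable path for backpropagation; chaining several such blocks would correspond to the multi-component sequential computation graphs the paper studies.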

Related research

03/04/2020 · Generalized Gumbel-Softmax Gradient Estimator for Various Discrete Random Variables
Estimating the gradients of stochastic nodes is one of the crucial resea...

11/21/2019 · Discrete and Continuous Deep Residual Learning Over Graphs
In this paper we propose the use of continuous residual modules for grap...

11/02/2016 · The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables
The reparameterization trick enables optimizing large scale stochastic c...

12/19/2019 · Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax
The Gumbel-Softmax is a continuous distribution over the simplex that is...

01/18/2023 · Discrete Latent Structure in Neural Networks
Many types of data from fields including natural language processing, co...

04/01/2021 · Reconciling the Discrete-Continuous Divide: Towards a Mathematical Theory of Sparse Communication
Neural networks and other machine learning models compute continuous rep...

10/27/2021 · Enhancing Reinforcement Learning with discrete interfaces to learn the Dyck Language
Even though most interfaces in the real world are discrete, no efficient...
