Estimating or Propagating Gradients Through Stochastic Neurons

05/14/2013
by Yoshua Bengio, et al.

Stochastic neurons can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic neurons, i.e., can we "back-propagate" through these stochastic neurons? We examine this question, discuss existing approaches, and present two novel families of solutions, applicable in different settings. In particular, we demonstrate that a simple, biologically plausible formula gives rise to an unbiased (but noisy) estimator of the gradient with respect to the firing probability of a binary stochastic neuron. Unlike other estimators, which view the noise as a small perturbation in order to estimate gradients by finite differences, this estimator is unbiased even without assuming that the stochastic perturbation is small. It is also interesting because it can be applied in very general settings that do not allow gradient back-propagation, including the estimation of the gradient with respect to future rewards, as required in reinforcement learning setups. We also propose an approach to approximating this unbiased but high-variance estimator by learning to predict it with a biased estimator. The second approach we propose assumes that an estimator of the gradient can be back-propagated; it provides an unbiased estimator of the gradient, but it only works with non-linearities that, unlike the hard threshold but like the rectifier, are not flat over their entire range. Such units are similar to traditional sigmoidal units but have the advantage that, for many inputs, a hard decision (e.g., a 0 output) can be produced, which is convenient for conditional computation and for achieving sparse representations and sparse gradients.
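For concreteness, here is a minimal NumPy sketch of the two families described above. The first part implements a REINFORCE-style likelihood-ratio estimate of the gradient with respect to the firing probability of a single binary stochastic neuron h ~ Bernoulli(p), p = sigmoid(a); the second part implements a noisy rectifier whose gradient can be back-propagated wherever the non-linearity is not flat. The function names, the toy loss, and the Monte-Carlo comparison are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lr_gradient_estimate(a, loss_fn):
    """One-sample likelihood-ratio estimate of d E[L] / dp for a binary
    stochastic neuron h ~ Bernoulli(p), p = sigmoid(a). Unbiased without
    assuming the stochastic perturbation is small."""
    p = sigmoid(a)
    h = float(rng.random() < p)      # sample the neuron's binary output
    L = loss_fn(h)                   # downstream loss (could be a future reward)
    return (h - p) / (p * (1.0 - p)) * L

# Toy setup (an assumption for illustration): quadratic loss pulling
# the neuron's output toward 1.
loss_fn = lambda h: (h - 1.0) ** 2
a = 0.3

# Averaging many one-sample estimates shows unbiasedness: the mean
# approaches the exact gradient d E[L] / dp = L(1) - L(0).
samples = [lr_gradient_estimate(a, loss_fn) for _ in range(100_000)]
print(np.mean(samples))              # ~ -1.0 (noisy)
print(loss_fn(1.0) - loss_fn(0.0))   # exactly -1.0

def noisy_rectifier(a, noise_std=1.0):
    """Second family: inject noise before a rectifier. The unit stays
    stochastic and can output hard zeros, yet an ordinary gradient can be
    back-propagated through the non-flat part of its range."""
    z = a + noise_std * rng.standard_normal(np.shape(a))
    h = np.maximum(0.0, z)
    dh_da = (z > 0.0).astype(float)  # gradient passed back to the input a
    return h, dh_da

h, g = noisy_rectifier(np.array([-2.0, 0.5, 3.0]))
print(h, g)  # hard zeros where z <= 0; unit gradient elsewhere
```

The first estimator verifies its claimed property numerically: since E[L] = p L(1) + (1 - p) L(0), the exact gradient is L(1) - L(0), which the sample mean recovers. The noisy rectifier illustrates why a non-linearity that is not flat everywhere is required: on the flat (zero) side no gradient flows, but hard zeros are produced, giving sparse outputs and sparse gradients.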


