Backprop-Q: Generalized Backpropagation for Stochastic Computation Graphs

07/25/2018
by Xiaoran Xu, et al.

In real-world scenarios, it is often appealing to learn a model that carries out stochastic operations internally, known as a stochastic computation graph (SCG), rather than a deterministic mapping. However, standard backpropagation is not applicable to SCGs. We attempt to address this issue from the angle of cost propagation: local surrogate costs, called Q-functions, are constructed and learned for each stochastic node in an SCG, and the SCG is then trained on these surrogate costs using standard backpropagation. We propose the entire framework as a way to generalize backpropagation to SCGs; it resembles an actor-critic architecture but is defined on a graph. For broad applicability, we study a variety of SCG structures, from a single cost to multiple costs. We draw on recent advances in reinforcement learning (RL) and variational Bayes (VB), such as off-policy critic learning and unbiased, low-variance gradient estimation, and review them in the context of SCGs. The generalized backpropagation extends the learning signals transported between stochastic nodes beyond gradients, while preserving the benefit of backpropagating gradients through deterministic nodes. We also list experimental suggestions and concerns to help design and test specific models within this framework.
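To make the idea of a learned local surrogate cost concrete, here is a minimal toy sketch in Python with NumPy. It is not the paper's algorithm. It assumes a single Bernoulli stochastic node with one parameter `theta`, a hypothetical downstream `cost` function, and a tabular critic `q` standing in for a Q-function. The critic is fit to observed costs, and the node parameter is updated using the gradient of the expected surrogate cost rather than the true cost.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cost(z):
    # Hypothetical downstream cost (an assumption for illustration):
    # sampling z = 1 is cheaper than sampling z = 0.
    return 0.25 if z == 1 else 1.0


def train(steps=2000, lr_actor=0.1, lr_critic=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0          # logit of P(z = 1) for the single stochastic node
    q = np.zeros(2)      # tabular surrogate cost: q[z] estimates the cost given z

    for _ in range(steps):
        p = sigmoid(theta)
        z = int(rng.random() < p)   # sample the stochastic node
        c = cost(z)                 # observe the true cost

        # Critic step: move q[z] toward the observed cost.
        q[z] += lr_critic * (c - q[z])

        # Actor step: gradient of the expected surrogate cost
        # (1 - p) * q[0] + p * q[1] with respect to theta.
        grad = p * (1.0 - p) * (q[1] - q[0])
        theta -= lr_actor * grad

    return theta, q


theta, q = train()
print("P(z = 1):", sigmoid(theta))
print("learned surrogate costs:", q)
```

Because the node here is binary, the expectation over `z` is computed exactly; with larger or continuous nodes, a sampled gradient estimator would be needed instead, which is where the variance-reduction techniques mentioned in the abstract become relevant.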


research
03/06/2017

Revisiting stochastic off-policy action-value gradients

Off-policy stochastic actor-critic methods rely on approximating the sto...
research
05/24/2023

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

Actor-critic (AC) methods are widely used in reinforcement learning (RL)...
research
04/09/2021

Learning Sampling Policy for Faster Derivative Free Optimization

Zeroth-order (ZO, also known as derivative-free) methods, which estimate...
research
02/15/2019

Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock

In this paper we introduce a reinforcement learning (RL) approach for tr...
research
11/16/2015

MuProp: Unbiased Backpropagation for Stochastic Neural Networks

Deep neural networks are powerful parametric models that can be trained ...
research
02/15/2019

Reinforcement Learning Without Backpropagation or a Clock

In this paper we introduce a reinforcement learning (RL) approach for tr...
research
05/16/2023

Coagent Networks: Generalized and Scaled

Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011...
