Variational Actor-Critic Algorithms

08/03/2021
by Yuhua Zhu et al.

We introduce a class of variational actor-critic algorithms based on a variational formulation over both the value function and the policy. The objective function of the variational formulation consists of two parts: one for maximizing the value function and the other for minimizing the Bellman residual. Besides vanilla gradient descent with updates to both the value function and the policy, we propose two variants, the clipping method and the flipping method, to speed up convergence. We also prove that, when the prefactor of the Bellman residual is sufficiently large, the fixed point of the algorithm is close to the optimal policy.
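
As a rough illustration of the composite objective described in the abstract, below is a minimal tabular sketch in Python. It assumes a known finite MDP with transition tensor P and reward table R, a softmax policy, a state weighting mu, and the combined objective J(V, pi) = E_mu[V] - lam * E_mu[(V - T^pi V)^2] optimized by gradient ascent on both V and the policy logits. All of these names, the semi-gradient value update, and the hyperparameters are assumptions drawn from the abstract, not the paper's exact formulation; the clipping and flipping variants are not shown.

```python
# Hedged sketch of a variational actor-critic objective: maximize the value
# function while penalizing the squared Bellman residual (prefactor `lam`).
# The MDP, the semi-gradient value update, and all names/hyperparameters are
# illustrative assumptions, not the paper's exact algorithm.
import numpy as np

nS, nA, gamma, lam, lr = 5, 3, 0.9, 10.0, 0.05
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s']: transition probabilities
R = rng.uniform(size=(nS, nA))                  # R[s, a]: rewards
mu = np.full(nS, 1.0 / nS)                      # state weights in the objective

V = np.zeros(nS)                                # tabular value function
theta = np.zeros((nS, nA))                      # softmax policy logits

def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

for _ in range(5000):
    pi = softmax(theta)                         # pi[s, a]
    q = R + gamma * P @ V                       # Q_V[s, a] = r(s, a) + gamma * E[V(s')]
    tv = (pi * q).sum(axis=1)                   # (T^pi V)[s]
    resid = V - tv                              # Bellman residual
    # Ascent direction for V (semi-gradient: the dependence of T^pi V on V is ignored).
    grad_V = mu - 2.0 * lam * mu * resid
    # Ascent direction for the policy logits: shrinking the residual rewards
    # putting probability mass on high-Q actions.
    grad_pi = 2.0 * lam * (mu * resid)[:, None] * q
    grad_theta = pi * (grad_pi - (pi * grad_pi).sum(axis=1, keepdims=True))
    V += lr * grad_V
    theta += lr * grad_theta
```

With a large prefactor `lam`, the residual penalty keeps V close to T^pi V while the E_mu[V] term pushes the policy toward higher values, consistent with the fixed-point statement in the abstract.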
