Parameter-based Value Functions

06/16/2020
by Francesco Faccio, et al.

Learning value functions off-policy is at the core of modern Reinforcement Learning (RL). Traditional off-policy actor-critic algorithms, however, only approximate the true policy gradient, since the gradient ∇_θ Q^π_θ(s,a) of the action-value function with respect to the policy parameters is often ignored. We introduce a class of value functions called Parameter-based Value Functions (PVFs) whose inputs include the policy parameters. PVFs can evaluate the performance of any policy given a state, a state-action pair, or a distribution over the RL agent's initial states. We show how PVFs yield exact policy gradient theorems. We derive off-policy actor-critic algorithms based on PVFs trained using Monte Carlo or Temporal Difference methods. Preliminary experimental results indicate that PVFs can effectively evaluate deterministic linear and nonlinear policies, outperforming state-of-the-art algorithms in the continuous control environment Swimmer-v3. Finally, we show how recurrent neural networks can be trained through PVFs to solve supervised and RL problems involving partial observability and long time lags between relevant events. This provides an alternative to backpropagation through time.
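To make the core idea concrete, the following is a minimal sketch of a value function that takes the policy parameters as input. It is not the authors' implementation. It assumes a hypothetical one-step task, a scalar policy parameter `theta`, and a critic that is linear in quadratic features of `theta`; the names `rollout_return`, `features`, `fit_pvf`, and `pvf_gradient` are all illustrative. The critic is fitted by Monte Carlo regression on returns collected from many behaviour policies, and a new policy is then improved by following the critic's gradient with respect to `theta`, without any further interaction with the task.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(theta, n_states=64):
    # Hypothetical one-step task (an assumption, not from the paper):
    # state s ~ U(-1, 1), deterministic linear policy a = theta * s,
    # reward = -(a - 2 s)^2, so the best parameter is theta = 2.
    s = rng.uniform(-1.0, 1.0, size=n_states)
    a = theta * s
    return float(np.mean(-(a - 2.0 * s) ** 2))

def features(thetas):
    # Quadratic features of the policy parameter; the critic is linear in them.
    thetas = np.asarray(thetas, dtype=float)
    return np.stack([np.ones_like(thetas), thetas, thetas ** 2], axis=-1)

def fit_pvf(thetas, returns):
    # Monte Carlo regression: least-squares fit of V_w(theta) to observed returns.
    w, *_ = np.linalg.lstsq(features(thetas), np.asarray(returns), rcond=None)
    return w

def pvf_gradient(w, theta):
    # Derivative of V_w(theta) = w0 + w1 * theta + w2 * theta^2 w.r.t. theta.
    return w[1] + 2.0 * w[2] * theta

# Off-policy data: returns of many different behaviour policies.
behaviour_thetas = rng.uniform(-1.0, 4.0, size=300)
observed_returns = [rollout_return(t) for t in behaviour_thetas]
w = fit_pvf(behaviour_thetas, observed_returns)

# Improve a fresh policy by gradient ascent on the learned critic only.
theta = 0.0
for _ in range(200):
    theta += 0.5 * pvf_gradient(w, theta)

print("learned policy parameter:", theta)
```

In this toy setting the learned parameter should end up near the optimum of the assumed task. The paper's PVFs additionally condition on states or state-action pairs and use neural-network critics; the sketch only covers the simplest case of a value function over initial-state distributions.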


