A Convergent Online Single Time Scale Actor Critic Algorithm

09/16/2009
by   D. Di Castro, et al.

Actor-critic based approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological relevance. In this paper, we introduce an online temporal difference based actor-critic algorithm which is proved to converge to a neighborhood of a local maximum of the average reward. Linear function approximation is used by the critic in order to estimate the value function and the temporal difference signal, which is passed from the critic to the actor. The main distinguishing feature of the present convergence proof is that both the actor and the critic operate on a similar time scale, while in most current convergence proofs they are required to have very different time scales in order to converge. Moreover, the same temporal difference signal is used to update the parameters of both the actor and the critic. A limitation of the proposed approach, compared to results available for two time scale convergence, is that convergence is guaranteed only to a neighborhood of an optimal value, rather than to the optimal value itself. The single time scale and the identical temporal difference signal used by the actor and the critic may provide a step towards constructing more biologically realistic models of reinforcement learning in the brain.
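The structure described in the abstract (a linear critic, an average-reward temporal difference signal, and one shared step size for actor and critic) can be illustrated with a minimal sketch. This is not the algorithm analyzed in the paper. The two-state toy MDP, the one-hot features, the softmax policy, the constant step size, and all names (`run_actor_critic`, `theta`, `w`, `eta`) are assumptions chosen for illustration only.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()


def run_actor_critic(steps=20000, step_size=0.05, seed=0):
    """Single time scale actor-critic on a hypothetical two-state MDP.

    Assumed toy dynamics: choosing action a moves the agent to state a,
    and the reward is 1 when the next state is 1, otherwise 0.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions = 2, 2

    theta = np.zeros((n_states, n_actions))  # actor: softmax preferences
    w = np.zeros(n_states)                   # critic: linear weights (one-hot features)
    eta = 0.0                                # running estimate of the average reward
    state = 0

    for _ in range(steps):
        probs = softmax(theta[state])
        action = rng.choice(n_actions, p=probs)
        next_state = action
        reward = float(next_state == 1)

        # Average-reward temporal difference signal computed by the critic.
        td = reward - eta + w[next_state] - w[state]

        # Gradient of log pi(action | state) for a softmax policy.
        grad_log = -probs
        grad_log[action] += 1.0

        # The same step size and the same TD signal drive every update.
        eta += step_size * (reward - eta)
        w[state] += step_size * td
        theta[state] += step_size * td * grad_log

        state = next_state

    return theta, w, eta


if __name__ == "__main__":
    theta, w, eta = run_actor_critic()
    print("average reward estimate:", eta)
    print("policy in state 1:", softmax(theta[1]))
```

The point of the sketch is only that the actor and the critic share one step size and one temporal difference signal, in contrast to two time scale schemes in which the critic is updated with a much larger step size than the actor.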

research
06/14/2021

Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation

Actor-critic methods integrating target networks have exhibited a stupen...
research
06/25/2021

A nonlinear hidden layer enables actor-critic agents to learn multiple paired association navigation

Navigation to multiple cued reward locations has been increasingly used ...
research
06/08/2022

Scalable Online Disease Diagnosis via Multi-Model-Fused Actor-Critic Reinforcement Learning

For those seeking healthcare advice online, AI based dialogue agents cap...
research
12/02/2020

Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER

We prove under commonly used assumptions the convergence of actor-critic...
research
05/02/2010

Adaptive Bases for Reinforcement Learning

We consider the problem of reinforcement learning using function approxi...
research
03/24/2022

Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory

Driving 3D characters to dance following a piece of music is highly chal...
research
06/03/2011

Efficient Reinforcement Learning Using Recursive Least-Squares Methods

The recursive least-squares (RLS) algorithm is one of the most well-know...
