Actor-Critic or Critic-Actor? A Tale of Two Time Scales

10/10/2022
by   Shalabh Bhatnagar, et al.

We revisit the standard formulation of the tabular actor-critic algorithm as a two time-scale stochastic approximation, with the value function computed on the faster time scale and the policy on the slower one; this emulates policy iteration. We begin by observing that reversing the time scales in fact emulates value iteration and yields a legitimate algorithm. We compare the two empirically, with and without function approximation (using both linear and nonlinear function approximators), and find that the proposed critic-actor algorithm performs better empirically, at a marginal increase in computational cost.
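The idea in the abstract can be sketched in a few lines: both variants run the same tabular TD-error updates for the value function and softmax policy preferences, and only the step-size (time-scale) schedules are swapped. The toy 2-state MDP, the specific step-size exponents, and all variable names below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny MDP (not from the paper):
# P[s, a] = next state, R[s, a] = reward.
P = np.array([[0, 1], [1, 0]])
R = np.array([[0.0, 1.0], [0.5, 0.0]])
gamma = 0.9

def run(actor_step, critic_step, iters=20000):
    """Two time-scale tabular actor-critic sketch: V follows one
    step-size schedule, policy preferences theta another.
    critic_step faster than actor_step -> standard actor-critic;
    swapping the schedules -> the critic-actor variant."""
    V = np.zeros(2)            # critic: state values
    theta = np.zeros((2, 2))   # actor: action preferences
    s = 0
    for n in range(1, iters + 1):
        probs = np.exp(theta[s]) / np.exp(theta[s]).sum()  # softmax policy
        a = rng.choice(2, p=probs)
        s2, r = P[s, a], R[s, a]
        delta = r + gamma * V[s2] - V[s]          # TD error
        V[s] += critic_step(n) * delta            # critic update
        grad = -probs
        grad[a] += 1.0                            # grad of log pi(a|s)
        theta[s] += actor_step(n) * delta * grad  # actor update
        s = s2
    return V, theta

# Standard actor-critic: critic on the faster time scale.
V_ac, _ = run(actor_step=lambda n: 1.0 / n,
              critic_step=lambda n: 1.0 / n**0.6)

# Critic-actor: the time scales reversed.
V_ca, _ = run(actor_step=lambda n: 1.0 / n**0.6,
              critic_step=lambda n: 1.0 / n)
```

The only difference between the two runs is which schedule decays faster; both satisfy the usual Robbins-Monro conditions, which is what makes the reversal a legitimate algorithm rather than a bug.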


