Cautious Actor-Critic

07/12/2021
by Lingwei Zhu, et al.

The oscillating performance of off-policy learning and the persisting errors in the actor-critic (AC) setting call for algorithms that learn conservatively and are therefore better suited to stability-critical applications. In this paper, we propose a novel off-policy AC algorithm, cautious actor-critic (CAC). The name "cautious" reflects its doubly conservative nature: we exploit the classic policy interpolation of conservative policy iteration for the actor and the entropy regularization of conservative value iteration for the critic. Our key observation is that the entropy-regularized critic facilitates and simplifies the unwieldy interpolated actor update while still ensuring robust policy improvement. We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate that CAC achieves comparable performance while significantly stabilizing learning.
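The two conservative ingredients named above can be illustrated on a toy problem. The sketch below is not the authors' CAC implementation, which targets continuous control with learned function approximators. It assumes a single state with a small discrete action set and shows two things. First, an entropy-regularized critic with temperature `tau` yields a Boltzmann (softmax) policy together with a "soft" state value. Second, the conservative-policy-iteration update mixes the current policy with a target policy through an interpolation coefficient `zeta`. The names `tau`, `zeta`, `softmax_policy`, `soft_value`, `entropy`, and `interpolate_policy` are hypothetical and chosen for illustration only. How CAC uses the entropy-regularized critic to simplify the interpolated actor update is not reproduced here.

```python
import math
from typing import List


def softmax_policy(q_values: List[float], tau: float) -> List[float]:
    """Boltzmann policy pi(a) proportional to exp(Q(a) / tau).

    At a single state, this policy maximizes E_pi[Q] + tau * H(pi),
    i.e. it is the greedy policy of an entropy-regularized critic.
    """
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    m = max(q_values)  # shift by the max for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def soft_value(q_values: List[float], tau: float) -> float:
    """Entropy-regularized state value: tau * log sum_a exp(Q(a) / tau)."""
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


def entropy(pi: List[float]) -> float:
    """Shannon entropy of a discrete distribution (0 * log 0 treated as 0)."""
    return -sum(p * math.log(p) for p in pi if p > 0.0)


def interpolate_policy(pi_old: List[float], pi_target: List[float],
                       zeta: float) -> List[float]:
    """Conservative-policy-iteration mixture (1 - zeta) * pi_old + zeta * pi_target."""
    if not 0.0 <= zeta <= 1.0:
        raise ValueError("zeta must lie in [0, 1]")
    return [(1.0 - zeta) * p + zeta * g for p, g in zip(pi_old, pi_target)]


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]       # toy action values at one state (assumed)
    tau = 0.5                 # entropy temperature (assumed)
    zeta = 0.2                # interpolation coefficient (assumed)
    pi_old = [1.0 / 3] * 3    # current policy: uniform over three actions
    pi_soft = softmax_policy(q, tau)
    pi_new = interpolate_policy(pi_old, pi_soft, zeta)
    print("softmax target:", [round(p, 3) for p in pi_soft])
    print("interpolated policy:", [round(p, 3) for p in pi_new])
    print("soft value:", round(soft_value(q, tau), 3))
```

The mixture is what makes the actor update conservative. The new policy lies at total-variation distance exactly `zeta` times TV(`pi_target`, `pi_old`) from the old one, so a small `zeta` bounds how far a single update can move the policy. The soft value equals E_pi[Q] + tau * H(pi) evaluated at the softmax policy, which is one way to check that the critic and the Boltzmann policy are consistent.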

research
12/19/2018

TD-Regularized Actor-Critic Methods

Actor-critic methods can achieve incredible performance on difficult rei...
research
07/02/2019

Modified Actor-Critics

Robot Learning, from a control point of view, often involves continuous ...
research
05/08/2022

Simultaneous Double Q-learning with Conservative Advantage Learning for Actor-Critic Methods

Actor-critic Reinforcement Learning (RL) algorithms have achieved impres...
research
04/29/2019

DAC: The Double Actor-Critic Architecture for Learning Options

We reformulate the option framework as two parallel augmented MDPs. Unde...
research
06/16/2019

ASAC: Active Sensing using Actor-Critic models

Deciding what and when to observe is critical when making observations i...
research
04/19/2023

CASOG: Conservative Actor-critic with SmOoth Gradient for Skill Learning in Robot-Assisted Intervention

Robot-assisted intervention has shown reduced radiation exposure to phys...
research
02/14/2023

Conservative State Value Estimation for Offline Reinforcement Learning

Offline reinforcement learning faces a significant challenge of value ov...
