Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning

02/14/2019
by Gang Chen, et al.

We propose a new policy iteration theory that extends soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model-free algorithms for deep reinforcement learning. Under the new theory, arbitrary entropy measures that generalize Shannon entropy, such as Tsallis entropy and Rényi entropy, can be used to randomize action selection appropriately while still maximizing expected long-term rewards. The theory yields two new algorithms: Tsallis entropy Actor-Critic (TAC) and Rényi entropy Actor-Critic (RAC). Theoretical analysis shows that these algorithms can be more effective than SAC. They also pave the way for a new Ensemble Actor-Critic (EAC) algorithm, developed in this paper, which uses a bootstrap mechanism for deep environment exploration together with a new value-function-based mechanism for high-level action selection. Empirically, we show that TAC, RAC and EAC can achieve state-of-the-art performance on a range of benchmark control tasks, outperforming SAC and several cutting-edge learning algorithms in both sample efficiency and effectiveness.
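To make concrete what it means for Tsallis and Rényi entropy to generalize Shannon entropy, here is a minimal sketch for a discrete action distribution. It uses the standard textbook definitions, which may be parameterized differently from the ones in the paper. The function names, the entropic index arguments `q` and `alpha`, and the example `policy` are illustrative assumptions, not taken from the source.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in nats: -sum_i p_i * log(p_i)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def tsallis_entropy(p, q):
    """Textbook Tsallis entropy: (1 - sum_i p_i^q) / (q - 1).

    Falls back to Shannon entropy at q == 1, which is its limit.
    """
    if abs(q - 1.0) < 1e-12:
        return shannon_entropy(p)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def renyi_entropy(p, alpha):
    """Textbook Renyi entropy: log(sum_i p_i^alpha) / (1 - alpha).

    Falls back to Shannon entropy at alpha == 1, which is its limit.
    """
    if abs(alpha - 1.0) < 1e-12:
        return shannon_entropy(p)
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)


# Hypothetical action probabilities of a stochastic policy in one state.
policy = [0.7, 0.2, 0.1]

if __name__ == "__main__":
    print("Shannon:", shannon_entropy(policy))
    for index in (0.5, 0.999, 2.0):
        print(index, tsallis_entropy(policy, index), renyi_entropy(policy, index))
```

As the index approaches 1, both measures approach the Shannon value, which is the sense in which they generalize it. Other index values weight high- and low-probability actions differently, which changes how strongly a maximum-entropy objective would push the policy toward randomized action selection.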

research
01/04/2018

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Model-free deep reinforcement learning (RL) algorithms have been demonst...
research
04/09/2021

Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning

In this work, we propose Behavior-Guided Actor-Critic (BAC), an off-poli...
research
10/01/2022

Boosting Exploration in Actor-Critic Algorithms by Incentivizing Plausible Novel States

Actor-critic (AC) algorithms are a class of model-free deep reinforcemen...
research
05/09/2021

CASA-B: A Unified Framework of Model-Free Reinforcement Learning

Building on the breakthrough of reinforcement learning, this paper intro...
research
11/18/2020

Weighted Entropy Modification for Soft Actor-Critic

We generalize the existing principle of the maximum Shannon entropy in r...
research
04/29/2018

From Credit Assignment to Entropy Regularization: Two New Algorithms for Neural Sequence Prediction

In this work, we study the credit assignment problem in reward augmented...
research
02/07/2022

Soft Actor-Critic with Inhibitory Networks for Faster Retraining

Reusing previously trained models is critical in deep reinforcement lear...
