Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

01/04/2018
by   Tuomas Haarnoja, et al.
0

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy - that is, succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/13/2018

Soft Actor-Critic Algorithms and Applications

Model-free deep reinforcement learning (RL) algorithms have been success...
research
02/14/2019

Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning

We propose a new policy iteration theory as an important extension of so...
research
09/07/2019

Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning

Maximum entropy deep reinforcement learning (RL) methods have been demon...
research
12/11/2020

OPAC: Opportunistic Actor-Critic

Actor-critic methods, a type of model-free reinforcement learning (RL), ...
research
12/21/2021

Soft Actor-Critic with Cross-Entropy Policy Optimization

Soft Actor-Critic (SAC) is one of the state-of-the-art off-policy reinfo...
research
07/24/2020

Evolve To Control: Evolution-based Soft Actor-Critic for Scalable Reinforcement Learning

Advances in Reinforcement Learning (RL) have successfully tackled sample...
research
03/10/2023

Understanding the Synergies between Quality-Diversity and Deep Reinforcement Learning

The synergies between Quality-Diversity (QD) and Deep Reinforcement Lear...

Please sign up or login with your details

Forgot password? Click here to reset