Target Entropy Annealing for Discrete Soft Actor-Critic

12/06/2021
by   Yaosheng Xu, et al.

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm for continuous action spaces. It uses the maximum entropy framework for efficiency and stability, and applies a heuristic temperature Lagrange term to tune the temperature α, which determines how "soft" the policy should be. Counter-intuitively, empirical evidence shows that SAC does not perform well in discrete domains. In this paper we investigate possible explanations for this phenomenon and propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter of SAC. The target entropy is a constant in the temperature Lagrange term and represents the target policy entropy in discrete SAC. We compare our method against SAC with different constant target entropies on Atari 2600 games, and analyze how our scheduling affects SAC.
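To make the mechanism concrete, below is a minimal sketch (in PyTorch) of the standard SAC automatic temperature update for discrete actions, with the target entropy replaced by a schedule rather than a constant. This is an illustration under stated assumptions, not the authors' implementation: the linear schedule, the endpoint values (e.g. h_end=0.3), and names such as update_alpha are hypothetical; TES-SAC's actual schedule is specified in the paper.

```python
import torch

# Temperature alpha is optimized in log space to keep it positive,
# following the standard SAC automatic temperature adjustment.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

def target_entropy(step, total_steps, h_start, h_end):
    """Hypothetical linear annealing of the target entropy from
    h_start down to h_end over training (illustrative only)."""
    frac = min(step / total_steps, 1.0)
    return h_start + frac * (h_end - h_start)

def update_alpha(probs, log_probs, step, total_steps, n_actions=4):
    """One gradient step on the temperature Lagrange term
    J(alpha) = E_pi[-alpha * (log pi(a|s) + H_target)].
    For discrete actions the expectation over pi is exact:
    a probability-weighted sum over the action dimension."""
    # Start near the maximum entropy log(|A|) of a uniform policy.
    h_start = 0.98 * torch.log(torch.tensor(float(n_actions))).item()
    h_target = target_entropy(step, total_steps, h_start, h_end=0.3)
    # Only log_alpha requires grad here; policy terms are detached.
    entropy_gap = (probs * (log_probs + h_target)).sum(dim=-1).detach()
    alpha_loss = (-log_alpha * entropy_gap).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp().item()  # current temperature alpha

# Example usage with a random discrete policy output:
# probs = torch.softmax(torch.randn(32, 4), dim=-1)
# alpha = update_alpha(probs, probs.log(), step=1000, total_steps=100_000)
```

When the policy's entropy drops below the scheduled target, this loss pushes alpha up (stronger entropy bonus, more exploration); when entropy exceeds the target, alpha decreases. Annealing the target therefore moves the policy from exploratory toward deterministic over training.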


Related research

Soft Actor-Critic for Discrete Action Settings (10/16/2019)
Soft Actor-Critic is a state-of-the-art reinforcement learning algorithm...

Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning (11/28/2021)
Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as So...

Revisiting Discrete Soft Actor-Critic (09/21/2022)
We study the adaptation of soft actor-critic (SAC) from continuous action ...

Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient (07/03/2020)
The exploration-exploitation dilemma has long been a crucial issue in reinfo...

Regularization of Soft Actor-Critic Algorithms with Automatic Temperature Adjustment (05/19/2023)
This work presents a comprehensive analysis to regularize the Soft Actor...

Soft Actor-Critic Algorithm with Truly Inequality Constraint (03/08/2023)
Soft actor-critic (SAC) in reinforcement learning is expected to be one ...

Soft Actor-Critic with Inhibitory Networks for Faster Retraining (02/07/2022)
Reusing previously trained models is critical in deep reinforcement lear...
