Soft Actor-Critic Algorithm with Truly Inequality Constraint

03/08/2023
by   Taisuke Kobayashi, et al.

Soft actor-critic (SAC) in reinforcement learning is expected to be one of the next-generation robot control schemes. Its ability to maximize policy entropy would make a robotic controller robust to noise and perturbation, which is useful for real-world robot applications. However, the current implementation automatically tunes the priority given to maximizing the policy entropy, and its tuning rule can be interpreted as enforcing an equality constraint that binds the policy entropy to its specified target value. The current SAC therefore no longer maximizes the policy entropy, contrary to our expectation. To resolve this issue, this paper improves the SAC implementation with a slack variable that appropriately handles the inequality constraint, so that the policy entropy is maximized. In the MuJoCo and PyBullet simulators, the modified SAC achieved higher robustness and more stable learning than the original while regularizing the norm of the action. In addition, a real-robot variable impedance task was demonstrated to show the applicability of the modified SAC to real-world robot control.
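To make the equality-constraint reading concrete, the following is a minimal sketch of one common form of the automatic temperature update used in SAC implementations; it is not the paper's modified algorithm, and the slack variable is not shown. The function name `temperature_step`, the learning rate, and the scalar entropy values are assumptions made for illustration only.

```python
import math


def temperature_step(log_alpha, entropy, target_entropy, lr=0.01):
    """One gradient step on a common SAC temperature loss (illustrative).

    Assumed loss: J(alpha) = alpha * (entropy - target_entropy),
    with alpha = exp(log_alpha) so that alpha stays positive.
    """
    alpha = math.exp(log_alpha)
    # Gradient of J with respect to log_alpha.
    grad = alpha * (entropy - target_entropy)
    # Gradient descent on J.
    return log_alpha - lr * grad


log_alpha = 0.0

# Entropy below the target: the temperature is raised,
# which increases the weight on the entropy term.
raised = temperature_step(log_alpha, entropy=-2.0, target_entropy=-1.0)

# Entropy above the target: the temperature is lowered,
# which reduces the weight on the entropy term.
lowered = temperature_step(log_alpha, entropy=0.5, target_entropy=-1.0)
```

Because the update moves the temperature in both directions, the policy entropy tends to be driven toward the target from either side, which is the behavior the abstract describes as binding the entropy to its target value.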

research
12/13/2018

Soft Actor-Critic Algorithms and Applications

Model-free deep reinforcement learning (RL) algorithms have been success...
research
12/21/2021

Soft Actor-Critic with Cross-Entropy Policy Optimization

Soft Actor-Critic (SAC) is one of the state-of-the-art off-policy reinfo...
research
07/02/2019

Modified Actor-Critics

Robot Learning, from a control point of view, often involves continuous ...
research
02/28/2017

Bridging the Gap Between Value and Policy Based Reinforcement Learning

We establish a new connection between value and policy based reinforceme...
research
12/06/2021

Target Entropy Annealing for Discrete Soft Actor-Critic

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in ...
research
07/03/2020

Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient

Exploration-exploitation dilemma has long been a crucial issue in reinfo...
research
06/19/2020

Band-limited Soft Actor Critic Model

Soft Actor Critic (SAC) algorithms show remarkable performance in comple...
