Soft Actor-Critic with Inhibitory Networks for Faster Retraining

02/07/2022
by Jaime Ide, et al.

Reusing previously trained models is critical in deep reinforcement learning for speeding up the training of new agents. However, it is unclear how an agent should acquire new skills when their objectives and constraints conflict with previously learned ones. Moreover, retraining involves an intrinsic tension between exploiting what has already been learned and exploring new skills. In soft actor-critic (SAC) methods, a temperature parameter can be dynamically adjusted to weight the action entropy and balance the exploration-exploitation trade-off. However, controlling a single coefficient can be challenging within the context of retraining, even more so when goals are contradictory. In this work, inspired by neuroscience research, we propose a novel approach that uses inhibitory networks to allow separate, adaptive state-value evaluations, as well as distinct automatic entropy tuning. Ultimately, our approach allows inhibition to be controlled so as to handle the conflict between exploiting less risky, already-acquired behaviors and exploring novel ones to overcome more challenging tasks. We validate our method through experiments in OpenAI Gym environments.
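The temperature adjustment the abstract refers to is the automatic entropy tuning of standard SAC, in which the coefficient alpha is updated by gradient descent so that the policy's entropy tracks a fixed target. The following is a minimal PyTorch sketch of that standard single-coefficient update, not the paper's inhibitory-network variant, which maintains distinct tunings of this kind for conflicting objectives; names such as update_temperature, action_dim, and the learning rate are illustrative assumptions rather than details from the paper.

    import torch

    action_dim = 2                       # e.g., a 2-D continuous action space (assumed)
    target_entropy = -float(action_dim)  # common SAC heuristic: target entropy = -|A|

    # Optimize log(alpha) so the temperature stays positive.
    log_alpha = torch.zeros(1, requires_grad=True)
    alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)

    def update_temperature(log_prob: torch.Tensor) -> torch.Tensor:
        """One gradient step on the temperature alpha.

        log_prob: log pi(a|s) for a batch of actions sampled from the
        current policy. The loss pushes alpha up when the policy's entropy
        falls below target_entropy, and down when it exceeds it.
        """
        alpha_loss = -(log_alpha * (log_prob + target_entropy).detach()).mean()
        alpha_opt.zero_grad()
        alpha_loss.backward()
        alpha_opt.step()
        return log_alpha.exp()  # current temperature alpha

    # Usage with a stand-in batch of policy log-probabilities:
    alpha = update_temperature(torch.randn(256) - 1.0)

Because a single alpha weights entropy uniformly across the whole state space, it cannot favor exploration on novel subtasks while preserving exploitation of acquired behaviors elsewhere, which is the retraining conflict the paper's separate evaluations are designed to address.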


Related research

03/01/2023 - The Point to Which Soft Actor-Critic Converges
Soft actor-critic is a successful successor over soft Q-learning. While ...

02/14/2019 - Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning
We propose a new policy iteration theory as an important extension of so...

07/03/2020 - Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient
Exploration-exploitation dilemma has long been a crucial issue in reinfo...

11/25/2021 - Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models
A core challenge for an autonomous agent acting in the real world is to ...

12/06/2021 - Target Entropy Annealing for Discrete Soft Actor-Critic
Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in ...

11/28/2021 - Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning
Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as So...
