
Regularized Soft Actor-Critic for Behavior Transfer Learning

09/27/2022
by Mingxi Tan, et al.

Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior, but do not address the potential conflict between the behavior style and the objective of the task. There is a general lack of efficient methods that allow an agent to partially imitate a demonstrated behavior, to varying degrees, while still completing the main objective of the task. In this paper we propose a method called Regularized Soft Actor-Critic, which formulates the main task and the imitation task under the Constrained Markov Decision Process (CMDP) framework. The main task is defined by the maximum entropy objective used in Soft Actor-Critic (SAC), and the imitation task is defined as a constraint. We evaluate our method on continuous control tasks relevant to video game applications.
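
The abstract states only that the maximum-entropy SAC objective is the main task and that imitation enters as a CMDP constraint; it does not spell out how the constraint is enforced. Below is a minimal sketch of one common way to handle such a constraint, via a Lagrange multiplier updated by dual ascent. All names (alpha, log_lam, imitation_budget, demo_batch) and the choice of behavior-cloning negative log-likelihood as the imitation cost are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical Lagrangian treatment of the constrained objective described above:
# maximize the SAC maximum-entropy return while keeping an imitation cost below a budget.
import torch

def actor_and_dual_losses(policy, q_net, batch, demo_batch,
                          alpha, log_lam, imitation_budget):
    # --- main task: maximum-entropy objective, as in standard SAC ---
    dist = policy(batch.obs)                      # action distribution pi(.|s)
    actions = dist.rsample()                      # reparameterized sample
    log_pi = dist.log_prob(actions).sum(-1)
    q_value = q_net(batch.obs, actions)
    sac_loss = (alpha * log_pi - q_value).mean()  # minimizing this maximizes entropy-regularized Q

    # --- imitation task (assumed cost): behavior-cloning NLL on demonstration data ---
    demo_dist = policy(demo_batch.obs)
    imitation_cost = -demo_dist.log_prob(demo_batch.actions).sum(-1).mean()

    # --- Lagrangian relaxation: weight the constraint with a multiplier lam >= 0 ---
    lam = log_lam.exp()
    actor_loss = sac_loss + lam.detach() * imitation_cost

    # Dual ascent on lam: it grows while the imitation cost exceeds the budget
    # and shrinks toward zero once the constraint is satisfied.
    dual_loss = -(lam * (imitation_cost.detach() - imitation_budget))
    return actor_loss, dual_loss
```

Under this reading, tightening or loosening imitation_budget is what lets the agent imitate the demonstrated style to varying degrees while the SAC term keeps optimizing the main task.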
