
Regularized Soft Actor-Critic for Behavior Transfer Learning

by Mingxi Tan, et al.

Existing imitation learning methods mainly focus on making an agent effectively mimic a demonstrated behavior, but do not address the potential contradiction between the behavior style and the objective of a task. There is a general lack of efficient methods that allow an agent to partially imitate a demonstrated behavior to varying degrees while still completing the main objective of a task. In this paper we propose a method called Regularized Soft Actor-Critic, which formulates the main task and the imitation task under the Constrained Markov Decision Process (CMDP) framework. The main task is defined as the maximum entropy objective used in Soft Actor-Critic (SAC), and the imitation task is defined as a constraint. We evaluate our method on continuous control tasks relevant to video game applications.
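A common way to handle a CMDP objective of this shape is a Lagrangian relaxation: the SAC actor loss is augmented with a multiplier-weighted imitation term, and the multiplier is adapted by dual gradient ascent so the constraint is enforced only to the desired degree. The abstract does not give the paper's exact losses, so the sketch below is illustrative; all function and variable names (`regularized_actor_loss`, `dual_update`, `imitation_cost`, `budget`) are assumptions, not the authors' notation.

```python
# Hedged sketch of a Lagrangian-relaxed, constraint-regularized SAC-style
# actor objective. Per-sample quantities are plain floats for clarity;
# in practice these would be batched tensors from a replay buffer.

def regularized_actor_loss(log_pi, q_value, imitation_cost, alpha, lam):
    """Actor loss: standard SAC term plus a multiplier-weighted imitation term.

    log_pi         : log-probability of the sampled action under the policy
    q_value        : critic's Q-estimate for that action
    imitation_cost : divergence of the policy from the demonstrated behavior
                     (illustrative; the paper's constraint may differ)
    alpha          : SAC entropy temperature
    lam            : Lagrange multiplier for the imitation constraint
    """
    sac_term = alpha * log_pi - q_value      # minimize: raise entropy, raise Q
    constraint_term = lam * imitation_cost   # penalize deviating from the demo
    return sac_term + constraint_term


def dual_update(lam, imitation_cost, budget, lr=0.1):
    """Dual gradient ascent on lambda.

    lambda grows while the imitation constraint is violated
    (imitation_cost > budget) and shrinks toward zero otherwise,
    so the degree of imitation is tuned automatically.
    """
    return max(0.0, lam + lr * (imitation_cost - budget))
```

For example, with `alpha=0.2`, `lam=1.0`, `log_pi=-1.0`, `q_value=2.0`, and `imitation_cost=0.5`, the loss is `0.2 * (-1.0) - 2.0 + 1.0 * 0.5 = -1.7`; a larger `budget` lets `lam` decay and the agent imitates less, which matches the paper's stated goal of partial imitation to varying degrees.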

