Discriminator Soft Actor Critic without Extrinsic Rewards

01/19/2020
by   Daichi Nishio, et al.

Imitating an expert well in unseen states is difficult when only a small amount of expert data and sampling data is available. Supervised learning methods such as Behavioral Cloning do not require sampling data, but usually suffer from distribution shift. Methods based on reinforcement learning, such as inverse reinforcement learning and generative adversarial imitation learning (GAIL), can learn from only a few expert demonstrations. However, they often need to interact with the environment. Soft Q imitation learning (SQIL) addressed these problems and was shown to learn efficiently by combining Behavioral Cloning and soft Q-learning with constant rewards. To make this algorithm more robust to distribution shift, we propose Discriminator Soft Actor Critic (DSAC), which uses a reward function based on adversarial inverse reinforcement learning instead of constant rewards. We evaluated it on PyBullet environments with only four expert trajectories.
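
The contrast between constant rewards and a discriminator-based reward can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common SQIL convention of reward 1 for expert transitions and 0 for sampled ones, and the common AIRL-style form log D - log(1 - D), where D is a discriminator output in (0, 1). The function names and the `eps` clipping constant are hypothetical.

```python
import math


def sqil_reward(is_expert: bool) -> float:
    """Constant reward (assumed SQIL convention): 1 for expert data, 0 otherwise."""
    return 1.0 if is_expert else 0.0


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw discriminator logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def airl_style_reward(d: float, eps: float = 1e-8) -> float:
    """AIRL-style reward log D - log(1 - D) for a discriminator output d.

    d is clipped to [eps, 1 - eps] (a hypothetical safeguard) so that
    neither logarithm receives zero.
    """
    d = min(max(d, eps), 1.0 - eps)
    return math.log(d) - math.log(1.0 - d)


# A transition the discriminator rates as expert-like earns a positive reward,
# and one it rates as unlike the expert earns a negative reward.
reward_expert_like = airl_style_reward(0.9)
reward_unlike_expert = airl_style_reward(0.1)
```

Under these assumptions the reward varies continuously with the discriminator's confidence, whereas the constant scheme assigns the same value to every sampled transition. When D is the sigmoid of a logit, the reward reduces to that logit.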

research
05/16/2019

Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation

We consider the problem of imitation learning from a finite set of exper...
research
06/23/2021

IQ-Learn: Inverse soft-Q Learning for Imitation

In many sequential decision-making problems (e.g., robotics control, gam...
research
05/27/2019

SQIL: Imitation Learning via Regularized Behavioral Cloning

Learning to imitate expert behavior given action demonstrations containi...
research
05/03/2021

Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference

Recent advances in reinforcement learning have inspired increasing inter...
research
08/17/2020

Imitation learning based on entropy-regularized forward and inverse reinforcement learning

This paper proposes Entropy-Regularized Imitation Learning (ERIL), which...
research
01/15/2022

Profitable Strategy Design by Using Deep Reinforcement Learning for Trades on Cryptocurrency Markets

Deep Reinforcement Learning solutions have been applied to different con...
research
09/16/2018

Inspiration Learning through Preferences

Current imitation learning techniques are too restrictive because they r...
