
Discriminator Soft Actor Critic without Extrinsic Rewards

by Daichi Nishio, et al.
Kanazawa University

Imitating well in unknown states from only a small amount of expert data and sampled data is difficult. Supervised learning methods such as Behavioral Cloning do not require sampling data, but they usually suffer from distribution shift. Methods based on reinforcement learning, such as inverse reinforcement learning and generative adversarial imitation learning (GAIL), can learn from only a few expert demonstrations, but they typically require many interactions with the environment. Soft Q Imitation Learning (SQIL) addressed these problems: it learns efficiently by combining Behavioral Cloning with soft Q-learning under constant rewards. To make this algorithm more robust to distribution shift, we propose Discriminator Soft Actor Critic (DSAC), which uses a reward function based on adversarial inverse reinforcement learning (AIRL) instead of constant rewards. We evaluated it on PyBullet environments with only four expert trajectories.
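The contrast the abstract draws can be sketched in a few lines: SQIL labels every expert transition with a constant reward of 1 and every sampled transition with 0, while an AIRL-style approach derives the reward from a learned discriminator's output. The function names below are hypothetical illustrations, and the discriminator output is taken as a given probability rather than a trained model; this is a minimal sketch of the reward-shaping difference, not the paper's implementation.

```python
import math


def sqil_reward(is_expert: bool) -> float:
    """SQIL-style constant reward: 1 for expert transitions, 0 for sampled ones."""
    return 1.0 if is_expert else 0.0


def airl_style_reward(d: float, eps: float = 1e-8) -> float:
    """AIRL-style reward from a discriminator output d = D(s, a) in (0, 1):

        r = log D(s, a) - log(1 - D(s, a))

    Positive when the discriminator judges the transition expert-like
    (d > 0.5), negative otherwise, and zero at d = 0.5.
    """
    return math.log(d + eps) - math.log(1.0 - d + eps)
```

Because the discriminator's judgment varies smoothly with how expert-like a transition looks, the AIRL-style reward can still guide the agent in states outside the demonstrations, which is the robustness to distribution shift the abstract claims, whereas the constant reward gives no gradient of preference between unfamiliar states.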
