BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

10/27/2019
by Xinyue Chen, et al.

The field of Deep Reinforcement Learning (DRL) has recently seen a surge in research on batch reinforcement learning, which aims for sample-efficient learning from a given data set without additional interactions with the environment. In the batch DRL setting, commonly employed off-policy DRL algorithms can perform poorly and sometimes fail to learn altogether. In this paper, we propose a new algorithm, Best-Action Imitation Learning (BAIL), which, unlike many off-policy DRL algorithms, does not involve maximizing Q functions over the action space. Striving for simplicity as well as performance, BAIL first selects from the batch the actions it believes to be high-performing for their corresponding states; it then uses those state-action pairs to train a policy network using imitation learning. Although BAIL is simple, we demonstrate that it achieves state-of-the-art performance on the MuJoCo benchmark.
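The two-stage idea described above (select high-performing state-action pairs from the batch, then imitate them) can be illustrated with a minimal sketch. The abstract does not say how BAIL decides which actions are high-performing, so the selection rule here is a stand-in assumption: rank every pair by its discounted Monte Carlo return and keep the top fraction. This ranking is global and ignores the state, so it is only a crude proxy for "best action for its state". The policy is also an assumption, a scalar linear least-squares fit rather than a policy network. All function names and the `keep_fraction` parameter are hypothetical.

```python
from typing import List, Tuple

# One transition: (state, action, reward). Scalars keep the sketch short.
Transition = Tuple[float, float, float]


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Monte Carlo return G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def select_best_pairs(
    episodes: List[List[Transition]],
    gamma: float = 0.99,
    keep_fraction: float = 0.25,
) -> List[Tuple[float, float]]:
    """Assumed selection rule (not from the paper): keep the state-action
    pairs whose discounted return is in the top `keep_fraction` of the batch."""
    scored = []
    for episode in episodes:
        rewards = [r for _, _, r in episode]
        for (s, a, _), g in zip(episode, discounted_returns(rewards, gamma)):
            scored.append((g, s, a))
    # Stable sort on return only, highest first.
    scored.sort(key=lambda item: item[0], reverse=True)
    n_keep = max(1, int(len(scored) * keep_fraction))
    return [(s, a) for _, s, a in scored[:n_keep]]


def fit_linear_policy(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Imitation step as ordinary least squares: action ~ w * state + b.
    A stand-in for training a policy network on the selected pairs."""
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_a = sum(a for _, a in pairs) / n
    var_s = sum((s - mean_s) ** 2 for s, _ in pairs)
    if var_s == 0.0:
        return 0.0, mean_a  # all states identical: predict the mean action
    cov = sum((s - mean_s) * (a - mean_a) for s, a in pairs)
    w = cov / var_s
    return w, mean_a - w * mean_s


if __name__ == "__main__":
    batch = [
        [(0.0, 1.0, 1.0), (1.0, 2.0, 1.0)],    # rewarded episode
        [(0.0, -1.0, 0.0), (1.0, -2.0, 0.0)],  # unrewarded episode
    ]
    best = select_best_pairs(batch, gamma=0.5, keep_fraction=0.5)
    print(fit_linear_policy(best))
```

Note that neither stage maximizes a Q function over the action space: selection only scores actions that already appear in the batch, and the imitation step is plain supervised learning on the pairs that survive.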


