Reward function shape exploration in adversarial imitation learning: an empirical study

04/14/2021
by Yawei Wang, et al.

In adversarial imitation learning algorithms (AILs), the environment provides no true rewards for learning the policy. Pseudo rewards derived from the discriminator's output are still required, however. Given the implicit reward bias problem in AILs, we design several representative reward function shapes and compare their performance in large-scale experiments. To ensure the reliability of our results, we run the experiments on a series of MuJoCo and Box2D continuous control tasks with four different AILs. We also compare the reward function shapes across varying numbers of expert trajectories. The empirical results show that the positive logarithmic reward function works well in typical continuous control tasks, whereas the so-called unbiased reward function is limited to specific kinds of tasks. Several of the designed reward functions also perform excellently in these environments.
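As a rough illustration of what "reward function shape" means here, the sketch below writes out three pseudo rewards that are commonly derived from a discriminator output D(s, a) in the adversarial imitation learning literature. It assumes the convention that D is the estimated probability that a state-action pair comes from the expert; the function names and the clamping constant are hypothetical, and the abstract does not state the exact definitions used in the paper, so these forms may differ from the authors'.

```python
import math

EPS = 1e-8  # assumed clamp to keep the logarithms finite


def _clamp(d):
    """Keep the discriminator output strictly inside (0, 1)."""
    return min(max(d, EPS), 1.0 - EPS)


def positive_log_reward(d):
    """-log(1 - D): never negative, so longer episodes accumulate more reward."""
    return -math.log(1.0 - _clamp(d))


def negative_log_reward(d):
    """log(D): never positive, so ending an episode early costs less."""
    return math.log(_clamp(d))


def logit_reward(d):
    """log(D) - log(1 - D): can take either sign, often described as unbiased."""
    d = _clamp(d)
    return math.log(d) - math.log(1.0 - d)


if __name__ == "__main__":
    for d in (0.1, 0.5, 0.9):
        print(d, positive_log_reward(d), negative_log_reward(d), logit_reward(d))
```

The sign of each shape is what gives rise to the implicit bias: a reward that is always positive favors staying alive, while one that is always negative favors terminating, regardless of how closely the agent imitates the expert.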

research
05/25/2021

Hyperparameter Selection for Imitation Learning

We address the issue of tuning hyperparameters (HPs) for imitation learn...
research
06/05/2020

Wasserstein Distance guided Adversarial Imitation Learning with Reward Shape Exploration

The generative adversarial imitation learning (GAIL) has provided an adv...
research
06/01/2022

Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble

Inverse reinforcement learning (IRL) recovers the underlying reward func...
research
09/20/2020

Addressing reward bias in Adversarial Imitation Learning with neutral reward functions

Generative Adversarial Imitation Learning suffers from the fundamental p...
research
09/23/2020

What is the Reward for Handwriting? – Handwriting Generation by Imitation Learning

Analyzing the handwriting generation process is an important issue and h...
research
09/27/2021

Learning Multimodal Rewards from Rankings

Learning from human feedback has shown to be a useful approach in acquir...
research
05/12/2023

Selective imitation on the basis of reward function similarity

Imitation is a key component of human social behavior, and is widely use...
