Hyperparameter Selection for Imitation Learning

05/25/2021
by   Léonard Hussenot, et al.

We address the problem of tuning hyperparameters (HPs) for imitation learning algorithms in continuous control when the underlying reward function of the demonstrating expert cannot be observed at any time. Most of the imitation learning literature assumes that this reward function is available for HP selection, but this is not a realistic setting: if the reward function were available, it could be used directly for policy training, and imitation would not be necessary. To tackle this largely ignored problem, we propose a number of possible proxies to the external reward. We evaluate them in an extensive empirical study (more than 10,000 agents across 9 environments) and make practical recommendations for selecting HPs. Our results show that while imitation learning algorithms are sensitive to HP choices, it is often possible to select good-enough HPs through a proxy to the reward function.
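To make the idea of reward-free HP selection concrete, here is a minimal sketch. The abstract does not name the proxies studied, so the proxy used below (mean squared error between policy actions and expert actions on held-out demonstrations) is an assumption chosen for illustration and may not match any of the proxies in the paper. The function names, the toy linear expert, and the three candidate configurations are all hypothetical.

```python
import numpy as np


def action_mse(policy, states, expert_actions):
    """Proxy score (assumed for illustration): mean squared error between
    the policy's actions and the expert's actions. Lower is better."""
    predicted = np.stack([policy(s) for s in states])
    return float(np.mean((predicted - expert_actions) ** 2))


def select_hps(candidates, states, expert_actions):
    """Score every candidate with the proxy and return the best name
    together with all scores. No environment reward is used."""
    scores = {
        name: action_mse(policy, states, expert_actions)
        for name, policy in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Toy held-out demonstrations: the hypothetical expert acts as a = -0.5 * s.
rng = np.random.default_rng(0)
held_out_states = rng.normal(size=(100, 3))
held_out_actions = -0.5 * held_out_states

# Stand-ins for policies trained under three hypothetical HP configurations.
candidates = {
    "config_a": lambda s: -0.1 * s,
    "config_b": lambda s: -0.45 * s,
    "config_c": lambda s: 0.3 * s,
}

best_name, proxy_scores = select_hps(candidates, held_out_states, held_out_actions)
print(best_name, proxy_scores)
```

In this toy setup the candidate whose actions are closest to the expert's gets the lowest proxy score and is selected. In practice each candidate would be an agent trained with a different HP configuration, and the quality of the selection depends on how well the chosen proxy tracks the unobserved reward.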


