Tight Performance Guarantees of Imitator Policies with Continuous Actions

12/07/2022
by   Davide Maran, et al.
0

Behavioral Cloning (BC) aims at learning a policy that mimics the behavior demonstrated by an expert. The current theoretical understanding of BC is limited to the case of finite actions. In this paper, we study BC with the goal of providing theoretical guarantees on the performance of the imitator policy in the case of continuous actions. We start by deriving a novel bound on the performance gap based on Wasserstein distance, applicable for continuous-action experts, holding under the assumption that the value function is Lipschitz continuous. Since this latter condition is hardy fulfilled in practice, even for Lipschitz Markov Decision Processes and policies, we propose a relaxed setting, proving that value function is always Holder continuous. This result is of independent interest and allows obtaining in BC a general bound for the performance of the imitator policy. Finally, we analyze noise injection, a common practice in which the expert action is executed in the environment after the application of a noise kernel. We show that this practice allows deriving stronger performance guarantees, at the price of a bias due to the noise addition.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/06/2018

Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes

We present the first PAC optimal algorithm for Bayes-Adaptive Markov Dec...
research
04/19/2018

Lipschitz Continuity in Model-based Reinforcement Learning

Model-based reinforcement-learning methods learn transition and reward m...
research
06/03/2020

Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes

We propose a principled kernel-based policy iteration algorithm to solve...
research
02/06/2022

Trusted Approximate Policy Iteration with Bisimulation Metrics

Bisimulation metrics define a distance measure between states of a Marko...
research
02/05/2019

Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

We study contextual bandit learning with an abstract policy class and co...
research
06/06/2023

Finding Counterfactually Optimal Action Sequences in Continuous State Spaces

Humans performing tasks that involve taking a series of multiple depende...

Please sign up or login with your details

Forgot password? Click here to reset