Apprenticeship Learning via Frank-Wolfe

11/05/2019
by Tom Zahavy, et al.

We consider the application of the Frank-Wolfe (FW) algorithm to Apprenticeship Learning (AL). In this setting, there is a Markov Decision Process (MDP), but the reward function is not given explicitly. Instead, there is an expert that acts according to some policy, and the goal is to find a policy whose feature expectations are closest to those of the expert policy. We formulate this problem as finding the projection of the expert's feature expectations onto the feature expectations polytope – the convex hull of the feature expectations of all the deterministic policies in the MDP. We show that this formulation is equivalent to the AL objective and that solving this problem with the FW algorithm is equivalent to the best-known AL algorithm, the projection method of Abbeel and Ng (2004). This insight allows us to analyze AL with tools from the convex optimization literature and to derive tighter bounds on AL. Specifically, we show that a variation of the FW method based on taking "away steps" achieves a linear rate of convergence when applied to AL. We also show experimentally that this version outperforms the FW baseline. To the best of our knowledge, this is the first work that shows linear convergence rates for AL.
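To make the geometric picture concrete, here is a minimal, hypothetical sketch (not the authors' implementation) of Frank-Wolfe with away steps for projecting a point onto the convex hull of finitely many vectors. It assumes the polytope's vertices are given explicitly as rows of a NumPy array; in the AL setting of the paper, the vertices are the feature expectations of deterministic policies and the linear minimization is instead performed by a best-response (planning) oracle.

```python
import numpy as np

def fw_away_projection(vertices, target, iters=200):
    """Approximate projection of `target` onto conv(rows of `vertices`)
    via Frank-Wolfe with away steps (sketch; vertices given explicitly).
    """
    n, d = vertices.shape
    alpha = np.zeros(n)
    alpha[0] = 1.0               # start at the first vertex
    x = vertices[0].astype(float).copy()
    for _ in range(iters):
        g = x - target                        # gradient of 0.5*||x - target||^2
        scores = vertices @ g
        s = int(np.argmin(scores))            # Frank-Wolfe ("toward") vertex
        active = np.flatnonzero(alpha > 0)
        a = active[int(np.argmax(scores[active]))]  # away vertex (worst active)
        d_fw = vertices[s] - x
        d_aw = x - vertices[a]
        if g @ d_fw <= g @ d_aw:              # pick the steeper descent direction
            direction, gamma_max, toward = d_fw, 1.0, True
        else:
            direction = d_aw
            gamma_max = alpha[a] / (1.0 - alpha[a] + 1e-16)  # keep weights >= 0
            toward = False
        denom = direction @ direction
        if denom < 1e-18:
            break
        # exact line search for the quadratic objective, clipped to feasibility
        gamma = np.clip(-(g @ direction) / denom, 0.0, gamma_max)
        if toward:
            alpha *= (1.0 - gamma)
            alpha[s] += gamma
        else:                                  # away step: x <- x + gamma*(x - v_a)
            alpha *= (1.0 + gamma)
            alpha[a] -= gamma
        x = alpha @ vertices
    return x
```

For example, projecting the point (2, 0.5) onto the unit square conv{(0,0), (1,0), (0,1), (1,1)} returns (1, 0.5). The away step is what enables the linear convergence rate the paper establishes: plain FW zigzags near the boundary, while dropping weight from a bad active vertex avoids this.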


