Efficient Performance Bounds for Primal-Dual Reinforcement Learning from Demonstrations

12/28/2021
by Angeliki Kamoutsi, et al.

We consider large-scale Markov decision processes with an unknown cost function and address the problem of learning a policy from a finite set of expert demonstrations. We assume that the learner is not allowed to interact with the expert and has no access to a reinforcement signal of any kind. Existing inverse reinforcement learning methods come with strong theoretical guarantees but are computationally expensive, while state-of-the-art policy optimization algorithms achieve significant empirical success but are hampered by limited theoretical understanding. To bridge the gap between theory and practice, we introduce a novel bilinear saddle-point framework using Lagrangian duality. The proposed primal-dual viewpoint allows us to develop a model-free, provably efficient algorithm through the lens of stochastic convex optimization. The method enjoys the advantages of simple implementation, low memory requirements, and computational and sample complexities independent of the number of states. We further present an equivalent no-regret online-learning interpretation.
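The abstract frames learning as a bilinear saddle-point problem solved with stochastic convex optimization, with an equivalent no-regret online-learning view. The sketch below is a generic illustration of that technique, not the authors' algorithm. It assumes a toy bilinear game, min over x and max over y of y^T A x with both players on probability simplices (the paper's actual variables, constraints, and step sizes are not stated here). Both players run simultaneous stochastic multiplicative-weights (entropic mirror descent) updates, and the averaged iterates are returned. The names `mwu_saddle_point` and `duality_gap` are hypothetical.

```python
import math
import random


def mwu_saddle_point(A, steps=5000, eta=0.05, seed=0):
    """Approximately solve min_{x in simplex} max_{y in simplex} y^T A x.

    Hypothetical illustration: both players use multiplicative-weights
    (entropic mirror descent) updates driven by unbiased sampled gradients.
    Returns the time-averaged primal and dual iterates.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [1.0 / n] * n  # primal (minimizing) player, starts uniform
    y = [1.0 / m] * m  # dual (maximizing) player, starts uniform
    x_avg, y_avg = [0.0] * n, [0.0] * m

    for _ in range(steps):
        # Accumulate the running average of the iterates.
        for j in range(n):
            x_avg[j] += x[j] / steps
        for i in range(m):
            y_avg[i] += y[i] / steps

        # Sample a row i ~ y and a column j ~ x.
        i = rng.choices(range(m), weights=y)[0]
        j = rng.choices(range(n), weights=x)[0]

        # Unbiased gradient estimates:
        # grad_x (y^T A x) = A^T y, estimated by row i of A;
        # grad_y (y^T A x) = A x,   estimated by column j of A.
        gx = A[i]
        gy = [A[r][j] for r in range(m)]

        # Primal player descends, dual player ascends; renormalize onto the simplex.
        x = [xj * math.exp(-eta * g) for xj, g in zip(x, gx)]
        sx = sum(x)
        x = [v / sx for v in x]
        y = [yi * math.exp(eta * g) for yi, g in zip(y, gy)]
        sy = sum(y)
        y = [v / sy for v in y]

    return x_avg, y_avg


def duality_gap(A, x, y):
    """max_{y'} y'^T A x - min_{x'} y^T A x' (nonnegative; zero at a saddle point)."""
    Ax = [sum(A[r][c] * x[c] for c in range(len(x))) for r in range(len(A))]
    yA = [sum(y[r] * A[r][c] for r in range(len(y))) for c in range(len(x))]
    return max(Ax) - min(yA)


if __name__ == "__main__":
    pennies = [[1.0, -1.0], [-1.0, 1.0]]  # matching pennies, equilibrium (1/2, 1/2)
    x, y = mwu_saddle_point(pennies)
    print(x, y, duality_gap(pennies, x, y))
```

On matching pennies, the averaged strategies should move toward the uniform equilibrium, and the duality gap should shrink as `steps` grows. Each iteration samples a single row and a single column, so it touches O(m + n) matrix entries rather than computing a full matrix-vector product. Averaging the iterates is what converts the two players' no-regret guarantees into a bound on the duality gap.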
