Trusted Approximate Policy Iteration with Bisimulation Metrics

02/06/2022
by Mete Kemertas, et al.

Bisimulation metrics define a distance measure between states of a Markov decision process (MDP) based on a comparison of reward sequences. Because of this property, they provide theoretical guarantees in value function approximation. In this work, we first prove that bisimulation metrics can be defined via any p-Wasserstein metric for p ≥ 1. Then we describe an approximate policy iteration (API) procedure that uses ϵ-aggregation with π-bisimulation and prove performance bounds for continuous state spaces. We bound the difference between π-bisimulation metrics in terms of the change in the policies themselves. Based on these theoretical results, we design an API(α) procedure that employs conservative policy updates and enjoys better performance bounds than the naive API approach. In addition, we propose a novel trust region approach that circumvents the requirement to explicitly solve a constrained optimization problem. Finally, we provide experimental evidence of improved stability compared to non-conservative alternatives in simulated continuous control.
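
As a rough illustration of the kind of distance the abstract refers to, the following sketch computes a π-bisimulation metric on a tiny finite MDP by fixed-point iteration. It is not the authors' implementation. It assumes a fixed policy whose induced transitions are deterministic, so the Wasserstein distance between next-state distributions reduces to the distance between the two successor states (this holds for any p ≥ 1, since both distributions are point masses). The names `pi_bisim_metric`, `rewards`, and `next_state` are hypothetical and chosen only for this example.

```python
def pi_bisim_metric(rewards, next_state, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Fixed-point iteration for d(s, t) = |r(s) - r(t)| + gamma * d(s', t').

    Assumption: under the fixed policy, state s moves deterministically to
    next_state[s], so the Wasserstein term is just d(next_state[s], next_state[t]).
    rewards[s] is the reward collected in state s under the policy.
    """
    n = len(rewards)
    d = [[0.0] * n for _ in range(n)]
    for _ in range(max_iters):
        new_d = [
            [
                abs(rewards[s] - rewards[t]) + gamma * d[next_state[s]][next_state[t]]
                for t in range(n)
            ]
            for s in range(n)
        ]
        # Largest change across all state pairs; the update is a gamma-contraction.
        delta = max(abs(new_d[s][t] - d[s][t]) for s in range(n) for t in range(n))
        d = new_d
        if delta < tol:
            break
    return d


# Three absorbing states: states 0 and 1 both earn reward 1, state 2 earns 0.
rewards = [1.0, 1.0, 0.0]
next_state = [0, 1, 2]
d = pi_bisim_metric(rewards, next_state, gamma=0.9)
```

In this toy example, states 0 and 1 produce identical reward sequences, so their distance is zero and they could be aggregated without loss. States 0 and 2 differ by one unit of reward at every step, so their distance approaches 1 / (1 − γ) = 10, which matches the gap between their values under the policy.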


research
04/12/2010

Dynamic Policy Programming

In this paper, we propose a novel policy iteration method, called dynami...
research
05/12/2014

Approximate Policy Iteration Schemes: A Comparison

We consider the infinite-horizon discounted optimal control problem form...
research
07/31/2020

Queueing Network Controls via Deep Reinforcement Learning

Novel advanced policy gradient (APG) methods with conservative policy it...
research
04/20/2013

Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies

We consider approximate dynamic programming for the infinite-horizon sta...
research
12/07/2022

Tight Performance Guarantees of Imitator Policies with Continuous Actions

Behavioral Cloning (BC) aims at learning a policy that mimics the behavi...
research
10/27/2021

Towards Robust Bisimulation Metric Learning

Learned representations in deep reinforcement learning (DRL) have to ext...
research
05/28/2018

Dual Policy Iteration

Recently, a novel class of Approximate Policy Iteration (API) algorithms...