Model-Free Robust Reinforcement Learning with Linear Function Approximation

06/20/2020
by   Kishan Panaganti, et al.

This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Processes (RMDPs) with large state spaces. The goal of the RMDP framework is to find a policy that is robust against parameter uncertainties arising from the mismatch between the simulator model and real-world settings. We first propose the Robust Least Squares Policy Evaluation algorithm, a multi-step, online, model-free learning algorithm for policy evaluation. We prove the convergence of this algorithm using stochastic approximation techniques. We then propose the Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy. We also give a general weighted Euclidean norm bound on the error (closeness to optimality) of the resulting policy. Finally, we demonstrate the performance of our RLSPI algorithm on some benchmark problems from OpenAI Gym.
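To make the robustness objective concrete, the sketch below evaluates a fixed policy against the worst case over a small, finite set of transition models. It is only an illustration of the robust evaluation idea, not the paper's algorithm: it is model-based and tabular, whereas the algorithms above are model-free and use linear function approximation. The function name `robust_policy_evaluation`, the two-state chain, and the finite uncertainty set are all hypothetical choices made for this example.

```python
def robust_policy_evaluation(rewards, models, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Worst-case value of a fixed policy over a finite set of transition models.

    rewards: per-state rewards under the fixed policy.
    models:  candidate transition matrices (lists of rows); a hypothetical
             stand-in for the uncertainty set, assumed for illustration.
    """
    n = len(rewards)
    values = [0.0] * n
    for _ in range(max_iters):
        new_values = []
        for s in range(n):
            # The adversary picks, for each state, the candidate row with the
            # lowest expected next-state value.
            worst = min(
                sum(p * v for p, v in zip(model[s], values)) for model in models
            )
            new_values.append(rewards[s] + gamma * worst)
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values


# Hypothetical two-state chain: the simulator's nominal model and one perturbation.
nominal = [[0.9, 0.1], [0.1, 0.9]]
perturbed = [[0.6, 0.4], [0.05, 0.95]]
rewards = [1.0, 0.0]

nominal_values = robust_policy_evaluation(rewards, [nominal])
robust_values = robust_policy_evaluation(rewards, [nominal, perturbed])
```

Because the adversary can always fall back on the nominal model, the robust values can never exceed the nominal ones. The gap between the two indicates how much the policy's performance depends on the simulator being accurate.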

