Model-free policy evaluation in Reinforcement Learning via upper solutions

05/05/2021
by D. Belomestny et al.

In this work we present an approach for building tight model-free confidence intervals for the optimal value function V^⋆ in general infinite-horizon MDPs via upper solutions. We suggest a novel upper value iterative procedure (UVIP) that constructs an upper solution for a given agent's policy, yielding a model-free method of policy evaluation. We analyze the convergence properties of the approximate UVIP under rather general assumptions and illustrate its performance on a number of benchmark RL problems.
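
To make the idea concrete, below is a minimal Python sketch of the upper-solution construction on a small tabular MDP. The toy MDP, the uniform policy, the sample sizes, and the specific martingale-corrected recursion are illustrative assumptions, not the paper's exact UVIP operator or its estimators; the sketch only shows how iterating a one-step lookahead with a martingale correction built from V^π produces a function lying above V^⋆, so that together with V^π it brackets the optimal value.

```python
# Illustrative sketch only (assumed toy setup, not the paper's exact algorithm):
# build an upper solution V_up with V^pi <= V^* <= V_up on a small random MDP,
# using only sampled transitions plus an estimate of V^pi.

import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9

# Toy MDP: random rewards r(x, a) and transition kernel P(y | x, a).
R = rng.uniform(size=(nS, nA))
P = rng.dirichlet(np.ones(nS), size=(nS, nA))

# Agent's policy to evaluate (uniform here, purely for illustration).
pi = np.full((nS, nA), 1.0 / nA)

# Reference V^pi from the linear Bellman equation (lower end of the interval).
P_pi = np.einsum("xa,xay->xy", pi, P)
r_pi = np.einsum("xa,xa->x", pi, R)
V_pi = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)

def sample_next(x, a, n):
    # Stands in for model-free access: draw n next states from P(. | x, a).
    return rng.choice(nS, size=n, p=P[x, a])

# Upper-solution iteration: V_up(x) <- E[ max_a { r(x,a) + gamma * V_up(Y)
#   - gamma * (V_pi(Y) - E[V_pi(Y') | x,a]) } ].  The bracketed term is an
# (approximately) zero-mean martingale correction, so the fixed point lies
# above V^* for any correction function and is tight when pi is optimal.
V_up = V_pi.copy()
n_samples = 500
for _ in range(100):
    V_new = np.empty(nS)
    for x in range(nS):
        vals = np.empty((n_samples, nA))
        for a in range(nA):
            ys = sample_next(x, a, n_samples)                  # next states for V_up
            ev_pi = V_pi[sample_next(x, a, n_samples)].mean()  # MC estimate of E[V^pi(Y) | x, a]
            vals[:, a] = R[x, a] + gamma * (V_up[ys] - (V_pi[ys] - ev_pi))
        V_new[x] = vals.max(axis=1).mean()
    V_up = V_new

print("V_pi:", np.round(V_pi, 3))
print("V_up:", np.round(V_up, 3))
print("gap :", np.round(V_up - V_pi, 3))  # width of the interval around V^*
```

In a genuinely model-free setting the kernel P would not be available in closed form; the sketch uses it only through the hypothetical helper sample_next, which stands in for rollouts from a simulator or the environment.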
