Model-free policy evaluation in Reinforcement Learning via upper solutions

05/05/2021
by D. Belomestny, et al.

In this work we present an approach for building tight model-free confidence intervals for the optimal value function V^⋆ in general infinite-horizon MDPs via upper solutions. We propose a novel upper value iterative procedure (UVIP) that constructs upper solutions for a given agent's policy, yielding a model-free method of policy evaluation. We analyze the convergence properties of approximate UVIP under rather general assumptions and illustrate its performance on a number of benchmark RL problems.
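The abstract does not spell out the recursion, so the following is only a minimal sketch of the general idea behind an upper value iteration: a Bellman-type backup of the form E[max_a ...] with a zero-mean martingale correction built from the policy value V^π, whose fixed point dominates V^⋆. Everything here is an assumption for illustration: the tabular setting, the simulator `step(x, a)`, the names `uvip_sketch`, `V_pi`, and the specific empirical estimators; the exact UVIP recursion, its estimators, and the convergence conditions are given in the paper.

```python
import numpy as np

def uvip_sketch(step, n_states, n_actions, V_pi, gamma=0.9,
                n_samples=64, n_iters=100):
    """Illustrative upper value iteration with a martingale correction.

    Empirical fixed-point iteration (one possible form, assumed here):
      V_up(x) <- (1/N) sum_n max_a [ r_n(x,a)
                    + gamma * ( V_up(y_n(x,a))
                                - (V_pi(y_n(x,a)) - mean_m V_pi(y_m(x,a))) ) ]
    The subtracted term is an empirical zero-mean (martingale) increment
    built from the policy value V_pi; since E[max(...)] >= max(E[...]) and
    the increment has (approximately) zero conditional mean, the fixed
    point is an upper estimate of the optimal value V*.

    `step(x, a)` is an assumed model-free simulator returning a sampled
    (next_state, reward); `V_pi` is a state-indexed array with the value
    of the agent's policy, estimated beforehand (e.g. by Monte Carlo).
    """
    V_pi = np.asarray(V_pi, dtype=float)
    V_up = V_pi.copy()  # warm start from the policy value (a lower estimate)
    for _ in range(n_iters):
        V_new = np.empty(n_states)
        for x in range(n_states):
            # joint samples: for each sample index, one draw per action
            ys = np.empty((n_samples, n_actions), dtype=int)
            rs = np.empty((n_samples, n_actions))
            for a in range(n_actions):
                for n in range(n_samples):
                    ys[n, a], rs[n, a] = step(x, a)
            # empirical martingale increment: V_pi at the sampled next
            # state minus its per-action sample mean
            m = V_pi[ys] - V_pi[ys].mean(axis=0, keepdims=True)
            inner = rs + gamma * (V_up[ys] - m)   # shape (N, A)
            V_new[x] = inner.max(axis=1).mean()   # empirical E[max_a ...]
        V_up = V_new
    return V_up
```

Under these assumptions, `[V_pi[x], V_up[x]]` plays the role of the model-free bracket around V^⋆(x) described in the abstract, and the width of the gap indicates how far the agent's policy is from optimal.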
