Variance-Aware Off-Policy Evaluation with Linear Function Approximation

06/22/2021
by Yifei Min, et al.

We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, where the goal is to estimate the value function of a target policy from offline data collected by a behavior policy. We propose incorporating variance information about the value function to improve the sample efficiency of OPE. Specifically, for time-inhomogeneous episodic linear Markov decision processes (MDPs), we propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration. We show that our algorithm achieves a tighter error bound than the best-known result. We also provide a fine-grained characterization of the distribution shift between the behavior policy and the target policy. Extensive numerical experiments corroborate our theory.
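
To make the reweighting idea concrete, here is a minimal sketch of a variance-weighted ridge regression, the kind of step that could replace the ordinary least-squares fit at one stage of Fitted Q-Iteration. It is an illustration under stated assumptions, not the VA-OPE algorithm from the paper: the function names, the `ridge` parameter, and the `min_variance` floor are all hypothetical, and the variance estimates are simply taken as given. Each sample's squared Bellman residual is divided by its (floored) variance estimate, so noisier targets contribute less to the fit.

```python
from typing import List, Sequence

Vector = List[float]


def solve_linear_system(a: List[Vector], b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger ridge parameter")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def variance_weighted_ridge(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    variances: Sequence[float],
    ridge: float = 1.0,
    min_variance: float = 1.0,
) -> Vector:
    """Minimise sum_k (phi_k . w - y_k)^2 / sigma_k^2 + ridge * ||w||^2.

    `min_variance` is an assumed floor that keeps the weights bounded when
    a variance estimate is close to zero.
    """
    d = len(features[0])
    gram = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    moment = [0.0] * d
    for phi, y, var in zip(features, targets, variances):
        weight = 1.0 / max(var, min_variance)
        for i in range(d):
            moment[i] += weight * phi[i] * y
            for j in range(d):
                gram[i][j] += weight * phi[i] * phi[j]
    return solve_linear_system(gram, moment)


# Two conflicting targets for the same feature: the high-variance one
# is down-weighted, so the fit stays close to the low-variance target.
w = variance_weighted_ridge([[1.0], [1.0]], [0.0, 10.0], [1.0, 100.0], ridge=0.0)
```

In a backward pass over the horizon, the targets at stage h would typically be the observed reward plus the estimated next-stage value at the observed next state; how the variance estimates themselves are obtained is left out of this sketch.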

