Projected State-action Balancing Weights for Offline Reinforcement Learning

09/10/2021
by Jiayi Wang, et al.

Offline policy evaluation (OPE) is a fundamental and challenging problem in reinforcement learning (RL). This paper focuses on estimating the value of a target policy from pre-collected data generated by a possibly different policy, under the framework of infinite-horizon Markov decision processes. Motivated by the recently developed marginal importance sampling method in RL and the covariate balancing idea in causal inference, we propose a novel estimator with approximately projected state-action balancing weights for policy value estimation. We obtain the convergence rate of these weights and show that the proposed value estimator is semi-parametrically efficient under technical conditions. In terms of asymptotics, our results scale with both the number of trajectories and the number of decision points per trajectory. Consequently, consistency can still be achieved with a limited number of subjects when the number of decision points diverges. In addition, we make a first attempt at characterizing the difficulty of OPE problems, which may be of independent interest. Numerical experiments demonstrate the promising performance of our proposed estimator.
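The marginal importance sampling idea underlying such estimators can be illustrated with a minimal sketch. This is a hypothetical, simplified illustration and not the paper's projected-balancing procedure: given state-action weights w(s, a) (here supplied directly rather than estimated from balancing conditions), the policy value is a weighted average of observed rewards.

```python
import numpy as np

def mis_value_estimate(rewards, weights):
    """Weighted-average value estimate: sum(w_i * r_i) / sum(w_i).

    rewards: observed rewards r_i from the behavior data.
    weights: state-action weights w(s_i, a_i), e.g. ratios of the
             target policy's stationary state-action distribution to
             the behavior distribution (assumed given in this sketch).
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Self-normalizing by sum(w_i) keeps the estimate a proper
    # weighted average and is robust to mis-scaled weights.
    return float(np.sum(weights * rewards) / np.sum(weights))

# Toy data: three transitions with their (hypothetical) weights.
print(mis_value_estimate([1.0, 0.0, 2.0], [1.0, 1.0, 2.0]))  # 1.25
```

The paper's contribution lies in how the weights themselves are estimated, via approximately projected balancing conditions, rather than in the weighted-average form above.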


Related research

05/26/2022 · Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes
We study offline reinforcement learning (RL) in partially observable Mar...

06/08/2019 · Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
Motivated by the many real-world applications of reinforcement learning ...

09/18/2022 · Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes
We study the offline reinforcement learning (RL) in the face of unmeasur...

12/13/2019 · More Efficient Off-Policy Evaluation through Regularized Targeted Learning
We study the problem of off-policy evaluation (OPE) in Reinforcement Lea...

12/29/2022 · An Instrumental Variable Approach to Confounded Off-Policy Evaluation
Off-policy evaluation (OPE) is a method for estimating the return of a t...

02/26/2022 · Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons
We consider reinforcement learning (RL) methods in offline domains witho...

07/27/2020 · Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders
Off-policy evaluation (OPE) in reinforcement learning is an important pr...
