Minimax Weight Learning for Absorbing MDPs

01/09/2023
by Fengyin Li, et al.

Reinforcement learning policy evaluation problems are often modeled as finite-horizon or discounted/average-reward infinite-horizon MDPs. In this paper, we study undiscounted off-policy policy evaluation for absorbing MDPs. Given a dataset of i.i.d. episodes truncated at a given level, we propose an algorithm, MWLA (Minimax Weight Learning for Absorbing MDPs), that directly estimates the expected return via the importance ratio of the state-action occupancy measure. We establish a mean squared error (MSE) bound for the MWLA method and analyze how the statistical error depends on the data size and the truncation level. Computational experiments in an episodic taxi environment illustrate the performance of the MWLA algorithm.
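The abstract does not spell out the estimator, so the following is only a minimal sketch of the general minimax-weight-learning idea it builds on, not the paper's actual MWLA algorithm: learn a weight w(s, a) that approximates the occupancy ratio d_pi(s, a) / d_D(s, a) from truncated behavior-policy episodes, then estimate the undiscounted return as the per-episode average of w(s, a) * r. In a tabular setting with a full tabular discriminator class, the minimax problem reduces to a linear system of empirical Bellman-flow moment conditions. The toy absorbing-chain environment, the policies, and all names below are hypothetical stand-ins (the paper uses a taxi environment).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy absorbing chain (hypothetical stand-in for the paper's taxi environment):
# states 0..4 are transient, state 5 absorbs; action 1 moves right w.p. 0.9, action 0 w.p. 0.1.
nS, nA, ABSORB, T = 6, 2, 5, 100  # T is the episode truncation level

def step(s, a):
    right = rng.random() < (0.9 if a == 1 else 0.1)
    s2 = min(s + 1, ABSORB) if right else max(s - 1, 0)
    return s2, (1.0 if s2 == ABSORB else -0.1)  # -0.1 step cost, +1 on absorption

def right_biased_policy(p_right):
    return np.full((nS, nA), [1.0 - p_right, p_right])  # pi[s] = (P(left), P(right))

pi_b, pi_t = right_biased_policy(0.5), right_biased_policy(0.9)  # behavior / target

# Collect i.i.d. episodes under the behavior policy, truncated at level T.
n_ep, transitions = 5000, []
for _ in range(n_ep):
    s = 0  # every episode starts in state 0
    for _ in range(T):
        a = int(rng.choice(nA, p=pi_b[s]))
        s2, r = step(s, a)
        transitions.append((s, a, r, s2))
        if s2 == ABSORB:
            break
        s = s2

# With a full tabular discriminator class, minimax weight learning reduces to the
# empirical Bellman-flow moment equations A w = b, where w(s, a) estimates the
# occupancy ratio d_pi(s, a) / d_D(s, a).
idx = lambda s, a: s * nA + a
A = np.zeros((ABSORB * nA, nS * nA))
b = np.zeros(ABSORB * nA)
b[idx(0, 0): idx(0, 0) + nA] = pi_t[0]            # source term: mu_0 = delta_0
for s, a, r, s2 in transitions:
    A[idx(s, a), idx(s, a)] += 1.0 / n_ep         # outflow from (s, a)
    if s2 != ABSORB:
        for a2 in range(nA):                      # inflow into (s2, a2) under pi_t
            A[idx(s2, a2), idx(s, a)] -= pi_t[s2, a2] / n_ep

w = np.linalg.lstsq(A, b, rcond=None)[0].clip(min=0.0)  # heuristic projection onto w >= 0

# Plug-in estimate of the undiscounted expected return: per-episode weighted reward.
J_hat = sum(w[idx(s, a)] * r for s, a, r, _ in transitions) / n_ep
J_behavior = sum(r for _, _, r, _ in transitions) / n_ep  # naive on-data average
print(f"behavior-policy return  ~ {J_behavior:.3f}")
print(f"MWL estimate for target ~ {J_hat:.3f}")
```

Note that truncating episodes at level T biases both the empirical occupancy measure and the flow equations; quantifying how this bias and the statistical error scale with the data size and the truncation level is precisely what the paper's MSE bound addresses.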


