Offline Policy Optimization with Eligible Actions

07/01/2022
by   Yao Liu, et al.
0

Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation, and such estimators typically do not require assumptions on the properties and representational capabilities of value function or decision process model function classes. In this paper, we identify an important overfitting phenomenon in optimizing the importance weighted return, in which it may be possible for the learned policy to essentially avoid making aligned decisions for part of the initial state space. We propose an algorithm to avoid this overfitting through a new per-state-neighborhood normalization constraint, and provide a theoretical justification of the proposed algorithm. We also show the limitations of previous attempts to this approach. We test our algorithm in a healthcare-inspired simulator, a logged dataset collected from real hospitals and continuous control tasks. These experiments show the proposed method yields less overfitting and better test performance compared to state-of-the-art batch reinforcement learning algorithms.

READ FULL TEXT

page 5

page 21

research
12/26/2020

POPO: Pessimistic Offline Policy Optimization

Offline reinforcement learning (RL), also known as batch RL, aims to opt...
research
06/27/2020

Overfitting and Optimization in Offline Policy Learning

We consider the task of policy learning from an offline dataset generate...
research
10/12/2021

Offline Reinforcement Learning with Implicit Q-Learning

Offline reinforcement learning requires reconciling two conflicting aims...
research
05/24/2023

Collaborative World Models: An Online-Offline Transfer RL Approach

Training visual reinforcement learning (RL) models in offline datasets i...
research
09/17/2018

Policy Optimization via Importance Sampling

Policy optimization is an effective reinforcement learning approach to s...
research
09/04/2023

Marginalized Importance Sampling for Off-Environment Policy Evaluation

Reinforcement Learning (RL) methods are typically sample-inefficient, ma...
research
11/16/2020

Blind Decision Making: Reinforcement Learning with Delayed Observations

Reinforcement learning typically assumes that the state update from the ...

Please sign up or login with your details

Forgot password? Click here to reset