Offline Policy Optimization with Eligible Actions

07/01/2022
by Yao Liu, et al.

Offline policy optimization could have a large impact on many real-world decision-making problems, since online learning may be infeasible in many applications. Importance sampling and its variants are commonly used estimators in offline policy evaluation, and they typically require no assumptions about the properties or representational capabilities of the function classes used for the value function or the decision process model. In this paper, we identify an important overfitting phenomenon in optimizing the importance-weighted return, in which it may be possible for the learned policy to essentially avoid making aligned decisions for part of the initial state space. We propose an algorithm that avoids this overfitting through a new per-state-neighborhood normalization constraint, and we provide a theoretical justification for the proposed algorithm. We also show the limitations of previous attempts at this approach. We test our algorithm in a healthcare-inspired simulator, on a logged dataset collected from real hospitals, and on continuous control tasks. These experiments show that the proposed method yields less overfitting and better test performance than state-of-the-art batch reinforcement learning algorithms.
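
To make the overfitting phenomenon concrete, here is a minimal sketch of an ordinary importance-sampling estimate in a one-step setting. It is an illustration only, not the algorithm proposed in the paper. The function name `is_estimate`, the toy data, the all-negative rewards, and the reading of "aligned decisions" as "placing probability on the logged action" are all assumptions made for this example.

```python
import numpy as np

def is_estimate(pi_probs, mu_probs, rewards):
    """Ordinary importance-sampling estimate of a one-step return.

    pi_probs[i]: probability the evaluated policy assigns to logged action i
    mu_probs[i]: probability the behavior policy assigned to that action
    rewards[i]:  reward observed for sample i
    """
    pi_probs = np.asarray(pi_probs, dtype=float)
    mu_probs = np.asarray(mu_probs, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    weights = pi_probs / mu_probs          # importance weights
    return float(np.mean(weights * rewards))

# Hypothetical logged data: four samples, behavior policy uniform over two actions.
mu = [0.5, 0.5, 0.5, 0.5]
r = [-1.0, -1.0, -2.0, -2.0]               # all rewards negative (assumed)

# Policy A puts probability 1 on every logged action.
aligned = is_estimate([1.0, 1.0, 1.0, 1.0], mu, r)

# Policy B puts probability 0 on the logged actions of the last two samples,
# so those samples receive weight 0 and drop out of the estimate.
avoiding = is_estimate([1.0, 1.0, 0.0, 0.0], mu, r)

print(aligned, avoiding)
```

In this toy case the estimate for policy B is higher than for policy A, even though the data says nothing about what policy B actually does in the samples it avoids. An optimizer that maximizes this estimate can therefore be rewarded for moving away from the logged actions in part of the data, which is the kind of behavior a normalization constraint would need to rule out.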

12/26/2020

POPO: Pessimistic Offline Policy Optimization

Offline reinforcement learning (RL), also known as batch RL, aims to opt...
06/27/2020

Overfitting and Optimization in Offline Policy Learning

We consider the task of policy learning from an offline dataset generate...
02/23/2022

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

Offline Reinforcement Learning (RL) aims to learn policies from previous...
10/12/2021

Offline Reinforcement Learning with Implicit Q-Learning

Offline reinforcement learning requires reconciling two conflicting aims...
09/17/2018

Policy Optimization via Importance Sampling

Policy optimization is an effective reinforcement learning approach to s...
02/15/2023

Deep Offline Reinforcement Learning for Real-World Treatment Optimization Applications

There is increasing interest in data-driven approaches for dynamically c...
11/16/2020

Blind Decision Making: Reinforcement Learning with Delayed Observations

Reinforcement learning typically assumes that the state update from the ...