Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons

02/26/2022
by   Chengchun Shi, et al.

We consider reinforcement learning (RL) methods in offline domains where no additional online data can be collected, such as mobile health applications. Most existing policy optimization algorithms in the computer science literature are developed in online settings where data are easy to collect or simulate. Their generalizations to mobile health applications with a pre-collected offline dataset remain unknown. The aim of this paper is to develop a novel advantage learning framework that makes efficient use of pre-collected data for policy optimization. The proposed method takes an optimal Q-estimator computed by any existing state-of-the-art RL algorithm as input, and outputs a new policy whose value is guaranteed to converge at a faster rate than that of the policy derived from the initial Q-estimator. Extensive numerical experiments are conducted to back up our theoretical findings.
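
To make the terms "Q-estimator", "advantage", and "policy derived from the Q-estimator" concrete, the following is a minimal sketch under stated assumptions. It is not the estimator proposed in the paper. It assumes a small tabular setting, a hypothetical Q-estimate `q_hat` indexed as `q_hat[state][action]`, and one common convention in which the advantage of an action is its Q-value minus the best Q-value in that state. The function name `advantage_and_policy` is likewise an illustrative choice.

```python
from typing import List, Tuple


def advantage_and_policy(
    q: List[List[float]],
) -> Tuple[List[List[float]], List[int]]:
    """Compute advantages and the greedy policy from a tabular Q-estimate.

    q[s][a] is a hypothetical Q-estimate for state s and action a.
    Advantage convention (an assumption): A(s, a) = Q(s, a) - max_a' Q(s, a'),
    so the greedy action has advantage 0 and all others are negative or 0.
    """
    advantages: List[List[float]] = []
    policy: List[int] = []
    for row in q:
        best = max(row)
        # Advantage of each action relative to the best action in this state.
        advantages.append([value - best for value in row])
        # Greedy policy: the first action attaining the maximum Q-value.
        policy.append(row.index(best))
    return advantages, policy


# Illustrative numbers only; they do not come from the paper.
q_hat = [[1.0, 2.5], [0.5, 0.25]]
adv, pi = advantage_and_policy(q_hat)
```

Here `pi` is the policy derived directly from the initial Q-estimator; the abstract's claim is that the proposed framework outputs a different policy whose value converges faster than that of this baseline.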

research
03/03/2022

Reinforcement Learning in Possibly Nonstationary Environments

We consider reinforcement learning (RL) methods in offline nonstationary...
research
06/12/2021

A Minimalist Approach to Offline Reinforcement Learning

Offline reinforcement learning (RL) defines the task of learning from a ...
research
12/29/2022

Offline Policy Optimization in RL with Variance Regularizaton

Learning policies from fixed offline datasets is a key challenge to scal...
research
12/13/2022

A Review of Off-Policy Evaluation in Reinforcement Learning

Reinforcement learning (RL) is one of the most vibrant research frontier...
research
11/29/2021

Robust On-Policy Data Collection for Data-Efficient Policy Evaluation

This paper considers how to complement offline reinforcement learning (R...
research
02/23/2021

MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning

In many contemporary applications such as healthcare, finance, robotics,...
research
09/10/2021

Projected State-action Balancing Weights for Offline Reinforcement Learning

Offline policy evaluation (OPE) is considered a fundamental and challeng...
