Deep Jump Q-Evaluation for Offline Policy Evaluation in Continuous Action Space

10/29/2020
by Hengrui Cai, et al.

We consider off-policy evaluation (OPE) in continuous action domains, such as dynamic pricing and personalized dose finding. In OPE, one aims to learn the value under a new policy using historical data generated by a different behavior policy. Most existing works on OPE focus on discrete action domains. To handle continuous action spaces, we develop a new deep jump Q-evaluation method for OPE. The key ingredient of our method is adaptive discretization of the action space using deep jump Q-learning. This allows us to apply existing OPE methods for discrete domains to continuous actions. Our method is further justified by theoretical results and by experiments on synthetic and real datasets.
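
To make the "discretize, then reuse a discrete OPE method" idea concrete, here is a minimal sketch and not the paper's method. It assumes a single-stage setting with actions in [0, 1], a known uniform behavior policy, and a fixed equal-width grid in place of the adaptive discretization learned by deep jump Q-learning. The function name `discretized_ipw_value`, its parameters, and the synthetic reward model are all hypothetical and chosen only for illustration.

```python
import numpy as np


def discretized_ipw_value(actions, rewards, target_actions, behavior_bin_prob, n_bins):
    """Inverse-propensity-weighted value estimate on a fixed grid over [0, 1].

    Assumption: equal-width bins stand in for an adaptive partition.
    behavior_bin_prob is the probability that the behavior policy picks
    the logged action's bin (a scalar or an array of length n).
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Use interior edges only, so bin indices run from 0 to n_bins - 1
    # and an action equal to 1.0 falls in the last bin.
    logged_bins = np.digitize(actions, edges[1:-1])
    target_bins = np.digitize(target_actions, edges[1:-1])
    # A sample counts only if the logged action shares a bin with the
    # action the target policy would have taken.
    match = (logged_bins == target_bins).astype(float)
    weights = match / behavior_bin_prob
    return float(np.mean(weights * rewards))


# Synthetic illustration (all modelling choices here are assumptions).
rng = np.random.default_rng(0)
n, n_bins = 20_000, 10
contexts = rng.uniform(0.0, 1.0, size=n)
actions = rng.uniform(0.0, 1.0, size=n)  # behavior policy: uniform on [0, 1]
rewards = 1.0 - (actions - contexts) ** 2 + rng.normal(0.0, 0.1, size=n)
target_actions = contexts  # target policy: pi(x) = x, whose true value is 1.0
estimate = discretized_ipw_value(
    actions, rewards, target_actions, 1.0 / n_bins, n_bins
)
print(f"estimated value of the target policy: {estimate:.3f}")
```

In this sketch a finer grid lowers the bias from treating all actions in a bin as equivalent, but fewer logged samples match the target bin, so the variance grows. A fixed grid cannot balance the two across the action space, which is the kind of trade-off an adaptive discretization is meant to address.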

research
10/10/2018

Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space

Most existing deep reinforcement learning (DRL) frameworks consider eith...
research
05/24/2019

Semi-Parametric Efficient Policy Learning with Continuous Actions

We consider off-policy evaluation and optimization with continuous actio...
research
01/29/2019

Discretizing Continuous Action Space for On-Policy Optimization

In this work, we show that discretizing action space for continuous cont...
research
05/24/2018

A0C: Alpha Zero in Continuous Action Space

A core novelty of Alpha Zero is the interleaving of tree search and deep...
research
10/10/2019

Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies

We consider off-policy policy evaluation when the trajectory data are ge...
research
06/06/2022

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

We consider the off-policy evaluation problem of reinforcement learning ...
research
06/10/2020

Marginal Utility for Planning in Continuous or Large Discrete Action Spaces

Sample-based planning is a powerful family of algorithms for generating ...
