Pessimistic Model Selection for Offline Deep Reinforcement Learning

11/29/2021
by Chao-Han Huck Yang, et al.

Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications. Despite its promising performance, practical gaps exist when deploying DRL in real-world scenarios. One main barrier is overfitting, which leads to poor generalizability of the policy learned by DRL. In particular, for offline DRL with observational data, model selection is a challenging task because no ground truth is available for performance evaluation, in contrast with the online setting, where simulated environments are available. In this work, we propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee, which features a provably effective framework for finding the best policy among a set of candidate models. Two refined approaches are also proposed to address the potential bias of DRL models in identifying the optimal policy. Numerical studies demonstrate the superior performance of our approach over existing methods.
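The abstract does not spell out the selection rule. The snippet below is a minimal sketch of the general pessimism principle only and is not the PMS procedure from the paper. It assumes that each candidate model already has a list of per-trajectory value estimates from some off-policy evaluation method. Under that assumption, it picks the candidate whose lower confidence bound on the mean value is highest, rather than the one with the highest mean. The function names `lower_confidence_bound` and `pessimistic_select`, the normal-approximation bound, the default `z = 1.645`, and the toy numbers are all illustrative assumptions.

```python
import math
import statistics
from typing import Dict, List, Tuple


# Hypothetical helper (not from the paper): one-sided lower confidence bound on a
# candidate's mean value via a normal approximation, mean - z * standard error.
# z = 1.645 is the standard normal quantile for a one-sided 95% bound.
def lower_confidence_bound(value_estimates: List[float], z: float = 1.645) -> float:
    n = len(value_estimates)
    if n < 2:
        raise ValueError("need at least two value estimates to compute a standard error")
    mean = statistics.fmean(value_estimates)
    std_err = statistics.stdev(value_estimates) / math.sqrt(n)
    return mean - z * std_err


# Hypothetical selector (not from the paper): score every candidate by its lower
# bound and return the name with the largest bound, plus all scores for inspection.
def pessimistic_select(
    candidates: Dict[str, List[float]], z: float = 1.645
) -> Tuple[str, Dict[str, float]]:
    if not candidates:
        raise ValueError("no candidate models given")
    lcbs = {name: lower_confidence_bound(vals, z) for name, vals in candidates.items()}
    best = max(lcbs, key=lcbs.get)
    return best, lcbs


# Toy per-trajectory value estimates (made-up numbers for illustration only).
CANDIDATES = {
    "aggressive": [10.0, -4.0, 12.0, -2.0],  # higher mean, high variance
    "conservative": [3.0, 3.5, 2.5, 3.0],    # lower mean, low variance
}

if __name__ == "__main__":
    best, lcbs = pessimistic_select(CANDIDATES)
    for name, bound in lcbs.items():
        print(f"{name}: mean={statistics.fmean(CANDIDATES[name]):.2f}, LCB={bound:.3f}")
    print("selected:", best)
```

In the toy data, the "aggressive" candidate has the higher mean (4.0 vs. 3.0) but much noisier estimates, so its lower bound falls below that of the "conservative" candidate. A greedy rule would pick the former, while the pessimistic rule picks the latter. This trade-off is the kind of overfitting risk that pessimism guards against when no simulator is available to check the choice.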
