Exploiting Action Impact Regularity and Partially Known Models for Offline Reinforcement Learning

11/15/2021
by   Vincent Liu, et al.

Offline reinforcement learning, that is, learning a policy from a batch of data, is known to be hard: without strong assumptions, it is easy to construct counterexamples on which existing algorithms fail. In this work, we instead consider a property of certain real-world problems where offline reinforcement learning should be effective: those where actions have only limited impact on a part of the state. We introduce and formalize this Action Impact Regularity (AIR) property. We further propose an algorithm that assumes and exploits the AIR property, and we bound the suboptimality of the output policy when the MDP satisfies AIR. Finally, we demonstrate that our algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in two simulated environments where the regularity holds.
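To make the idea of "limited impact on a part of the state" concrete, the following is a minimal toy sketch, not the paper's formal definition or algorithm. It assumes one possible reading of the property: the state splits into an exogenous part whose transition ignores the action and an endogenous part whose dynamics are known. The environment (a price and a number of units held), the function names `step` and `exo_trajectory`, and all constants are hypothetical and chosen only for illustration.

```python
import random


def step(exo, endo, action, rng):
    """One transition of a hypothetical toy MDP with a factored state.

    exo    : exogenous part (a price); its update never reads `action`.
    endo   : endogenous part (units held); its update is a known function.
    action : +1 (buy), 0 (hold), or -1 (sell).
    """
    # Exogenous update: bounded random walk, independent of the action.
    next_exo = min(10, max(1, exo + rng.choice([-1, 0, 1])))
    # Endogenous update: known deterministic dynamics, clipped to [0, 5].
    next_endo = min(5, max(0, endo + action))
    # Reward: cash flow from the units actually traded at the current price.
    reward = -(next_endo - endo) * exo
    return next_exo, next_endo, reward


def exo_trajectory(actions, seed):
    """Roll out `actions` and return only the exogenous (price) path."""
    rng = random.Random(seed)
    exo, endo = 5, 0
    path = []
    for a in actions:
        exo, endo, _ = step(exo, endo, a, rng)
        path.append(exo)
    return path


# Under the same random seed, two different action sequences produce
# the same exogenous path, because the price update ignores the action.
same = exo_trajectory([1, 1, 1, -1], seed=0) == exo_trajectory([0, 0, 0, 0], seed=0)
print(same)
```

In this toy setting, the price path observed in logged data does not depend on which policy collected it, which is the kind of structure that could make a fixed batch informative about other policies. Whether the paper's algorithm uses the regularity in exactly this way is not stated in the abstract.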


research
10/03/2019

Benchmarking Batch Deep Reinforcement Learning Algorithms

Widely-used deep reinforcement learning algorithms have been shown to fa...
research
11/14/2020

PLAS: Latent Action Space for Offline Reinforcement Learning

The goal of offline reinforcement learning is to learn a policy from a f...
research
06/09/2022

Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

Offline reinforcement learning has shown great promise in leveraging lar...
research
10/14/2022

Adaptable Claim Rewriting with Offline Reinforcement Learning for Effective Misinformation Discovery

We propose a novel system to help fact-checkers formulate search queries...
research
06/30/2011

Reinforcement Learning for Agents with Many Sensors and Actuators Acting in Categorizable Environments

In this paper, we confront the problem of applying reinforcement learnin...
research
05/19/2022

Data Valuation for Offline Reinforcement Learning

The success of deep reinforcement learning (DRL) hinges on the availabil...
research
02/19/2022

A Regularized Implicit Policy for Offline Reinforcement Learning

Offline reinforcement learning enables learning from a fixed dataset, wi...
