BRPO: Batch Residual Policy Optimization

02/08/2020
by Sungryull Sohn, et al.

In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at each state. This can cause batch RL to be overly conservative, unable to exploit large policy changes at frequently-visited, high-confidence states without risking poor performance at sparsely-visited states. To remedy this, we propose residual policies, where the allowable deviation of the learned policy is state-action-dependent. We derive a new RL method, BRPO, which learns both the policy and the allowable deviation that jointly maximize a lower bound on policy performance. We show that BRPO achieves state-of-the-art performance on a number of tasks.
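
To make the idea of a state-action-dependent allowable deviation concrete, here is a minimal sketch for a single state with discrete actions. The abstract does not give the functional form of a residual policy, so everything below is an assumption made for illustration: the convex-mixture form, the renormalization step, and the names `residual_policy`, `beta`, `rho`, and `lam` are hypothetical and are not taken from the paper.

```python
def residual_policy(beta, rho, lam):
    """Blend a behavior policy with a candidate policy for one state.

    Assumed form (illustrative, not the paper's definition):
        pi(a) is proportional to (1 - lam[a]) * beta[a] + lam[a] * rho[a]

    beta: behavior-policy action probabilities at this state.
    rho:  candidate-policy action probabilities at this state.
    lam:  per-action mixing weights in [0, 1]; a larger weight allows a
          larger deviation from the behavior policy for that action.
    """
    if not (len(beta) == len(rho) == len(lam)):
        raise ValueError("beta, rho and lam must have the same length")
    if any(not 0.0 <= w <= 1.0 for w in lam):
        raise ValueError("mixing weights must lie in [0, 1]")

    # Per-action mixture of the two policies.
    mixed = [(1.0 - w) * b + w * r for b, r, w in zip(beta, rho, lam)]

    # Renormalize, since per-action weights need not preserve the total mass.
    total = sum(mixed)
    if total <= 0.0:
        raise ValueError("mixture has no probability mass")
    return [m / total for m in mixed]


beta = [0.7, 0.2, 0.1]  # behavior policy at some state
rho = [0.1, 0.1, 0.8]   # candidate policy at the same state

# Sparsely-visited state: no deviation allowed, so the behavior policy is kept.
sparse = residual_policy(beta, rho, [0.0, 0.0, 0.0])

# Frequently-visited state: large deviation allowed toward the candidate.
confident = residual_policy(beta, rho, [0.9, 0.9, 0.9])
```

In this sketch, a constraint that is the same at every state would correspond to using one fixed weight everywhere, whereas letting `lam` vary by state and action permits large changes only where the data supports them.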

research
07/16/2020

Provably Good Batch Reinforcement Learning Without Great Exploration

Batch reinforcement learning (RL) is important to apply RL algorithms to...
research
12/31/2011

T-Learning

Traditional Reinforcement Learning (RL) has focused on problems involvin...
research
06/11/2023

Policy Regularization with Dataset Constraint for Offline Reinforcement Learning

We consider the problem of learning the best possible policy from a fixe...
research
01/30/2023

STEEL: Singularity-aware Reinforcement Learning

Batch reinforcement learning (RL) aims at finding an optimal policy in a...
research
05/12/2014

Structural Return Maximization for Reinforcement Learning

Batch Reinforcement Learning (RL) algorithms attempt to choose a policy ...
research
11/02/2022

Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints

Offline reinforcement learning (RL) learns policies entirely from static...
research
10/06/2021

Mismatched No More: Joint Model-Policy Optimization for Model-Based RL

Many model-based reinforcement learning (RL) methods follow a similar te...
