Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

02/19/2020
by Noah Y. Siegel, et al.

Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms appealing for real-world problems such as robot control. In practice, however, standard off-policy algorithms fail in the batch setting for continuous control. In this paper, we propose a simple solution to this problem. It admits the use of data generated by arbitrary behavior policies and uses a learned prior – the advantage-weighted behavior model (ABM) – to bias the RL policy towards actions that have previously been executed and are likely to be successful on the new task. Our method can be seen as an extension of recent work on batch RL that enables stable learning from conflicting data sources. We find improvements over competitive baselines in a variety of RL tasks, including standard continuous control benchmarks and multi-task learning for simulated and real-world robots.
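The abstract describes the ABM prior only at a high level. The sketch below illustrates the idea in a toy tabular setting and is not the authors' implementation, which works with learned policies in continuous action spaces. It assumes discrete states and actions, a given state-value estimate `value`, advantages estimated as return-to-go minus V(s), and an indicator weighting f(A) = 1 if A ≥ 0, else 0. The helper names `fit_abm_prior`, `improve_policy`, and `indicator_weight`, and the temperature `eta`, are hypothetical. The improvement step is a generic KL-regularized update, π(a|s) ∝ prior(a|s) · exp(Q(s,a)/η), chosen here for illustration.

```python
import math
from collections import defaultdict


def indicator_weight(advantage):
    """Assumed weighting f(A): keep only transitions with non-negative advantage."""
    return 1.0 if advantage >= 0.0 else 0.0


def fit_abm_prior(transitions, value, n_actions, weight=indicator_weight):
    """Hypothetical tabular advantage-weighted behavior model.

    transitions: iterable of (state, action, return_to_go) taken from the batch.
    value: dict mapping state -> V(s), used as the advantage baseline.
    Returns a dict mapping state -> list of action probabilities.
    """
    weighted_counts = defaultdict(lambda: [0.0] * n_actions)
    for state, action, ret in transitions:
        # Advantage estimated as the observed return minus the state value.
        weighted_counts[state][action] += weight(ret - value[state])

    prior = {}
    for state, counts in weighted_counts.items():
        total = sum(counts)
        if total > 0.0:
            prior[state] = [c / total for c in counts]
        else:
            # No action in this state looked advantageous: fall back to uniform.
            prior[state] = [1.0 / n_actions] * n_actions
    return prior


def improve_policy(prior_probs, q_values, eta=1.0):
    """Illustrative KL-regularized step: pi(a|s) proportional to prior(a|s) * exp(Q(s,a) / eta)."""
    # Subtract the max Q over actions the prior supports, for numerical stability.
    q_max = max(q for p, q in zip(prior_probs, q_values) if p > 0.0)
    weights = [p * math.exp((q - q_max) / eta) for p, q in zip(prior_probs, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy batch for one state with three actions.
batch = [("s0", 0, 1.0), ("s0", 0, 0.8), ("s0", 1, 0.2), ("s0", 2, 0.9)]
prior = fit_abm_prior(batch, value={"s0": 0.5}, n_actions=3)
# prior["s0"] == [2/3, 0.0, 1/3]: action 1 had negative advantage and is filtered out.
pi = improve_policy(prior["s0"], q_values=[1.0, 5.0, 0.9])
# pi is approximately [0.689, 0.0, 0.311]
```

In the toy batch, action 1 receives an inflated Q-value of 5.0, the kind of extrapolation error that makes standard off-policy methods fail in the batch setting. Because the prior assigns it zero mass, the improved policy still ignores it and stays on actions that were executed and looked successful.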

research
02/18/2021

Continuous Doubly Constrained Batch Reinforcement Learning

Reliant on too many experiments to learn good actions, current Reinforce...
research
10/03/2019

Benchmarking Batch Deep Reinforcement Learning Algorithms

Widely-used deep reinforcement learning algorithms have been shown to fa...
research
10/01/2019

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

In this paper, we aim to develop a simple and scalable reinforcement lea...
research
03/20/2019

Batch Policy Learning under Constraints

When learning policies for real-world domains, two important questions a...
research
05/05/2023

Knowledge Transfer from Teachers to Learners in Growing-Batch Reinforcement Learning

Standard approaches to sequential decision-making exploit an agent's abi...
research
12/16/2020

Batch-Constrained Distributional Reinforcement Learning for Session-based Recommendation

Most of the existing deep reinforcement learning (RL) approaches for ses...
research
09/30/2022

B2RL: An open-source Dataset for Building Batch Reinforcement Learning

Batch reinforcement learning (BRL) is an emerging research area in the R...