Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

06/05/2020
by Tatsuya Matsushima, et al.

Most reinforcement learning (RL) algorithms assume online access to the environment, in which one may readily interleave updates to the policy with experience collection using that policy. However, in many real-world applications such as health, education, dialogue agents, and robotics, the cost or potential risk of deploying a new data-collection policy is high, to the point that it can become prohibitive to update the data-collection policy more than a few times during learning. With this view, we propose a novel concept of deployment efficiency, which measures the number of distinct data-collection policies used during policy learning. We observe that naïvely applying existing model-free offline RL algorithms recursively does not lead to a practical deployment-efficient and sample-efficient algorithm. We therefore propose a novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN), which can effectively optimize a policy offline using 10-20 times less data than prior works. Furthermore, the recursive application of BREMEN achieves impressive deployment efficiency while maintaining the same or better sample efficiency, learning successful policies from scratch on simulated robotic environments with only 5-10 deployments, compared to typical values of hundreds to millions in standard RL baselines.
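To make the deployment-efficient loop concrete, the following is a minimal, self-contained sketch in the spirit of the procedure described above: after each deployment, an ensemble of dynamics models is fit to the logged data, and the policy is then improved purely offline on imagined rollouts, with a penalty that keeps it close to the data-collection (behavior) policy. This is not the authors' implementation; the toy environment, the linear models, the perturbation-based policy update, and all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
S_DIM, A_DIM, K = 3, 2, 5          # state dim, action dim, ensemble size (assumed values)


def env_step(s, a):
    """Toy ground-truth dynamics, queried only when a policy is actually deployed."""
    A = np.array([[0.90, 0.00, 0.05],
                  [0.00, 0.80, 0.00],
                  [0.05, 0.00, 0.90]])
    B = np.array([[0.2, 0.0],
                  [0.0, 0.2],
                  [0.1, 0.1]])
    return A @ s + B @ np.tanh(a)


def collect(policy_w, n=200):
    """Deploy a linear data-collection policy and log (s, a, s') transitions."""
    data, s = [], rng.normal(size=S_DIM)
    for _ in range(n):
        a = policy_w @ s + 0.1 * rng.normal(size=A_DIM)
        s_next = env_step(s, a)
        data.append((s, a, s_next))
        s = s_next
    return data


def fit_ensemble(data):
    """Fit K linear dynamics models on bootstrap resamples of the logged data."""
    X = np.array([np.concatenate([s, a]) for s, a, _ in data])
    Y = np.array([s_next for _, _, s_next in data])
    models = []
    for _ in range(K):
        idx = rng.integers(len(data), size=len(data))
        W, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)
        models.append(W.T)                      # s' ~= W.T @ [s; a]
    return models


def imagined_return(policy_w, models, horizon=20, n_rollouts=10):
    """Average return of short rollouts inside randomly chosen ensemble members."""
    total = 0.0
    for _ in range(n_rollouts):
        s = rng.normal(size=S_DIM)
        W = models[rng.integers(K)]
        for _ in range(horizon):
            a = policy_w @ s
            s = W @ np.concatenate([s, a])
            total -= np.sum(s ** 2)             # toy reward: keep the state near zero
    return total / n_rollouts


def offline_policy_update(policy_w, behavior_w, models,
                          steps=200, lr=1e-2, beta=1.0):
    """Improve the policy purely offline on imagined rollouts, with a quadratic
    penalty toward the behavior policy (a crude stand-in for BREMEN's
    trust-region / KL-style behavior regularization)."""
    for _ in range(steps):
        # two-sided random-perturbation estimate of the imagined-return gradient
        eps = 1e-3 * rng.normal(size=policy_w.shape)
        delta = imagined_return(policy_w + eps, models) - imagined_return(policy_w - eps, models)
        grad = np.clip(delta / 2.0 * eps / (eps ** 2).sum(), -1.0, 1.0)
        grad -= beta * (policy_w - behavior_w)  # behavior regularization
        policy_w = policy_w + lr * grad
    return policy_w


# Only 5 deployments in total: each deployment collects data with the current
# policy, then the policy is improved offline before the next deployment.
policy_w = np.zeros((A_DIM, S_DIM))
for deployment in range(5):
    batch = collect(policy_w)
    models = fit_ensemble(batch)
    policy_w = offline_policy_update(policy_w, behavior_w=policy_w.copy(), models=models)
    print(f"deployment {deployment}: imagined return "
          f"{imagined_return(policy_w, models):.1f}")
```

The behavior-regularization term in this sketch plays the role described in the abstract: it keeps each offline-optimized policy close to the policy that collected the data, so that policy improvement relies on the learned ensemble only where it was trained.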

Related research

02/23/2021 - MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning
In many contemporary applications such as healthcare, finance, robotics,...

01/25/2022 - MOORe: Model-based Offline-to-Online Reinforcement Learning
With the success of offline reinforcement learning (RL), offline trained...

12/01/2022 - Launchpad: Learning to Schedule Using Offline and Online RL Methods
Deep reinforcement learning algorithms have succeeded in several challen...

09/04/2023 - Marginalized Importance Sampling for Off-Environment Policy Evaluation
Reinforcement Learning (RL) methods are typically sample-inefficient, ma...

07/17/2021 - Hierarchical Reinforcement Learning with Optimal Level Synchronization based on a Deep Generative Model
The high-dimensional or sparse reward task of a reinforcement learning (...

04/24/2023 - Policy Resilience to Environment Poisoning Attacks on Reinforcement Learning
This paper investigates policy resilience to training-environment poison...

11/29/2021 - Robust On-Policy Data Collection for Data-Efficient Policy Evaluation
This paper considers how to complement offline reinforcement learning (R...
