Overcoming Model Bias for Robust Offline Deep Reinforcement Learning

08/12/2020
by Phillip Swazinna, et al.

State-of-the-art reinforcement learning (RL) algorithms mostly rely on being allowed to interact directly with their environment to collect millions of observations. This makes it hard to transfer their success to industrial control problems, where simulations are often very costly or do not exist at all. Furthermore, interacting with (and especially exploring in) the real, physical environment can lead to catastrophic events. We thus propose a novel model-based RL algorithm, called MOOSE (MOdel-based Offline policy Search with Ensembles), which can train a policy from a pre-existing, fixed dataset. It ensures that the dynamics models can accurately assess policy performance by constraining the policy to stay within the support of the data. We deliberately design MOOSE to be similar to the state-of-the-art model-free, offline (a.k.a. batch) RL algorithms BEAR and BCQ, the main difference being that our algorithm is model-based. We compare the algorithms on the Industrial Benchmark and MuJoCo continuous control tasks in terms of robust performance and find that MOOSE almost always outperforms its model-free counterparts by a large margin.
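
The core idea in the abstract, scoring a policy inside an ensemble of learned dynamics models while keeping it within the support of the fixed dataset, can be illustrated with a toy sketch. This is not the authors' implementation. The one-dimensional system s' = s + a, the bootstrap-fitted linear ensemble, the nearest-neighbour distance used as a support penalty, the penalty weight, and the grid search over linear gains (standing in for MOOSE's actual policy training) are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Toy 1-D problem (assumption): states and actions are plain floats.
Transition = Tuple[float, float, float]  # (state, action, next_state)
Model = Callable[[float, float], float]  # predicts next_state from (state, action)


def collect_dataset(n: int, rng: random.Random) -> List[Transition]:
    """Fixed offline dataset from a noisy behaviour policy a = -0.5*s on s' = s + a."""
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = -0.5 * s + rng.uniform(-0.05, 0.05)
        data.append((s, a, s + a + rng.gauss(0.0, 0.01)))
    return data


def fit_ensemble(data: Sequence[Transition], k: int, rng: random.Random) -> List[Model]:
    """Fit k models s' = s + b*a, each by least squares on a bootstrap resample."""
    models = []
    for _ in range(k):
        sample = [rng.choice(data) for _ in data]
        b = sum(a * (s2 - s) for s, a, s2 in sample) / sum(a * a for _, a, _ in sample)
        models.append(lambda s, a, b=b: s + b * a)  # default arg freezes this member's b
    return models


def support_penalty(s: float, a: float, data: Sequence[Transition]) -> float:
    """Hypothetical support proxy: squared distance to the nearest (s, a) in the data."""
    return min((s - ds) ** 2 + (a - da) ** 2 for ds, da, _ in data)


def penalized_return(policy: Callable[[float], float], models: Sequence[Model],
                     starts: Sequence[float], data: Sequence[Transition],
                     weight: float, horizon: int = 10, gamma: float = 0.9) -> float:
    """Average discounted return over all ensemble members and start states.

    Reward is -(next_state)^2; each step is also charged weight * support_penalty.
    """
    total, runs = 0.0, 0
    for model in models:
        for s in starts:
            ret = 0.0
            for t in range(horizon):
                a = policy(s)
                s_next = model(s, a)
                r = -s_next ** 2
                if weight > 0.0:
                    r -= weight * support_penalty(s, a, data)
                ret += gamma ** t * r
                s = s_next
            total += ret
            runs += 1
    return total / runs


def search_gain(gains: Sequence[float], models: Sequence[Model], starts: Sequence[float],
                data: Sequence[Transition], weight: float) -> float:
    """Grid search over linear policies a = -g*s (stand-in for gradient-based search)."""
    return max(gains, key=lambda g: penalized_return(lambda s: -g * s, models, starts, data, weight))


rng = random.Random(0)
data = collect_dataset(500, rng)
models = fit_ensemble(data, 5, rng)
gains = [0.0, 0.25, 0.5, 0.75, 1.0, 1.5]
starts = [-1.0, 1.0]
best_unconstrained = search_gain(gains, models, starts, data, weight=0.0)
best_constrained = search_gain(gains, models, starts, data, weight=10.0)
print(best_unconstrained, best_constrained)
```

Without the penalty (weight=0.0), the search picks the gain the ensemble rates highest, even though actions of that size never appear in the data. With the penalty, it falls back to gains closer to the behaviour policy a = -0.5*s, which is the trade-off a support constraint is meant to enforce when the models cannot be trusted outside the data.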

10/01/2021

Offline Reinforcement Learning with Reverse Model-based Imagination

In offline reinforcement learning (offline RL), one of the main challeng...
01/14/2022

Comparing Model-free and Model-based Algorithms for Offline Reinforcement Learning

Offline reinforcement learning (RL) Algorithms are often designed with e...
10/26/2021

Learning Robust Controllers Via Probabilistic Model-Based Policy Search

Model-based Reinforcement Learning estimates the true environment throug...
02/13/2022

Goal Recognition as Reinforcement Learning

Most approaches for goal recognition rely on specifications of the possi...
05/20/2017

Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

The Particle Swarm Optimization Policy (PSO-P) has been recently introdu...
10/06/2022

Deep Inventory Management

We present a Deep Reinforcement Learning approach to solving a periodic ...
05/15/2022

Reliable Offline Model-based Optimization for Industrial Process Control

In the research area of offline model-based optimization, novel and prom...
