When to Trust Your Simulator: Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning

06/27/2022
by Haoyi Niu, et al.

Learning effective reinforcement learning (RL) policies for complex real-world tasks can be quite challenging without a high-fidelity simulation environment. In most cases, we are given only imperfect simulators with simplified dynamics, which inevitably lead to severe sim-to-real gaps in RL policy learning. The recently emerged field of offline RL offers another possibility: learning policies directly from pre-collected historical data. However, to achieve reasonable performance, existing offline RL algorithms need impractically large offline datasets with sufficient state-action space coverage for training. This raises a new question: is it possible to combine learning from limited real data in offline RL with unrestricted exploration through imperfect simulators in online RL, so as to address the drawbacks of both approaches? In this study, we propose the Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning (H2O) framework to provide an affirmative answer to this question. H2O introduces a dynamics-aware policy evaluation scheme, which adaptively penalizes Q-function learning on simulated state-action pairs with large dynamics gaps while simultaneously allowing learning from a fixed real-world dataset. Through extensive simulation and real-world tasks, as well as theoretical analysis, we demonstrate the superior performance of H2O against other cross-domain online and offline RL algorithms. H2O provides a new hybrid offline-and-online RL paradigm, which can potentially shed light on future RL algorithm design for solving practical real-world tasks.
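
To make the idea of a dynamics-aware penalty concrete, here is a minimal sketch. It is not the formulation from the paper, which the abstract does not spell out. It assumes a hypothetical regularizer that takes a softmax-weighted average of Q-values on simulated samples, with weights that grow with an estimated dynamics gap, and subtracts the mean Q-value on real samples. The function name, the `temperature` parameter, and the softmax weighting are all illustrative assumptions.

```python
import numpy as np

def dynamics_aware_penalty(q_sim, gap_sim, q_real, temperature=1.0):
    """Hypothetical regularizer (illustrative only, not the paper's exact loss).

    q_sim:   Q-values on simulated state-action pairs
    gap_sim: estimated dynamics gap for each simulated pair (assumed given)
    q_real:  Q-values on state-action pairs from the real dataset
    """
    q_sim = np.asarray(q_sim, dtype=float)
    gap_sim = np.asarray(gap_sim, dtype=float)
    q_real = np.asarray(q_real, dtype=float)

    # Softmax over gap estimates: a larger gap gives a larger weight.
    z = gap_sim / temperature
    z = z - z.max()  # subtract the max for numerical stability
    weights = np.exp(z) / np.exp(z).sum()

    # Minimizing this term pushes Q down on high-gap simulated samples
    # and up on real samples.
    return float(weights @ q_sim - q_real.mean())
```

Added to a standard Bellman error, a term like this would discourage overestimating values in regions where the simulator is least trustworthy. How the dynamics gap is estimated is left open here, since the abstract does not describe it.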

research
11/08/2021

Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning

In real world, affecting the environment by a weak policy can be expensi...
research
03/13/2022

DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning

Offline reinforcement learning algorithms promise to be applicable in se...
research
06/07/2023

Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL

Offline reinforcement learning (RL) offers an appealing approach to real...
research
10/13/2022

Sustainable Online Reinforcement Learning for Auto-bidding

Recently, auto-bidding technique has become an essential tool to increas...
research
09/11/2023

Physics-informed reinforcement learning via probabilistic co-adjustment functions

Reinforcement learning of real-world tasks is very data inefficient, and...
research
12/01/2022

Launchpad: Learning to Schedule Using Offline and Online RL Methods

Deep reinforcement learning algorithms have succeeded in several challen...
research
02/23/2021

DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning

Thermal power generation plays a dominant role in the world's electricit...
