Double Check Your State Before Trusting It: Confidence-Aware Bidirectional Offline Model-Based Imagination

06/16/2022
by Jiafei Lyu, et al.

The learned policy of model-free offline reinforcement learning (RL) methods is often constrained to stay within the support of the dataset to avoid possibly dangerous out-of-distribution actions or states, making it challenging to handle out-of-support regions. Model-based RL methods offer a richer dataset and benefit generalization by generating imaginary trajectories with either a trained forward or reverse dynamics model. However, the imagined transitions may be inaccurate, degrading the performance of the underlying offline RL method. In this paper, we propose to augment the offline dataset using trained bidirectional dynamics models and rollout policies with a double check mechanism: we introduce conservatism by trusting only samples that the forward model and backward model agree on. Our method, confidence-aware bidirectional offline model-based imagination, generates reliable samples and can be combined with any model-free offline RL method. Experimental results on the D4RL benchmarks demonstrate that our method significantly boosts the performance of existing model-free offline RL algorithms and achieves competitive or better scores than baseline methods.
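To make the double check idea concrete, here is a minimal, illustrative sketch of one way such an agreement filter could look, assuming the check is realized as a cycle-consistency test between the forward and backward models: a forward-imagined transition is kept only if the backward model maps the imagined next state back to something close to the starting state. The interfaces (`forward_model`, `backward_model`, `rollout_policy`) and the threshold `tau` are hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

def double_check_filter(states, rollout_policy, forward_model,
                        backward_model, tau=0.1):
    """Keep forward-imagined transitions only when the backward model agrees.

    A cycle-consistency sketch: roll the forward model one step, then ask
    the backward model to reconstruct the starting state. Transitions whose
    reconstruction error exceeds `tau` are discarded as unreliable.
    """
    actions = rollout_policy(states)                      # a ~ pi_rollout(s)
    next_states = forward_model(states, actions)          # s' ~ f(s, a)
    recon_states = backward_model(next_states, actions)   # s_hat ~ b(s', a)

    # Disagreement between the two models, measured as reconstruction error.
    disagreement = np.linalg.norm(recon_states - states, axis=-1)

    # Conservatism: trust only the samples both models agree on.
    keep = disagreement < tau
    return states[keep], actions[keep], next_states[keep]
```

In practice, the accepted transitions would be appended to the offline dataset and consumed by any model-free offline RL learner, which is what makes this kind of filtered imagination composable with existing algorithms.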

Related research

Offline Reinforcement Learning with Reverse Model-based Imagination (10/01/2021)
In offline reinforcement learning (offline RL), one of the main challeng...

Uncertainty-driven Trajectory Truncation for Model-based Offline Reinforcement Learning (04/10/2023)
Equipped with the trained environmental dynamics, model-based offline re...

ENTROPY: Environment Transformer and Offline Policy Optimization (03/07/2023)
Model-based methods provide an effective approach to offline reinforceme...

Concept-modulated model-based offline reinforcement learning for rapid generalization (09/07/2022)
The robustness of any machine learning solution is fundamentally bound b...

Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL (06/16/2021)
Offline Reinforcement Learning (RL) aims to extract near-optimal policie...

Offline RL Policies Should be Trained to be Adaptive (07/05/2022)
Offline RL algorithms must account for the fact that the dataset they ar...

Model Generation with Provable Coverability for Offline Reinforcement Learning (06/01/2022)
Model-based offline optimization with dynamics-aware policy provides a n...
