MBDP: A Model-based Approach to Achieve both Robustness and Sample Efficiency via Double Dropout Planning

08/03/2021
by Wanpeng Zhang, et al.

Model-based reinforcement learning is a widely accepted approach to reducing the excessive sample demands of reinforcement learning. However, the predictions of learned dynamics models are often not accurate enough, and the resulting bias can lead to catastrophic decisions when the policy lacks robustness. It is therefore highly desirable to improve the robustness of model-based RL algorithms while maintaining high sample efficiency. In this paper, we propose Model-Based Double-dropout Planning (MBDP) to balance robustness and efficiency. MBDP consists of two dropout mechanisms: rollout-dropout improves robustness at a small cost in sample efficiency, while model-dropout compensates for the lost efficiency at a slight expense of robustness. By combining them in a complementary way, MBDP provides a flexible control mechanism that can meet different demands on robustness and efficiency by tuning the two corresponding dropout ratios. The effectiveness of MBDP is demonstrated both theoretically and experimentally.
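As a purely illustrative reading of the two mechanisms, the minimal sketch below assumes that model-dropout discards a beta fraction of ensemble dynamics models with the largest held-out prediction error before rollouts are generated, and that rollout-dropout discards an alpha fraction of simulated rollouts with the highest returns, so that pessimistic rollouts dominate the policy update. The exact dropout criteria are defined in the paper and may differ; all names here (model_dropout, rollout_dropout, alpha, beta) are hypothetical, not the authors' code.

```python
# Illustrative sketch of double-dropout planning (assumptions noted above).
import numpy as np

def model_dropout(ensemble_errors, beta):
    """Return indices of kept models, dropping the beta fraction with the largest error."""
    n_keep = max(1, int(round(len(ensemble_errors) * (1.0 - beta))))
    order = np.argsort(ensemble_errors)   # ascending: most accurate models first
    return order[:n_keep]

def rollout_dropout(rollout_returns, alpha):
    """Return indices of kept rollouts, dropping the alpha fraction with the highest return."""
    n_keep = max(1, int(round(len(rollout_returns) * (1.0 - alpha))))
    order = np.argsort(rollout_returns)   # ascending: lowest-return (pessimistic) rollouts first
    return order[:n_keep]

# Toy usage: 7 ensemble models, 10 simulated rollouts.
rng = np.random.default_rng(0)
errors = rng.random(7)          # per-model prediction error on held-out transitions
returns = rng.normal(size=10)   # per-rollout cumulative reward under the current policy

kept_models = model_dropout(errors, beta=0.3)        # models used to generate rollouts
kept_rollouts = rollout_dropout(returns, alpha=0.2)  # rollouts kept for the policy update
print(kept_models, kept_rollouts)
```

Tuning alpha toward zero recovers the full set of rollouts (higher sample efficiency), while tuning beta toward zero keeps the full ensemble; the trade-off between robustness and efficiency is controlled by these two ratios.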

