Model-Ensemble Trust-Region Policy Optimization

02/28/2018
by   Thanard Kurutach, et al.
0

Model-free reinforcement learning (RL) methods are succeeding in a growing number of tasks, aided by recent advances in deep learning. However, they tend to suffer from high sample complexity, which hinders their use in real-world domains. Alternatively, model-based reinforcement learning promises to reduce sample complexity, but tends to require careful tuning and to date have succeeded mainly in restrictive domains where simple models are sufficient for learning. In this paper, we analyze the behavior of vanilla model-based reinforcement learning methods when deep neural networks are used to learn both the model and the policy, and show that the learned policy tends to exploit regions where insufficient data is available for the model to be learned, causing instability in training. To overcome this issue, we propose to use an ensemble of models to maintain the model uncertainty and regularize the learning process. We further show that the use of likelihood ratio derivatives yields much more stable learning than backpropagation through time. Altogether, our approach Model-Ensemble Trust-Region Policy Optimization (ME-TRPO) significantly reduces the sample complexity compared to model-free deep RL methods on challenging continuous control benchmark tasks.

READ FULL TEXT
research
02/28/2018

Model-Based Value Estimation for Efficient Model-Free Reinforcement Learning

Recent model-free reinforcement learning algorithms have proposed incorp...
research
03/02/2016

Continuous Deep Q-Learning with Model-based Acceleration

Model-free reinforcement learning has been successfully applied to a ran...
research
04/19/2023

Sample-efficient Model-based Reinforcement Learning for Quantum Control

We propose a model-based reinforcement learning (RL) approach for noisy ...
research
01/15/2020

SEERL: Sample Efficient Ensemble Reinforcement Learning

Ensemble learning is a very prevalent method employed in machine learnin...
research
12/19/2020

Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach

In order for reinforcement learning techniques to be useful in real-worl...
research
07/05/2021

Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation

Model-based deep reinforcement learning has achieved success in various ...
research
03/07/2019

When random search is not enough: Sample-Efficient and Noise-Robust Blackbox Optimization of RL Policies

Interest in derivative-free optimization (DFO) and "evolutionary strateg...

Please sign up or login with your details

Forgot password? Click here to reset