DEFT: Diverse Ensembles for Fast Transfer in Reinforcement Learning

09/26/2022
by   Simeon Adebola, et al.
1

Deep ensembles have been shown to extend the positive effect seen in typical ensemble learning to neural networks and to reinforcement learning (RL). However, there is still much to be done to improve the efficiency of such ensemble models. In this work, we present Diverse Ensembles for Fast Transfer in RL (DEFT), a new ensemble-based method for reinforcement learning in highly multimodal environments and improved transfer to unseen environments. The algorithm is broken down into two main phases: training of ensemble members, and synthesis (or fine-tuning) of the ensemble members into a policy that works in a new environment. The first phase of the algorithm involves training regular policy gradient or actor-critic agents in parallel but adding a term to the loss that encourages these policies to differ from each other. This causes the individual unimodal agents to explore the space of optimal policies and capture more of the multimodality of the environment than a single actor could. The second phase of DEFT involves synthesizing the component policies into a new policy that works well in a modified environment in one of two ways. To evaluate the performance of DEFT, we start with a base version of the Proximal Policy Optimization (PPO) algorithm and extend it with the modifications for DEFT. Our results show that the pretraining phase is effective in producing diverse policies in multimodal environments. DEFT often converges to a high reward significantly faster than alternatives, such as random initialization without DEFT and fine-tuning of ensemble members. While there is certainly more work to be done to analyze DEFT theoretically and extend it to be even more robust, we believe it provides a strong framework for capturing multimodality in environments while still using RL methods with simple policy representations.

READ FULL TEXT

page 7

page 14

research
07/09/2020

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

Model-free deep reinforcement learning (RL) has been successful in a ran...
research
05/21/2014

Off-Policy Shaping Ensembles in Reinforcement Learning

Recent advances of gradient temporal-difference methods allow to learn o...
research
05/23/2023

Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning

Training generally capable agents that perform well in unseen dynamic en...
research
09/29/2022

Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in Reinforcement Learning

Reinforcement learning is a promising paradigm for learning robot contro...
research
11/05/2020

Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

Quality-Diversity (QD) is a concept from Neuroevolution with some intrig...
research
03/13/2023

Reinforcement Learning-based Wavefront Sensorless Adaptive Optics Approaches for Satellite-to-Ground Laser Communication

Optical satellite-to-ground communication (OSGC) has the potential to im...
research
05/23/2022

POLTER: Policy Trajectory Ensemble Regularization for Unsupervised Reinforcement Learning

The goal of Unsupervised Reinforcement Learning (URL) is to find a rewar...

Please sign up or login with your details

Forgot password? Click here to reset