Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

06/14/2020
by Zhenghao Peng, et al.

Conventional reinforcement learning (RL) algorithms usually have a single agent learn to solve the task independently. As a result, the agent can explore only a limited part of the state-action space, and the learned behavior is highly correlated with the agent's previous experience, which makes training prone to getting stuck in local optima. In this work, we empower RL with the capability of teamwork and propose a novel non-local policy optimization framework called Diversity-regularized Collaborative Exploration (DiCE). DiCE uses a group of heterogeneous agents to explore the environment simultaneously and share the collected experiences. A regularization mechanism is further designed to maintain the diversity of the team and modulate exploration. We implement the framework in both on-policy and off-policy settings, and the experimental results show that DiCE achieves substantial improvement over the baselines on the MuJoCo locomotion tasks.
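
The two ingredients named in the abstract, experience sharing across a team and a measure of team diversity, can be illustrated with a small sketch. This is not the authors' implementation: the linear policies, the helper names (`act`, `collect_shared`, `pairwise_diversity`), and the choice of mean squared action difference as the diversity measure are all assumptions made for illustration, and the abstract does not specify how DiCE computes diversity or applies the regularizer.

```python
import random
from itertools import combinations

def act(weights, state):
    """Hypothetical deterministic linear policy: one action value per weight row."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]

def collect_shared(team, states):
    """Every agent acts on every state; all records land in one shared buffer."""
    buffer = []
    for agent_id, weights in enumerate(team):
        for s in states:
            buffer.append((agent_id, s, act(weights, s)))
    return buffer

def pairwise_diversity(team, states):
    """Assumed diversity measure: mean squared action difference
    averaged over all agent pairs and all given states."""
    total, count = 0.0, 0
    for a, b in combinations(team, 2):
        for s in states:
            ua, ub = act(a, s), act(b, s)
            total += sum((x - y) ** 2 for x, y in zip(ua, ub)) / len(ua)
            count += 1
    return total / count if count else 0.0

rng = random.Random(0)
# 5 toy states of dimension 3; 4 agents, each a 2x3 weight matrix (2 action dims).
states = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
team = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)] for _ in range(4)]

shared = collect_shared(team, states)       # experiences pooled across the team
diversity = pairwise_diversity(team, states)  # larger value = more distinct behaviors
```

In a training loop, a quantity like `diversity` could be added to each agent's objective as a bonus term so that policies trained on the pooled data do not collapse onto one behavior; how DiCE actually balances this against task reward is described in the full paper, not in this abstract.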


research
02/20/2018

Meta-Reinforcement Learning of Structured Exploration Strategies

Exploration is a fundamental challenge in reinforcement learning (RL). M...
research
06/11/2021

Offline Reinforcement Learning as Anti-Exploration

Offline Reinforcement Learning (RL) aims at learning an optimal control ...
research
05/02/2023

An Autonomous Non-monolithic Agent with Multi-mode Exploration based on Options Framework

Most exploration research on reinforcement learning (RL) has paid attent...
research
05/30/2022

DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems

Muscle-actuated organisms are capable of learning an unparalleled divers...
research
06/12/2020

Human and Multi-Agent collaboration in a human-MARL teaming framework

Collaborative multi-agent reinforcement learning (MARL) as a specific ca...
research
05/19/2020

Experience Augmentation: Boosting and Accelerating Off-Policy Multi-Agent Reinforcement Learning

Exploration of the high-dimensional state action space is one of the big...
research
09/29/2020

Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network

Existing exploration strategies in reinforcement learning (RL) often eit...
