A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning

06/08/2020
by   Sihan Zeng, et al.
4

We develop a mathematical framework for solving multi-task reinforcement learning problems based on a type of decentralized policy gradient method. The goal in multi-task reinforcement learning is to learn a common policy that operates effectively in different environments; these environments have similar (or overlapping) state and action spaces, but have different rewards and dynamics. Agents immersed in each of these environments communicate with other agents by sharing their models (i.e. their policy parameterizations) but not their state/reward paths. Our analysis provides a convergence rate for a consensus-based distributed, entropy-regularized policy gradient method for finding such a policy. We demonstrate the effectiveness of the proposed method using a series of numerical experiments. These experiments range from small-scale "Grid World" problems that readily demonstrate the trade-offs involved in multi-task learning to large-scale problems, where common policies are learned to play multiple Atari games or to navigate an airborne drone in multiple (simulated) environments.

READ FULL TEXT

page 28

page 29

research
02/27/2018

DiGrad: Multi-Task Reinforcement Learning with Shared Actions

Most reinforcement learning algorithms are inefficient for learning mult...
research
06/18/2020

Cooperative Multi-Agent Reinforcement Learning with Partial Observations

In this paper, we propose a distributed zeroth-order policy optimization...
research
06/02/2018

Efficient Entropy for Policy Gradient with Multidimensional Action Space

In recent years, deep reinforcement learning has been shown to be adept ...
research
11/16/2017

Hindsight policy gradients

Goal-conditional policies allow reinforcement learning agents to pursue ...
research
02/18/2014

Off-Policy General Value Functions to Represent Dynamic Role Assignments in RoboCup 3D Soccer Simulation

Collecting and maintaining accurate world knowledge in a dynamic, comple...
research
02/10/2017

Batch Policy Gradient Methods for Improving Neural Conversation Models

We study reinforcement learning of chatbots with recurrent neural networ...
research
03/11/2021

Multi-Task Federated Reinforcement Learning with Adversaries

Reinforcement learning algorithms, just like any other Machine learning ...

Please sign up or login with your details

Forgot password? Click here to reset