
Diff-DAC: Distributed Actor-Critic for Multitask Deep Reinforcement Learning

by Sergio Valcarcel Macua, et al.

We propose a multiagent distributed actor-critic algorithm for multitask reinforcement learning (MRL), named Diff-DAC. The agents are connected, forming a (possibly sparse) network. Each agent is assigned a task and has access to data from this local task only. During the learning process, the agents can communicate some parameters to their neighbors. Since the agents incorporate their neighbors' parameters into their own learning rules, the information is diffused across the network, and they can learn a common policy that generalizes well across all tasks. Diff-DAC is scalable since the computational complexity and communication overhead per agent grow with the number of neighbors, rather than with the total number of agents. Moreover, the algorithm is fully distributed in the sense that agents self-organize, with no need for a coordinator node. Diff-DAC follows an actor-critic scheme in which the value function and the policy are approximated with deep neural networks, so it can learn expressive policies from raw data. As a by-product of Diff-DAC's derivation from duality theory, we provide novel insights into the standard actor-critic framework, showing that it is an instance of the dual ascent method for approximating the solution of a linear program. Experiments illustrate the performance of the algorithm in the cart-pole, inverted pendulum, and swing-up cart-pole environments.
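The diffusion idea in the abstract (each agent updates on its own task, then mixes parameters with its neighbors only) can be illustrated with a toy sketch. This is not the Diff-DAC actor-critic itself: it assumes a scalar parameter per agent, a quadratic loss per task, a ring network, and Metropolis combination weights, none of which come from the abstract. All function names are hypothetical.

```python
def ring_neighbors(num_agents):
    """Hypothetical sparse topology: each agent talks to its two ring neighbors."""
    return [{(k - 1) % num_agents, (k + 1) % num_agents} for k in range(num_agents)]


def metropolis_weights(neighbors):
    """Combination weights (assumed choice): symmetric, each row sums to 1."""
    n = len(neighbors)
    weights = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in neighbors[k]:
            weights[k][l] = 1.0 / (1.0 + max(len(neighbors[k]), len(neighbors[l])))
        # Self-weight absorbs whatever mass the neighbors did not take.
        weights[k][k] = 1.0 - sum(weights[k])
    return weights


def diffusion_step(params, grads, weights, neighbors, step_size):
    """One adapt-then-combine step; each agent reads only its neighbors' values."""
    # Adapt: local gradient step using data from the agent's own task only.
    adapted = [w - step_size * g for w, g in zip(params, grads)]
    # Combine: weighted average over the agent itself and its neighbors.
    return [
        sum(weights[k][l] * adapted[l] for l in neighbors[k] | {k})
        for k in range(len(params))
    ]


def run_toy_diffusion(num_agents=5, num_iters=2000, step_size=0.05):
    """Toy multitask problem: agent k minimizes 0.5 * (w - k)**2 (an assumption)."""
    neighbors = ring_neighbors(num_agents)
    weights = metropolis_weights(neighbors)
    targets = [float(k) for k in range(num_agents)]
    params = [0.0] * num_agents
    for _ in range(num_iters):
        grads = [w - t for w, t in zip(params, targets)]  # local gradients
        params = diffusion_step(params, grads, weights, neighbors, step_size)
    # The minimizer of the summed losses is the mean of the targets.
    return params, sum(targets) / num_agents


if __name__ == "__main__":
    final_params, common_solution = run_toy_diffusion()
    print("per-agent parameters:", final_params)
    print("common minimizer:", common_solution)
```

In this sketch the per-agent cost of the combine step depends only on the number of neighbors, which mirrors the scalability claim above. With a constant step size, the agents settle close to the common minimizer rather than exactly on it; a smaller step size narrows that gap.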


Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning

We propose a fully distributed actor-critic algorithm approximated by de...

Seq2Seq Mimic Games: A Signaling Perspective

We study the emergence of communication in multiagent adversarial settin...

Guided Deep Reinforcement Learning for Swarm Systems

In this paper, we investigate how to learn to control a group of coopera...

A nonlinear hidden layer enables actor-critic agents to learn multiple paired association navigation

Navigation to multiple cued reward locations has been increasingly used ...

An actor-critic algorithm with deep double recurrent agents to solve the job shop scheduling problem

There is a growing interest in integrating machine learning techniques a...

Using Distributed Reinforcement Learning for Resource Orchestration in a Network Slicing Scenario

The Network Slicing (NS) paradigm enables the partition of physical and ...

Modular Multitask Reinforcement Learning with Policy Sketches

We describe a framework for multitask deep reinforcement learning guided...