R-MADDPG for Partially Observable Environments and Limited Communication

02/16/2020
by Rose E. Wang, et al.

Several real-world tasks would benefit from applying multiagent reinforcement learning (MARL) algorithms, including coordination among self-driving cars. The real world poses challenging conditions for multiagent learning systems, such as its partially observable and nonstationary nature. Moreover, if agents must share a limited resource (e.g. network bandwidth), they must all learn how to coordinate resource use (Hochreiter & Schmidhuber, 1997). This paper introduces a deep recurrent multiagent actor-critic framework (R-MADDPG) for handling multiagent coordination under partially observable settings and limited communication. We investigate the effects of recurrency on the performance and communication use of a team of agents. We demonstrate that the resulting framework learns time dependencies for sharing missing observations, handling resource limitations, and developing different communication patterns among agents.
