Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments
Although single-agent deep reinforcement learning has achieved significant success, largely owing to the experience replay mechanism, this mechanism must be reconsidered in multiagent environments. This work focuses on stochastic cooperative environments. We apply a specific adaptation to a recently proposed weighted double estimator and propose a multiagent deep reinforcement learning framework, named Weighted Double Deep Q-Network (WDDQN). To achieve efficient cooperation, a Lenient Reward Network and a Mixture Replay Strategy are introduced. By combining a deep neural network with the weighted double estimator, WDDQN not only reduces estimation bias effectively but also extends to many deep RL scenarios with only raw pixel images as input. Empirically, WDDQN outperforms an existing DRL algorithm (double DQN) and a multiagent RL algorithm (lenient Q-learning) in terms of performance and convergence within stochastic cooperative environments.
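The abstract does not spell out the update rule, but the weighted double estimator it builds on can be illustrated with a minimal sketch. The snippet below assumes the interpolation weight beta from weighted double Q-learning (Zhang et al., 2017); the function name, the constant `c`, and the exact form of beta are illustrative assumptions, not the paper's verbatim recipe.

```python
import numpy as np

def weighted_double_q_target(q_online, q_target, next_state, reward, gamma, c=1.0):
    """Sketch of a weighted-double-estimator TD target.

    q_online, q_target: callables mapping a state to a vector of action
    values (e.g., the online and target networks of a DQN). The weight
    beta interpolates between the single estimator (prone to
    overestimation) and the double estimator (prone to underestimation).
    """
    q_on = q_online(next_state)      # online estimates Q(s', .)
    q_tg = q_target(next_state)      # target-network estimates
    a_star = int(np.argmax(q_on))    # greedy action under the online net
    a_low = int(np.argmin(q_on))     # lowest-valued action, used in beta
    gap = abs(q_tg[a_star] - q_tg[a_low])
    beta = gap / (c + gap)           # interpolation weight in [0, 1)
    # Blend the two estimates of the greedy action's value.
    blended = beta * q_on[a_star] + (1.0 - beta) * q_tg[a_star]
    return reward + gamma * blended
```

With beta near 1 the target behaves like standard Q-learning; with beta near 0 it behaves like double Q-learning, which is the bias trade-off the abstract refers to.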