Robust Opponent Modeling via Adversarial Ensemble Reinforcement Learning in Asymmetric Imperfect-Information Games

09/18/2019
by Macheng Shen, et al.

This paper presents an algorithmic framework for learning robust policies in asymmetric imperfect-information games, where the joint reward can depend on the uncertain opponent type (private information known only to the opponent and its ally). To maximize the reward, the protagonist agent must infer the opponent type through agent modeling. We use multiagent reinforcement learning (MARL) to learn opponent models through self-play, which captures the full strategic interaction and reasoning between agents. However, agent policies learned from self-play can suffer from mutual overfitting. Ensemble training methods can improve the robustness of the agent policy against different opponents, but they also significantly increase the computational overhead. To achieve a good trade-off between the robustness of the learned policy and the computational complexity, we propose to train a separate opponent policy against the protagonist agent for evaluation purposes. The reward achieved by this opponent is a noisy measure of the robustness of the protagonist agent's policy, owing to the intrinsic stochasticity of a reinforcement learner. To handle this stochasticity, we apply a stochastic optimization scheme that dynamically updates the opponent ensemble to optimize an objective function balancing robustness against computational complexity. We empirically show that, under the same limited computational budget, the proposed method results in more robust policy learning than standard ensemble training.
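
The trade-off described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the exploitability model (1/size plus Gaussian noise), the per-opponent cost, the noisy hill-climbing rule, and all function names below are hypothetical assumptions. They are chosen only to show how a noisy robustness measurement might be averaged and traded off against ensemble size.

```python
import random
from statistics import mean


def noisy_exploitability(ensemble_size, rng, noise=0.3):
    """Hypothetical stand-in for the reward an evaluation opponent achieves
    against the protagonist (lower means a more robust protagonist).
    Toy assumption: exploitability falls as 1/ensemble_size, plus noise."""
    return 1.0 / ensemble_size + rng.gauss(0.0, noise)


def objective(ensemble_size, rng, noise=0.3, cost_per_opponent=0.05, n_evals=8):
    """Averaged noisy estimate of exploitability plus a compute penalty.
    Both terms are to be minimised; the linear cost is an assumption."""
    exploit = mean(
        noisy_exploitability(ensemble_size, rng, noise) for _ in range(n_evals)
    )
    return exploit + cost_per_opponent * ensemble_size


def tune_ensemble_size(max_size=10, steps=200, seed=0, noise=0.3):
    """Noisy hill climbing over the ensemble size (an assumed, simple
    stochastic optimiser, not the scheme used in the paper)."""
    rng = random.Random(seed)
    size = 1
    for _ in range(steps):
        # Propose growing or shrinking the ensemble by one opponent.
        candidate = min(max_size, max(1, size + rng.choice([-1, 1])))
        # Accept only if the fresh noisy estimate looks strictly better.
        if objective(candidate, rng, noise) < objective(size, rng, noise):
            size = candidate
    return size


if __name__ == "__main__":
    print("chosen ensemble size:", tune_ensemble_size())
```

Averaging several evaluations per candidate reduces the variance of each comparison, at the price of extra evaluation cost. With the noise set to zero, the search settles near the size that minimises the toy objective.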

research
02/26/2018

Modeling Others using Oneself in Multi-Agent Reinforcement Learning

We consider the multi-agent reinforcement learning setting with imperfec...
research
08/06/2018

Learning to Share and Hide Intentions using Information Regularization

Learning to cooperate with friends and compete with foes is a key compon...
research
03/20/2022

Does DQN really learn? Exploring adversarial training schemes in Pong

In this work, we study two self-play training schemes, Chainer and Pool,...
research
09/13/2020

Efficient Competitive Self-Play Policy Optimization

Reinforcement learning from self-play has recently reported many success...
research
10/24/2022

IDRL: Identifying Identities in Multi-Agent Reinforcement Learning with Ambiguous Identities

Multi-agent reinforcement learning (MARL) is a prevalent learning paradig...
research
08/14/2020

Joint Policy Search for Multi-agent Collaboration with Imperfect Information

To learn good joint policies for multi-agent collaboration with imperfec...
research
02/11/2015

Off-Policy Reward Shaping with Ensembles

Potential-based reward shaping (PBRS) is an effective and popular techni...
