
The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) Competition

by   Diego Pérez-Liébana, et al.

Learning in multi-agent scenarios is a fruitful research direction, but current approaches still show scalability problems in multiple games with general reward settings and different opponent types. The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) competition is a new challenge that proposes research in this domain using multiple 3D games. The goal of this contest is to foster research in general agents that can learn across different games and opponent types, proposing a challenge as a milestone in the direction of Artificial General Intelligence.



1 Introduction

Learning in multi-agent settings is one of the fundamental problems in AI research. Independently learning agents can result in non-stationarity, and the presence of adversarial agents can hamper exploration and consequently the learning progress. Multi-agent settings can be approached as Stochastic n-player Games (SGs) Shapley (1953), where each player interacts with the game environment by receiving state observations, sending actions that in turn affect the state of the environment (and other players), and receiving rewards. Reinforcement Learning (RL) is a common approach to learning in SGs Tan (1993); Littman (1994) and promises solutions that could be general and applicable to learning in any game.
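The interaction loop described above (observe, act, receive rewards) can be sketched in a few lines. The following is a minimal illustration only: `ToyStochasticGame`, its majority-vote transition rule, its reward rule and the `noise` parameter are all hypothetical and are not part of the MalmÖ platform or the competition games.

```python
import random
from typing import Callable, List, Tuple


class ToyStochasticGame:
    """Hypothetical n-player game with two states, used only for illustration."""

    def __init__(self, n_players: int, horizon: int = 5,
                 noise: float = 0.1, seed: int = 0):
        self.n_players = n_players
        self.horizon = horizon
        self.noise = noise          # probability that the next state is flipped
        self.rng = random.Random(seed)
        self.state = 0
        self.t = 0

    def reset(self) -> List[int]:
        self.state, self.t = 0, 0
        # In this toy example every player observes the full state.
        return [self.state] * self.n_players

    def step(self, actions: List[int]) -> Tuple[List[int], List[float], bool]:
        assert len(actions) == self.n_players
        # The joint action drives the transition: majority vote picks the state.
        next_state = 1 if 2 * sum(actions) > self.n_players else 0
        if self.rng.random() < self.noise:
            next_state = 1 - next_state     # stochastic transition
        self.state = next_state
        self.t += 1
        # Each player gets its own reward: 1 if its action matches the new state.
        rewards = [1.0 if a == self.state else 0.0 for a in actions]
        done = self.t >= self.horizon
        return [self.state] * self.n_players, rewards, done


def run_episode(env: ToyStochasticGame,
                policies: List[Callable[[int], int]]) -> List[float]:
    """Play one episode and return the cumulative reward of every player."""
    observations = env.reset()
    returns = [0.0] * env.n_players
    done = False
    while not done:
        actions = [policy(obs) for policy, obs in zip(policies, observations)]
        observations, rewards, done = env.step(actions)
        returns = [g + r for g, r in zip(returns, rewards)]
    return returns
```

Because each player's reward depends on the joint action, a policy that is good against one set of co-players may be poor against another, which is the source of the non-stationarity mentioned above.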

RL for multi-agent settings has a long and fruitful research tradition Stone and Veloso (2000); Busoniu et al. (2006). The goal of a reinforcement learner is formally to maximize its long-term cumulative reward. Games can be competitive (e.g., zero-sum games where one player’s reward is the inverse of its opponent’s), collaborative (all rewards are shared), or general-reward. The latter are the most realistic for many real-world applications but also notoriously challenging. Even in more restricted purely competitive and collaborative settings, the challenges of learning in the presence of other agents are far from solved. Current solutions only scale empirically in tasks restricted to relatively small environments or with simplifying assumptions.
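The objective and the reward classes named above can be written compactly. This is a standard formulation given as a sketch; the discount factor \gamma and the notation r_i, \pi_i, s_t, a_{i,t} are assumptions introduced here, not symbols taken from the competition description.

```latex
% Objective of player i: expected discounted cumulative reward
J_i(\pi_1, \dots, \pi_n)
  = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\,
      r_i(s_t, a_{1,t}, \dots, a_{n,t}) \right],
  \qquad 0 \le \gamma < 1

% Competitive (two-player zero-sum): one player's reward is the negation of the other's
r_1(s, a_1, a_2) = -\, r_2(s, a_1, a_2)

% Collaborative: all players share the same reward
r_1(s, a) = r_2(s, a) = \dots = r_n(s, a)
```

In the general-reward case no such constraint links the r_i, so each player optimizes its own J_i while the other policies change around it.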

Recent research in multi-agent RL has shown rapid progress in tackling some key challenges Foerster et al. (2017b, a); Lowe et al. (2017); Sunehag et al. (2017), suggesting that increased research efforts could lead to further breakthroughs. Generalization beyond individual tasks and opponent types is an area with high need and potential for further research. In single-agent RL, there is a clear risk of overfitting to individual tasks and specific opponent types. This problem is gradually being addressed in single-agent tasks, but most current empirical work in multi-agent RL is focused on a few individual tasks with a single learning agent.

2 MARLÖ: Multi-Agent Reinforcement Learning in MalmÖ

The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) competition is a new challenge that proposes research on multi-agent RL using multiple games. Participants create learning agents that are able to play multiple 3D games within Minecraft as defined in the MalmÖ platform. The MARLÖ competition will run for the first time in 2018 and will feature the following games:

Mob Chase: A collaborative game in which two or more agents and a mob wander a small meadow for a limited amount of time. The agents can either catch the mob, by cornering it so that no escape path is available, or leave the pen through one of the exits. Capturing the mob gives players a high reward, while exiting provides a smaller one. This game is inspired by the variant of the stag hunt presented in Yoshida et al. (2008).

Build Battle: A competitive and collaborative game where two teams of agents compete to build a given cuboid structure within a time limit. Agents are rewarded for correctly placing a block or removing an incorrectly placed block, and penalized for incorrectly placing a block or removing a correctly placed block.
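The Build Battle scoring rule can be illustrated with a small sketch. The point values below (+1 and -1), the zero reward for removing from an empty cell, and all names (`place_block`, `remove_block`, `target`, `world`) are hypothetical; the description does not fix them, and this is not the competition's implementation.

```python
from typing import Dict, Tuple

Position = Tuple[int, int, int]

CORRECT_REWARD = 1.0      # hypothetical value for a helpful action
INCORRECT_REWARD = -1.0   # hypothetical value for a harmful action


def is_correct(target: Dict[Position, str], pos: Position, block: str) -> bool:
    """A block is correct if the target structure has that block type at pos."""
    return target.get(pos) == block


def place_block(target: Dict[Position, str], world: Dict[Position, str],
                pos: Position, block: str) -> float:
    """Place a block and return the reward for the placing agent."""
    world[pos] = block
    return CORRECT_REWARD if is_correct(target, pos, block) else INCORRECT_REWARD


def remove_block(target: Dict[Position, str], world: Dict[Position, str],
                 pos: Position) -> float:
    """Remove the block at pos (if any) and return the reward."""
    block = world.pop(pos, None)
    if block is None:
        return 0.0  # assumption: removing from an empty cell is neutral
    # Removing a wrong block helps the build; removing a right one hurts it.
    return INCORRECT_REWARD if is_correct(target, pos, block) else CORRECT_REWARD
```

For example, with `target = {(0, 0, 0): "stone"}`, placing stone at `(0, 0, 0)` is rewarded, while placing stone anywhere else is penalized and removing that misplaced block is rewarded again.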

Treasure Hunt: A competitive and collaborative game played in an underground dungeon. Each team is formed by collectors (who can pick up treasures) and fighters (who fight foes). The goal is to retrieve a treasure while surviving the enemy entities. All agents on a winning team receive points if their collector gets the treasure, and further points if the collector reaches the exit (losing teams receive the negation of these points). If anybody in the team dies, all agents on the team receive points. The game ends when the collector player reaches the exit, when a player dies, or when the time runs out.

All games are parameterizable, providing a task space in which potentially endless variants of each domain can be created. Examples of these parameters are weather, block types, number and position of entities, and size of the playing area. Participants can train their agents in any of the possible instances of each game (known as tasks). The final evaluation will be performed in particular customizations designed by the organizers and not revealed before the competition deadline.
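One way to picture such a task space is as a configuration record from which variants are sampled. The sketch below is purely illustrative: `TaskConfig`, `sample_task`, the parameter names and the value ranges are assumptions made here, not the MARLÖ configuration format.

```python
import random
from dataclasses import dataclass
from typing import Tuple

# Hypothetical parameter values, chosen only for this example.
WEATHER = ("clear", "rain", "thunder")
BLOCKS = ("grass", "sand", "stone")


@dataclass(frozen=True)
class TaskConfig:
    """One task: a concrete instance of a parameterizable game."""
    weather: str
    floor_block: str
    arena_size: int                                # side length of a square arena
    entity_positions: Tuple[Tuple[int, int], ...]  # (x, z) cell of each entity


def sample_task(rng: random.Random, n_entities: int = 3) -> TaskConfig:
    """Draw one task variant from the (hypothetical) task space."""
    size = rng.randrange(5, 12)  # arena side length between 5 and 11
    cells = [(x, z) for x in range(size) for z in range(size)]
    # Sample without replacement so that no two entities share a cell.
    positions = tuple(rng.sample(cells, n_entities))
    return TaskConfig(
        weather=rng.choice(WEATHER),
        floor_block=rng.choice(BLOCKS),
        arena_size=size,
        entity_positions=positions,
    )
```

Training on many sampled tasks, rather than on one fixed instance, is one plausible way to prepare for evaluation tasks that are not revealed in advance.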

The participants will be provided with an extensive starter kit, with all necessary instructions to download the framework and to develop and execute their agents locally for testing in the games included in the benchmark. The starter kit will also include a set of simple tasks and the code for default challenge agents (to compete against, but also as examples for participants to develop their own entries). This approach has proven successful in previous competitions Perez-Liebana et al. (2016); Laboratory (2017); Research (2017), helping entrants to quickly get started with the challenge and easily continue to make iterative changes to their approach.

The final rankings of the competition are computed by means of a play-off tournament. Each stage in the tournament features the games mentioned here, with at least one task for each of them, and teams playing a round-robin league. The agents must therefore show proficiency in multiple games to move to the next round of the play-off. Each league will have its own ranking table, in which entries are sorted by the sum of scores obtained in all tasks. The top entries of each group progress to the next stage, until reaching the final league, which determines the competition winner.
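The league ranking rule (sort entries by the sum of their scores over all tasks, then advance the top entries) can be sketched as follows. The function names, the tie-breaking by entry name and the `top_k` cut-off are assumptions for illustration; the description does not specify them.

```python
from typing import Dict, List


def rank_league(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Rank entries by the sum of their scores over all tasks, best first.

    `scores` maps entry name -> {task name -> score obtained in that task}.
    """
    totals = {entry: sum(per_task.values()) for entry, per_task in scores.items()}
    # Descending total; ties broken alphabetically by entry name (assumption).
    return sorted(totals, key=lambda entry: (-totals[entry], entry))


def advance(scores: Dict[str, Dict[str, float]], top_k: int) -> List[str]:
    """Return the entries that progress to the next stage of the play-off."""
    return rank_league(scores)[:top_k]


league = {
    "agent_a": {"mob_chase": 1.0, "build_battle": 2.0, "treasure_hunt": 0.0},
    "agent_b": {"mob_chase": 3.0, "build_battle": 1.0, "treasure_hunt": 1.0},
    "agent_c": {"mob_chase": 0.5, "build_battle": 0.0, "treasure_hunt": 0.5},
}
print(advance(league, top_k=2))  # ['agent_b', 'agent_a']
```

Since the total is taken over all tasks, an entry that excels in one game but scores poorly in the others can be overtaken by a more balanced agent.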

3 Conclusions

The MARLÖ competition aims at encouraging research in multi-agent reinforcement learning. In particular, it proposes a challenge in which researchers must attempt approaches that generalize well across games, tasks and opponent types. Not only are submitted agents tested in different games, but these games are also highly parameterizable, with the final configuration of tasks not known to participants beforehand. Additionally, all entries play against multiple agents in the tournament, requiring the agents not to overfit to their opponents.

Research is therefore encouraged and supported by this competition in multiple ways. It makes a set of multi-agent learning tasks publicly available, with a low barrier to entry but enough room for increasing task difficulty as progress is made. It also creates shared baselines and an evaluation setup that eases comparison between approaches. Finally, it increases awareness of the multi-agent RL challenges and provides a platform for testing and sharing the progress in the field.

We believe that this is the right way at the right time to invigorate the research community and drive progress in this exciting and important area. The future of the competition will bring different games with new dynamics, richer interactions and further challenges for multi-agent RL research.