Related papers:

- The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) Competition: Learning in multi-agent scenarios is a fruitful research direction, but ...
- Emergent Complexity via Multi-Agent Competition: Reinforcement learning algorithms can train agents that solve problems i...
- Emergent Coordination Through Competition: We study the emergence of cooperative behaviors in reinforcement learnin...
- Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning: We propose a unified mechanism for achieving coordination and communicat...
- Noisy Agents: Self-supervised Exploration by Predicting Auditory Events: Humans integrate multiple sensory modalities (e.g. visual and audio) to ...
- Emergent Escape-based Flocking Behavior using Multi-Agent Reinforcement Learning: In nature, flocking or swarm behavior is observed in many species as it ...
- Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning: Humans show an innate ability to learn the regularities of the world thr...
Emergent Tool Use From Multi-Agent Autocurricula
Through multi-agent competition, the simple objective of hide-and-seek, and standard reinforcement learning algorithms at scale, we find that agents create a self-supervised autocurriculum inducing multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. We find clear evidence of six emergent phases in agent strategy in our environment, each of which creates a new pressure for the opposing team to adapt; for instance, agents learn to build multi-object shelters using movable boxes, which in turn leads to agents discovering that they can overcome obstacles using ramps. We further provide evidence that multi-agent competition may scale better with increasing environment complexity and leads to behavior that centers around far more human-relevant skills than other self-supervised reinforcement learning methods such as intrinsic motivation. Finally, we propose transfer and fine-tuning as a way to quantitatively evaluate targeted capabilities, and we compare hide-and-seek agents to both intrinsic motivation and random initialization baselines in a suite of domain-specific intelligence tests.
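To make the setup concrete, below is a minimal sketch, not the authors' implementation, of the competitive self-play pattern the abstract describes: two teams trained simultaneously with a standard policy-gradient method on a zero-sum reward, so every improvement by one team reshapes the task faced by the other, which is what produces the autocurriculum. The toy 2D arena, the distance-based visibility rule, and all function names here are illustrative assumptions; only the reward structure (hiders rewarded while unseen, seekers receiving the negation) follows the paper's description.

# Minimal sketch (assumptions, not the paper's code): competitive self-play
# on a toy 2D hide-and-seek task. Two fixed-variance Gaussian policies, one
# per team, are trained with REINFORCE on a zero-sum reward.
import numpy as np

rng = np.random.default_rng(0)
DIM, ACT = 4, 2  # obs: own (x, y) + opponent (x, y); action: (x, y) velocity

def init_policy():
    return {"W": rng.normal(0.0, 0.1, (ACT, DIM)), "sigma": 0.5}

def act(policy, obs):
    mean = policy["W"] @ obs
    action = mean + policy["sigma"] * rng.standard_normal(ACT)
    return np.clip(action, -1.0, 1.0), action - mean  # keep noise for the gradient

def rollout(hider, seeker, T=50):
    h, s = rng.uniform(-5, 5, 2), rng.uniform(-5, 5, 2)
    steps, hider_return = [], 0.0
    for _ in range(T):
        obs_h, obs_s = np.concatenate([h, s]), np.concatenate([s, h])
        a_h, n_h = act(hider, obs_h)
        a_s, n_s = act(seeker, obs_s)
        h, s = np.clip(h + a_h, -5, 5), np.clip(s + a_s, -5, 5)
        seen = np.linalg.norm(h - s) < 2.0      # toy stand-in for line of sight
        hider_return += -1.0 if seen else 1.0   # hiders rewarded while unseen
        steps.append((obs_h, n_h, obs_s, n_s))
    return steps, hider_return

def reinforce(policy, transitions, advantage, lr=1e-3):
    # REINFORCE for a fixed-variance Gaussian policy:
    # grad log pi(a | obs) w.r.t. W is (noise / sigma^2) obs^T
    for obs, noise in transitions:
        policy["W"] += lr * advantage * np.outer(noise / policy["sigma"] ** 2, obs)

hider, seeker = init_policy(), init_policy()
baseline = 0.0
for _ in range(200):
    steps, R = rollout(hider, seeker)
    advantage = R - baseline
    baseline = 0.95 * baseline + 0.05 * R  # running return baseline
    reinforce(hider, [(o, n) for o, n, _, _ in steps], advantage)
    reinforce(seeker, [(o, n) for _, _, o, n in steps], -advantage)  # zero-sum coupling

The seeker's update uses the negated advantage, which is the only coupling between the two learners; in the paper the same competitive pressure, applied at much larger scale and in a richer environment, is what drives the successive strategy phases described above.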