Intrinsic fluctuations of reinforcement learning promote cooperation

09/01/2022
by Wolfram Barfuss, et al.

In this work, we ask and answer what makes classical reinforcement learning cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. Specifically, we consider the widely used temporal-difference reinforcement learning algorithm with epsilon-greedy exploration in the classic environment of an iterated Prisoner's Dilemma with one-period memory. Each of the two learning agents learns a strategy that conditions its next action choice on both agents' action choices in the previous round. We find that, in addition to a strong concern for future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation, to up to 80%. Thus, inherent noise is not a necessary evil of the iterative learning process; it is a critical asset for the learning of cooperation. However, we also point out the trade-off between a high likelihood of cooperative behavior and achieving it in a reasonable amount of time. Our findings are relevant for purposefully designing cooperative algorithms and for regulating undesired collusive effects.
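To make the learning setting concrete, here is a minimal sketch of two independent epsilon-greedy learners in an iterated Prisoner's Dilemma with one-period memory. It assumes the temporal-difference variant is tabular Q-learning and that payoffs are the textbook values T=5, R=3, P=1, S=0; neither choice is stated in the abstract. The state is the joint action of the previous round, so each agent's strategy conditions on both agents' last moves. The names `simulate`, `epsilon_greedy`, and `state_index` are hypothetical, and the parameters `alpha` (learning rate), `gamma` (discount factor, i.e. concern for future rewards), and `epsilon` (exploration rate) map onto the three factors the abstract names. This is an illustrative toy, not the authors' code.

```python
import random

# ASSUMPTION: textbook Prisoner's Dilemma payoffs (T=5 > R=3 > P=1 > S=0).
# Actions: 0 = cooperate (C), 1 = defect (D).
PAYOFF = {
    (0, 0): (3, 3),  # mutual cooperation: R, R
    (0, 1): (0, 5),  # agent 1 is exploited: S, T
    (1, 0): (5, 0),  # agent 2 is exploited: T, S
    (1, 1): (1, 1),  # mutual defection: P, P
}
N_ACTIONS = 2
N_STATES = 4  # one-period memory: the joint action of the last round


def state_index(a1, a2):
    """Encode the previous round's joint action (a1, a2) as a state in 0..3."""
    return 2 * a1 + a2


def epsilon_greedy(q_row, epsilon, rng):
    """Explore uniformly with probability epsilon, else act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def simulate(alpha=0.05, gamma=0.95, epsilon=0.01, rounds=100_000, seed=0):
    """Run two independent Q-learners (an ASSUMED TD variant).

    Returns the fraction of cooperative actions over the last (up to) 1000 rounds.
    """
    rng = random.Random(seed)
    # q[i][s][a]: agent i's value estimate for taking action a in state s.
    q = [[[0.0] * N_ACTIONS for _ in range(N_STATES)] for _ in range(2)]
    state = rng.randrange(N_STATES)
    window = min(1000, rounds)
    coop = 0
    for t in range(rounds):
        a1 = epsilon_greedy(q[0][state], epsilon, rng)
        a2 = epsilon_greedy(q[1][state], epsilon, rng)
        rewards = PAYOFF[(a1, a2)]
        next_state = state_index(a1, a2)
        for i, a in enumerate((a1, a2)):
            # TD target: immediate reward plus discounted greedy value of the next state.
            target = rewards[i] + gamma * max(q[i][next_state])
            q[i][state][a] += alpha * (target - q[i][state][a])
        if t >= rounds - window:
            coop += (a1 == 0) + (a2 == 0)  # count cooperative moves at the end of the run
        state = next_state
    return coop / (2 * window)


if __name__ == "__main__":
    # Each seed is one stochastic learning trajectory; outcomes vary across seeds.
    rates = [simulate(seed=s) for s in range(5)]
    print(rates)
```

Because the abstract attributes much of the cooperation to the learning process's own stochastic fluctuations, a single run says little; comparing the distribution of final cooperation rates across many seeds, and across values of `alpha`, `gamma`, and `epsilon`, is the more informative experiment. This toy is not expected to reproduce the reported figure of up to 80%, since its payoffs, learner variant, and run length are assumptions.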
