Meta-Value Learning: a General Framework for Learning with Learning Awareness

07/17/2023
by Tim Cooijmans, et al.

Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model that does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this interaction by differentiating through one step of the opponent's optimization. We propose to judge joint policies by their long-term prospects as measured by the meta-value, a discounted sum over the returns of future optimization iterates. We apply a form of Q-learning to the meta-game of optimization in a way that avoids the need to explicitly represent the continuous action space of policy updates. The resulting method, MeVa, is consistent and far-sighted, and does not require REINFORCE estimators. We analyze the behavior of our method on a toy game and compare it to prior work on repeated matrix games.
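As a rough aid to intuition, the meta-value described above can be written in a Bellman-style form. The following is a hedged sketch, not the paper's exact definition: the symbols θ (joint policy parameters), f_i (agent i's return), γ (meta-discount), and α (learning rate) are notation assumed here, and the precise discounting and update conventions may differ from those in the paper.

\[
V_i(\theta) = f_i(\theta') + \gamma\, V_i(\theta'),
\qquad
\theta'_j = \theta_j + \alpha\, \nabla_{\theta_j} V_j(\theta) \quad \text{for each agent } j .
\]

Unrolling this recursion recovers the description in the abstract: V_i scores a joint policy by the discounted sum of returns along the iterates that the learning process will visit from it. Because the recursion bootstraps from V at the next iterate, a TD/Q-learning-style target can be regressed directly, without representing the continuous space of policy updates as an explicit action set.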

Related research

Learning with Opponent-Learning Awareness (09/13/2017)
Multi-agent settings are quickly gathering importance in machine learnin...

How and Why to Manipulate Your Own Agent (12/14/2021)
We consider strategic settings where several users engage in a repeated ...

Model-Free Opponent Shaping (05/03/2022)
In general-sum games, the interaction of self-interested learning agents...

Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning (08/30/2021)
In multi-agent reinforcement learning, the behaviors that agents learn i...

COLA: Consistent Learning with Opponent-Learning Awareness (03/08/2022)
Learning in general-sum games can be unstable and often leads to sociall...

Multiagent Soft Q-Learning (04/25/2018)
Policy gradient methods are often applied to reinforcement learning in c...

E-HBA: Using Action Policies for Expert Advice and Agent Typification (07/23/2019)
Past research has studied two approaches to utilise predefined policy se...
