Gradient-free Online Learning in Games with Delayed Rewards

06/19/2020
by Amélie Héliou, et al.
Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on multi-player games with continuous action spaces, and we examine the long-run behavior of strategic agents that follow a no-regret learning policy (but are otherwise oblivious to the game being played, the objectives of their opponents, etc.). To account for the lack of a consistent stream of information (for instance, rewards can arrive out of order and with an a priori unbounded delay), we introduce a gradient-free learning policy where payoff information is placed in a priority queue as it arrives. In this general context, we derive new bounds for the agents' regret; furthermore, under a standard diagonal concavity assumption, we show that the induced sequence of play converges to Nash equilibrium with probability 1, even if the delay between choosing an action and receiving the corresponding reward is unbounded.
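
The idea of queuing delayed payoffs can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a single agent with a one-dimensional action in an interval, a one-point (payoff-only) gradient estimate, and illustrative step-size and sampling-radius schedules. All names (`DelayedPayoffQueue`, `one_point_gradient`, `run`, `payoff_fn`, `delay_fn`, `step0`, `radius0`) are hypothetical and chosen for this example only.

```python
import heapq
import random


class DelayedPayoffQueue:
    """Min-heap of (round_played, payoff, direction, radius) records.

    Records are ordered by the round in which the action was played, so
    the oldest available payoff is always dequeued first, even if the
    payoffs arrived out of order.
    """

    def __init__(self):
        self._heap = []

    def push(self, round_played, payoff, direction, radius):
        heapq.heappush(self._heap, (round_played, payoff, direction, radius))

    def pop_oldest(self):
        # Returns None when no payoff information is available yet.
        return heapq.heappop(self._heap) if self._heap else None

    def __len__(self):
        return len(self._heap)


def one_point_gradient(payoff, direction, radius):
    """One-point estimate in 1-D: payoff * direction / radius (assumed form)."""
    return payoff * direction / radius


def project(x, lo, hi):
    """Euclidean projection onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def run(payoff_fn, delay_fn, rounds, x0=0.0, lo=-1.0, hi=1.0,
        step0=0.5, radius0=0.5, seed=0):
    """Play `rounds` rounds with delayed, payoff-only feedback.

    payoff_fn(action) -> float and delay_fn(round) -> non-negative int are
    supplied by the caller. The schedules below are assumptions, not the
    paper's; radius0 must not exceed (hi - lo) / 2.
    """
    rng = random.Random(seed)
    queue = DelayedPayoffQueue()
    in_flight = []  # heap of (arrival_round, round_played, payoff, dir, radius)
    x = x0
    for t in range(1, rounds + 1):
        step = step0 / t
        radius = radius0 * t ** -0.25
        # Keep the pivot far enough from the boundary that the perturbed
        # action stays feasible.
        pivot = project(x, lo + radius, hi - radius)
        direction = rng.choice((-1.0, 1.0))
        payoff = payoff_fn(pivot + radius * direction)
        # The payoff only becomes visible after its delay has elapsed.
        heapq.heappush(in_flight,
                       (t + delay_fn(t), t, payoff, direction, radius))
        while in_flight and in_flight[0][0] <= t:
            _, s, u, w, r = heapq.heappop(in_flight)
            queue.push(s, u, w, r)
        # Use at most one queued payoff per round, oldest first.
        item = queue.pop_oldest()
        if item is not None:
            _, u, w, r = item
            x = project(x + step * one_point_gradient(u, w, r), lo, hi)
    return x
```

As a usage example, `run(lambda a: -(a - 0.3) ** 2, lambda t: t % 4, rounds=200)` plays against a fixed concave payoff with delays cycling through 0 to 3 rounds. The heap keyed on `round_played` is one simple way to process feedback in the order the actions were taken rather than the order the payoffs arrived.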

research
09/14/2020

Multi-Agent Reinforcement Learning in Cournot Games

In this work, we study the interaction of strategic agents in continuous...
research
09/10/2018

Learning in time-varying games

In this paper, we examine the long-term behavior of regret-minimizing ag...
research
04/26/2021

Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium

In game-theoretic learning, several agents are simultaneously following ...
research
05/30/2023

Competing for Shareable Arms in Multi-Player Multi-Armed Bandits

Competitions for shareable and limited resources have long been studied ...
research
05/14/2022

No-regret learning for repeated non-cooperative games with lossy bandits

This paper considers no-regret learning for repeated continuous-kernel g...
research
11/09/2018

Policy Regret in Repeated Games

The notion of policy regret in online learning is a well-defined perfor...
