Logarithmic Regret for Matrix Games against an Adversary with Noisy Bandit Feedback

06/22/2023
by Arnab Maiti, et al.

This paper considers a variant of zero-sum matrix games where at each timestep the row player chooses row i, the column player chooses column j, and the row player receives a noisy reward with mean A_{i,j}. The objective of the row player is to accumulate as much reward as possible, even against an adversarial column player. If the row player uses the EXP3 strategy, an algorithm known for obtaining √T regret against an arbitrary sequence of rewards, it is immediate that the row player also achieves √T regret relative to the Nash equilibrium in this game setting. However, partly motivated by the fact that the EXP3 strategy ignores the structure of the game, O'Donoghue et al. (2021) proposed a UCB-style algorithm that leverages the game structure and demonstrated that this algorithm greatly outperforms EXP3 empirically. While they showed that this UCB-style algorithm achieves √T regret, in this paper we ask whether there exists an algorithm that provably achieves polylog(T) regret against any adversary, analogous to results from stochastic bandits. We propose a novel algorithm that answers this question in the affirmative for the simple 2 × 2 setting, providing the first instance-dependent guarantees for games in the regret setting. Our algorithm overcomes two major hurdles: 1) obtaining logarithmic regret even though the Nash equilibrium is estimable only at a 1/√T rate, and 2) designing row-player strategies that guarantee that either the adversary provides information about the Nash equilibrium, or the row player incurs negative regret. Moreover, in the full-information case we address the general n × m case, where the first hurdle is still relevant. Finally, we show that EXP3 and the UCB-based algorithm cannot achieve regret better than √T.
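To make the 2 × 2 setting concrete, here is a minimal Python sketch of the game model and of regret measured relative to the Nash value. It is not the paper's algorithm. The function names `nash_2x2` and `pseudo_regret`, the Gaussian noise model, and the choice to measure regret against mean rewards (T times the game value minus the sum of A_{i,j} over the rounds played) are assumptions made for illustration only.

```python
import random


def nash_2x2(A):
    """Return (p, value) for a 2x2 zero-sum game with row-player payoffs A.

    p is the probability of playing row 0 under a maximin strategy of the
    row player; value is the value of the game.
    """
    (a, b), (c, d) = A
    # Pure saddle point: the best worst-case row payoff equals the
    # best worst-case column payoff.
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:
        p = 1.0 if min(a, b) >= min(c, d) else 0.0
        return p, float(lower)
    # No saddle point: the equilibrium is fully mixed and makes the
    # column player indifferent between the two columns.
    denom = a - b - c + d
    return (d - c) / denom, (a * d - b * c) / denom


def pseudo_regret(A, p_row0, col_choice, T, noise_sd=1.0, seed=0):
    """Play T rounds with a fixed mixed row strategy (illustrative only).

    col_choice(t) returns the column chosen at round t. The row player only
    sees noisy rewards; regret is computed here from the mean rewards.
    """
    rng = random.Random(seed)
    _, value = nash_2x2(A)
    total_mean = 0.0
    observed = []
    for t in range(T):
        i = 0 if rng.random() < p_row0 else 1
        j = col_choice(t)
        total_mean += A[i][j]
        observed.append(A[i][j] + rng.gauss(0.0, noise_sd))  # bandit feedback
    return T * value - total_mean, observed
```

For matching pennies, `nash_2x2([[1, -1], [-1, 1]])` gives the uniform strategy with value 0. In a real instance the row player does not know A, so it cannot simply play this strategy; the sketch only fixes the quantities that the regret is defined against.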

research
03/26/2020

Last Round Convergence and No-Instant Regret in Repeated Games with Asymmetric Information

This paper considers repeated games in which one player has more informa...
research
07/22/2020

Exploiting No-Regret Algorithms in System Design

We investigate a repeated two-player zero-sum game setting where the col...
research
04/26/2021

Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium

In game-theoretic learning, several agents are simultaneously following ...
research
08/10/2016

Stochastic Rank-1 Bandits

We propose stochastic rank-1 bandits, a class of online learning problem...
research
03/19/2017

Bernoulli Rank-1 Bandits for Click Feedback

The probability that a user will click a search result depends both on i...
research
01/13/2023

Decentralized model-free reinforcement learning in stochastic games with average-reward objective

We propose the first model-free algorithm that achieves low regret perfo...
research
11/07/2017

Security Strategies of Both Players in Asymmetric Information Zero-Sum Stochastic Games with an Informed Controller

This paper considers a zero-sum two-player asymmetric information stocha...
