Provably Efficient Online Agnostic Learning in Markov Games

by Yi Tian, et al.

We study online agnostic learning, a problem that arises in episodic multi-agent reinforcement learning where the actions of the opponents are unobservable. We show that in this challenging setting, achieving sublinear regret against the best response in hindsight is statistically hard. We then consider a weaker notion of regret, and present an algorithm that, after K episodes, achieves a sublinear Õ(K^{3/4}) regret. This is, to our knowledge, the first sublinear regret bound in the online agnostic setting. Importantly, our regret bound is independent of the size of the opponents' action spaces. As a result, even when the opponents' actions are fully observable, our regret bound improves upon existing analyses (e.g., Xie et al., 2020) by an exponential factor in the number of opponents.
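For context, the benchmark that the abstract calls regret against the best response in hindsight can be sketched in a standard formulation for episodic Markov games; the notation below (policies μ, ν, value function V, initial state s₁) is our own illustration and not necessarily the paper's:

```latex
% Standard regret notion in a two-player episodic Markov game:
% the learner plays policies \mu_1, \dots, \mu_K while the opponent
% plays \nu_1, \dots, \nu_K; regret compares against the single best
% learner policy in hindsight.
\mathrm{Regret}(K) \;=\; \sup_{\mu} \sum_{k=1}^{K}
  \Big[ V^{\mu,\,\nu_k}(s_1) \;-\; V^{\mu_k,\,\nu_k}(s_1) \Big]
```

Under this type of benchmark, a sublinear bound such as Õ(K^{3/4}) means the average per-episode regret Regret(K)/K vanishes as K grows.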


