Dominate or Delete: Decentralized Competing Bandits with Uniform Valuation

by Abishek Sankararaman, et al.

We study regret minimization in a two-sided matching market where uniformly valued demand-side agents (a.k.a. agents) continuously compete to be matched with supply-side agents (a.k.a. arms) with unknown and heterogeneous valuations. Such markets abstract online matching platforms (e.g., UpWork, TaskRabbit) and fall within the purview of the matching bandit models introduced in Liu et al. <cit.>. The uniform valuation on the demand side admits a unique stable matching equilibrium in the system. We design the first decentralized algorithm for matching bandits under uniform valuation that does not require any knowledge of reward gaps or the time horizon, and thus partially resolves an open question in <cit.>. The algorithm works in phases of exponentially increasing length. In each phase i, an agent first deletes dominated arms, i.e., the arms preferred by agents ranked higher than itself. Deletion is followed by dynamic explore-exploit using the UCB algorithm on the remaining arms for 2^i rounds. Finally, the preferred arm is broadcast in a decentralized fashion to other agents through pure exploitation in (N-1)K rounds, with N agents and K arms. Comparing the obtained reward with respect to the unique stable matching, we show that the algorithm achieves O(log(T)/Δ^2) regret in T rounds, where Δ is the minimum gap across all agents and arms. We also provide an orderwise matching regret lower bound.


