Player-optimal Stable Regret for Bandit Learning in Matching Markets

07/20/2023
by   Fang Kong, et al.

Matching markets have long been studied because of their wide range of applications, and finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent work studies the online setting in which participants on one side (players) learn their unknown preferences through iterative interactions with the other side (arms). Most previous works in this line derive theoretical guarantees only for the player-pessimal stable regret, which is defined with respect to the players' least-preferred stable matching. Under the pessimal stable matching, however, players obtain the least reward among all stable matchings. To maximize players' profits, the player-optimal stable matching is the most desirable benchmark. Although <cit.> establish an upper bound on the player-optimal stable regret, their bound can be exponentially large when the players' preference gap is small. Whether a polynomial guarantee for this regret exists is a significant open problem. In this work, we provide a new algorithm named explore-then-Gale-Shapley (ETGS) and show that the optimal stable regret of each player can be upper bounded by O(K log T/Δ^2), where K is the number of arms, T is the horizon, and Δ is the players' minimum preference gap among the first N+1 ranked arms. This result significantly improves on previous works, which either target the weaker player-pessimal stable matching objective or apply only to markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound.
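
To make the player-optimal objective concrete, the following is a minimal sketch of the classical player-proposing Gale-Shapley (deferred acceptance) procedure, which returns the player-optimal stable matching when all preferences are known. It is not the ETGS algorithm from the paper, which must additionally learn the players' preferences from bandit feedback. The function name `gale_shapley`, the dictionary-based preference format, and the toy market are assumptions made for illustration only.

```python
def gale_shapley(player_prefs, arm_prefs):
    """Player-proposing deferred acceptance (illustrative sketch).

    player_prefs[p]: arms ordered from most to least preferred by player p.
    arm_prefs[a]:    players ordered from most to least preferred by arm a.
    Returns a dict mapping each matched player to its arm.
    """
    # rank[a][p] is the position of player p in arm a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in arm_prefs.items()}
    next_choice = {p: 0 for p in player_prefs}  # next arm index to propose to
    holder = {}                                 # arm -> player it currently holds
    free = list(player_prefs)                   # players without a tentative match

    while free:
        p = free.pop()
        if next_choice[p] >= len(player_prefs[p]):
            continue  # p has been rejected by every arm and stays unmatched
        a = player_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = holder.get(a)
        if current is None:
            holder[a] = p                # arm was free: accept tentatively
        elif rank[a][p] < rank[a][current]:
            holder[a] = p                # arm prefers the new proposer
            free.append(current)         # previous holder becomes free again
        else:
            free.append(p)               # proposal rejected; p tries its next arm

    return {p: a for a, p in holder.items()}


# Toy market (hypothetical) with two stable matchings:
#   player-optimal:  p1-a1, p2-a2
#   player-pessimal: p1-a2, p2-a1
player_prefs = {"p1": ["a1", "a2"], "p2": ["a2", "a1"]}
arm_prefs = {"a1": ["p2", "p1"], "a2": ["p1", "p2"]}
matching = gale_shapley(player_prefs, arm_prefs)
print(matching)
```

In this toy market each player receives its first choice under player-side proposals, whereas the player-pessimal stable matching would give each player its second choice. That difference is the gap between the two regret benchmarks discussed in the abstract.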


