Learning Stackelberg Equilibria and Applications to Economic Design Games

by Gianluca Brero, et al.

We study the use of reinforcement learning to learn the leader's optimal strategy in Stackelberg games. Learning a leader's strategy has an inherent non-stationarity problem: while the leader's strategy is being optimized, the followers' strategies might shift. To circumvent this problem, we model the followers via no-regret dynamics that converge to a Bayesian Coarse-Correlated Equilibrium (B-CCE) of the game induced by the leader. We then embed the followers' no-regret dynamics in the leader's learning environment, which allows us to formulate our learning problem as a standard POMDP. We prove that the optimal policy of this POMDP achieves the same utility as the leader's optimal strategy in our Stackelberg game. We solve this POMDP using actor-critic methods, where the critic is given access to the joint information of all the agents. Finally, we show that our methods are able to learn optimal leader strategies in a variety of settings of increasing complexity, including indirect mechanisms where the leader's strategy consists of setting the mechanism's rules.
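The idea of embedding the followers' no-regret dynamics inside the leader's environment can be illustrated with a small sketch. This is not the paper's implementation: it assumes a hypothetical complete-information posted-price game (one item, two followers with fixed values, ties broken uniformly at random), uses Hedge (multiplicative weights) as the no-regret rule, and replaces the actor-critic leader with an exhaustive search over a few candidate prices. All function names and parameters are illustrative assumptions.

```python
import math


def hedge_probs(cum_payoffs, eta):
    """Hedge (multiplicative weights) distribution over actions."""
    top = max(cum_payoffs)  # subtract the max for numerical stability
    weights = [math.exp(eta * (c - top)) for c in cum_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def run_followers(price, values, rounds=2000, eta=None):
    """Run Hedge for two followers in the game induced by `price`.

    Hypothetical toy game: each follower either passes (payoff 0) or
    buys; if both buy, the item goes to one of them uniformly at random
    and only the winner pays. Returns the leader's average revenue.
    """
    if eta is None:
        # Learning rate of order sqrt(ln(n) / T) with n = 2 actions.
        eta = math.sqrt(math.log(2) / rounds)
    # cum[i] = cumulative expected payoffs of follower i for [pass, buy]
    cum = [[0.0, 0.0] for _ in values]
    revenue = 0.0
    for _ in range(rounds):
        buy = [hedge_probs(c, eta)[1] for c in cum]
        # The item sells if at least one follower buys.
        p_sale = 1.0 - (1.0 - buy[0]) * (1.0 - buy[1])
        revenue += price * p_sale
        for i, value in enumerate(values):
            # Chance of receiving the item given the other's mixed play.
            p_win = 1.0 - 0.5 * buy[1 - i]
            cum[i][1] += (value - price) * p_win  # passing adds 0
    return revenue / rounds


def best_price(prices, values):
    """Leader step (assumption: brute-force search, not actor-critic)."""
    return max(prices, key=lambda p: run_followers(p, values))


if __name__ == "__main__":
    print(best_price([1.0, 2.0, 3.0], [2.5, 1.5]))
```

Because the followers' adaptation happens inside `run_followers`, the leader only ever sees a fixed mapping from its strategy to an outcome, which is the property that lets the leader's problem be treated as a standard single-agent learning problem.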

