Dialog policies, which determine a system's action based on the current ...
In this paper, we address the problem of computing equilibria in monoton...
Learning in games considers how multiple agents maximize their own rewar...
Bandit algorithms for online learning to rank (OLTR) problems often aim ...
Repeated games consider a situation where multiple agents are motivated ...
Modern recommender systems are hedged with various requirements, such as...
The theory of learning in games is prominent in the AI community, motiva...
In this study, we consider a variant of the Follow the Regularized Leade...
Policy gradient (PG) is a reinforcement learning (RL) approach that opti...
This paper considers the capacity expansion problem in two-sided matchin...
Off-policy evaluation (OPE) is the problem of estimating the value of a ...
In this paper, we revisit sparse stochastic contextual linear bandits. I...
Off-policy evaluation (OPE) is the problem of evaluating new policies us...
The aim of black-box optimization is to optimize an objective function w...