Dynamic decision-making under distributional shifts is of fundamental
in...
Recently, several studies have considered the stochastic optimization problem b...
We consider a reinforcement learning setting in which the deployment
env...
We consider the stochastic optimization problem with smooth but not
nece...
Traditional analyses for non-convex stochastic optimization problems
cha...
As a framework for sequential decision-making, Reinforcement Learning (R...
With the advent and increasing consolidation of e-commerce, digital
adve...
Among the reasons hindering reinforcement learning (RL) applications to
...
Preconditioning has been a staple technique in optimization and machine
...
We consider a stochastic lost-sales inventory control system with a lead...
The digitization of the economy has witnessed an explosive growth of
ava...
Off-policy evaluation and learning (OPE/L) use offline observational dat...
We consider online no-regret learning in unknown games with bandit feedb...
Many hierarchical reinforcement learning (RL) applications have empirica...
One of the most widely used methods for solving large-scale stochastic
o...
Learning optimal policies from historical data enables the gains from
pe...
Consider a player that in each round t out of T rounds chooses an action...
The optimization problems associated with training generative adversaria...
To balance exploration and exploitation, multi-armed bandit algorithms n...
We design a simple reinforcement learning agent that, with a specificati...
In many multi-agent reinforcement learning applications such as flocking...
We study the problem of dynamic batch learning in high-dimensional spars...
Federated learning (FL) learns a model jointly from a set of participati...
First-price auctions have very recently swept the online advertising
ind...
Motivated by applications to online advertising and recommender systems,...
Policy learning using historical observational data is an important prob...
Most reinforcement learning (RL) algorithms aim at maximizing the
exp...
We study the sequential batch learning problem in linear contextual band...
We study online learning in repeated first-price auctions with censored
...
Diagonal preconditioning has been a staple technique in optimization and...
We study an offline multi-action policy learning algorithm based on doub...
In this paper, we consider online learning in generalized linear context...
We consider multi-agent learning via online gradient descent (OGD) in a ...
We establish that an optimistic variant of Q-learning applied to a
finit...
Contextual bandit algorithms are sensitive to the estimation method of t...
In many settings, a decision-maker wishes to learn a rule, or policy, th...
Recent studies have discovered that deep networks are capable of memoriz...
We consider a Markovian many-server queueing system in which customers a...
We consider a Markovian single-server queue in which customers are
preem...