We consider the adversarial linear contextual bandit problem, where the ...
We study the problem of computing an optimal policy of an infinite-horiz...
Existing online learning algorithms for adversarial Markov Decision Proc...
We consider the adversarial linear contextual bandit setting, which allo...
We revisit the problem of learning in two-player zero-sum Markov games, ...
Best-of-both-worlds algorithms for online learning which achieve near-op...
Policy optimization methods are popular reinforcement learning algorithm...
We study reinforcement learning in stochastic path (SP) problems. The go...
Large-scale machine learning systems often involve data distributed acro...
We examine global non-asymptotic convergence properties of policy gradie...
Multi-agent reinforcement learning (MARL) problems are challenging due t...
We develop a model selection approach to tackle reinforcement learning w...
Policy optimization is a widely-used method in reinforcement learning. D...
In this work, we develop linear bandit algorithms that automatically ada...
We propose a black-box reduction that turns a certain reinforcement lear...
We study infinite-horizon discounted two-player zero-sum Markov games, a...
We resolve the long-standing "impossible tuning" issue for the classic e...
We study the stochastic shortest path problem with adversarial costs and...
We develop several new algorithms for learning Markov Decision Processes...
Optimistic Gradient Descent Ascent (OGDA) algorithm for saddle-point opt...
We develop a new approach to obtaining high probability regret bounds fo...
Recently, model-free reinforcement learning has attracted research atten...
We study a new form of federated learning where the clients train person...
We revisit the problem of online learning with sleeping experts/bandits:...
We initiate the study of learning in contextual bandits with the help of...
Model-free reinforcement learning is known to be memory and computation ...
We study the variance of the REINFORCE policy gradient estimator in envi...
We study the problem of efficient online multiclass linear classificatio...
We propose the first contextual bandit algorithm that is parameter-free,...
We study adaptive regret bounds in terms of the variation of the losses ...
We develop the first general semi-bandit algorithm that simultaneously a...
We study the decades-old problem of online portfolio management and prop...
We develop a novel and generic algorithm for the adversarial multi-armed...