
-
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
In this work, we develop linear bandit algorithms that automatically ada...
read it
-
Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach
We propose a black-box reduction that turns a certain reinforcement lear...
read it
-
Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case
We make significant progress toward the stochastic shortest path problem...
read it
-
Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games
We study infinite-horizon discounted two-player zero-sum Markov games, a...
read it
-
Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications
We resolve the long-standing "impossible tuning" issue for the classic e...
read it
-
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
We study the stochastic shortest path problem with adversarial costs and...
read it
-
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
We develop several new algorithms for learning Markov Decision Processes...
read it
-
Comparator-adaptive Convex Bandits
We study bandit convex optimization methods that adapt to the norm of th...
read it
-
Active Online Domain Adaptation
Online machine learning systems need to adapt to domain shifts. Meanwhil...
read it
-
Open Problem: Model Selection for Contextual Bandits
In statistical learning, algorithms for model selection allow the learne...
read it
-
Linear Last-iterate Convergence for Matrix Games and Stochastic Games
Optimistic Gradient Descent Ascent (OGDA) algorithm for saddle-point opt...
read it
-
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
We develop a new approach to obtaining high probability regret bounds fo...
read it
-
Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
This work studies the problem of learning episodic Markov Decision Proce...
read it
-
A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret
Recently, model-free reinforcement learning has attracted research atten...
read it
-
Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds
We revisit the problem of online learning with sleeping experts/bandits:...
read it
-
Taking a hint: How to leverage loss predictors in contextual bandits?
We initiate the study of learning in contextual bandits with the help of...
read it
-
A Closer Look at Small-loss Bounds for Bandits with Graph Feedback
We study small-loss bounds for the adversarial multi-armed bandits probl...
read it
-
Fair Contextual Multi-Armed Bandits: Theory and Experiments
When an AI system interacts with multiple users, it frequently needs to ...
read it
-
Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
We consider the problem of learning in episodic finite-horizon Markov de...
read it
-
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Model-free reinforcement learning is known to be memory and computation ...
read it
-
Model selection for contextual bandits
We introduce the problem of model selection for contextual bandits, wher...
read it
-
Equipping Experts/Bandits with Long-term Memory
We propose the first reduction-based approach to obtaining long-term mem...
read it
-
Hypothesis Set Stability and Generalization
We present an extensive study of generalization for data-dependent hypot...
read it
-
A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free
We propose the first contextual bandit algorithm that is parameter-free,...
read it
-
Improved Path-length Regret Bounds for Bandits
We study adaptive regret bounds in terms of the variation of the losses ...
read it
-
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
We develop the first general semi-bandit algorithm that simultaneously a...
read it
-
Efficient Online Portfolio with Logarithmic Regret
We study the decades-old problem of online portfolio management and prop...
read it
-
Logistic Regression: The Importance of Being Improper
Learning linear predictors with the logistic loss---both in stochastic a...
read it
-
Practical Contextual Bandits with Regression Oracles
A major challenge in contextual bandits is to design general-purpose alg...
read it
-
More Adaptive Algorithms for Adversarial Bandits
We develop a novel and generic algorithm for the adversarial multi-armed...
read it
-
Efficient Contextual Bandits in Non-stationary Worlds
Most contextual bandit algorithms minimize regret to the best fixed poli...
read it
-
Corralling a Band of Bandit Algorithms
We study the problem of combining multiple bandit algorithms (that is, o...
read it