
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
In this work, we develop linear bandit algorithms that automatically ada...

Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach
We propose a black-box reduction that turns a certain reinforcement lear...

Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case
We make significant progress toward the stochastic shortest path problem...

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games
We study infinite-horizon discounted two-player zero-sum Markov games, a...

Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications
We resolve the long-standing "impossible tuning" issue for the classic e...

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
We study the stochastic shortest path problem with adversarial costs and...

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
We develop several new algorithms for learning Markov Decision Processes...

Comparator-adaptive Convex Bandits
We study bandit convex optimization methods that adapt to the norm of th...

Active Online Domain Adaptation
Online machine learning systems need to adapt to domain shifts. Meanwhil...

Open Problem: Model Selection for Contextual Bandits
In statistical learning, algorithms for model selection allow the learne...

Linear Last-iterate Convergence for Matrix Games and Stochastic Games
Optimistic Gradient Descent Ascent (OGDA) algorithm for saddle-point opt...

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
We develop a new approach to obtaining high probability regret bounds fo...

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
This work studies the problem of learning episodic Markov Decision Proce...

A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret
Recently, model-free reinforcement learning has attracted research atten...

Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds
We revisit the problem of online learning with sleeping experts/bandits:...

Taking a hint: How to leverage loss predictors in contextual bandits?
We initiate the study of learning in contextual bandits with the help of...

A Closer Look at Small-loss Bounds for Bandits with Graph Feedback
We study small-loss bounds for the adversarial multi-armed bandits probl...

Fair Contextual Multi-Armed Bandits: Theory and Experiments
When an AI system interacts with multiple users, it frequently needs to ...

Learning Adversarial MDPs with Bandit Feedback and Unknown Transition
We consider the problem of learning in episodic finite-horizon Markov de...

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Model-free reinforcement learning is known to be memory and computation ...

Model selection for contextual bandits
We introduce the problem of model selection for contextual bandits, wher...

Equipping Experts/Bandits with Long-term Memory
We propose the first reduction-based approach to obtaining long-term mem...

Hypothesis Set Stability and Generalization
We present an extensive study of generalization for data-dependent hypot...

A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free
We propose the first contextual bandit algorithm that is parameter-free,...

Improved Path-length Regret Bounds for Bandits
We study adaptive regret bounds in terms of the variation of the losses ...

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
We develop the first general semi-bandit algorithm that simultaneously a...

Efficient Online Portfolio with Logarithmic Regret
We study the decades-old problem of online portfolio management and prop...

Logistic Regression: The Importance of Being Improper
Learning linear predictors with the logistic loss, both in stochastic a...

Practical Contextual Bandits with Regression Oracles
A major challenge in contextual bandits is to design general-purpose alg...

More Adaptive Algorithms for Adversarial Bandits
We develop a novel and generic algorithm for the adversarial multi-armed...

Efficient Contextual Bandits in Non-stationary Worlds
Most contextual bandit algorithms minimize regret to the best fixed poli...

Corralling a Band of Bandit Algorithms
We study the problem of combining multiple bandit algorithms (that is, o...
Haipeng Luo