
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
In this work, we develop linear bandit algorithms that automatically ada...
Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach
We propose a black-box reduction that turns a certain reinforcement lear...
Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games
We study infinite-horizon discounted two-player zero-sum Markov games, a...
Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications
We resolve the longstanding "impossible tuning" issue for the classic e...
Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
We study the stochastic shortest path problem with adversarial costs and...
Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
We develop several new algorithms for learning Markov Decision Processes...
Linear Last-iterate Convergence for Matrix Games and Stochastic Games
Optimistic Gradient Descent Ascent (OGDA) algorithm for saddle-point opt...
Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
We develop a new approach to obtaining high probability regret bounds fo...
A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret
Recently, model-free reinforcement learning has attracted research atten...
Federated Residual Learning
We study a new form of federated learning where the clients train person...
Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds
We revisit the problem of online learning with sleeping experts/bandits:...
Taking a hint: How to leverage loss predictors in contextual bandits?
We initiate the study of learning in contextual bandits with the help of...
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Model-free reinforcement learning is known to be memory and computation ...
Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator
We study the variance of the REINFORCE policy gradient estimator in envi...
Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case
We study the problem of efficient online multiclass linear classificatio...
A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free
We propose the first contextual bandit algorithm that is parameter-free,...
Improved Path-length Regret Bounds for Bandits
We study adaptive regret bounds in terms of the variation of the losses ...
Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
We develop the first general semi-bandit algorithm that simultaneously a...
Efficient Online Portfolio with Logarithmic Regret
We study the decades-old problem of online portfolio management and prop...
More Adaptive Algorithms for Adversarial Bandits
We develop a novel and generic algorithm for the adversarial multi-armed...
Chen-Yu Wei
