
Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
In this work, we develop linear bandit algorithms that automatically ada...

Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach
We propose a black-box reduction that turns a certain reinforcement lear...

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games
We study infinite-horizon discounted two-player zero-sum Markov games, a...

Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications
We resolve the long-standing "impossible tuning" issue for the classic e...

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition
We study the stochastic shortest path problem with adversarial costs and...

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation
We develop several new algorithms for learning Markov Decision Processes...

Linear Last-iterate Convergence for Matrix Games and Stochastic Games
Optimistic Gradient Descent Ascent (OGDA) algorithm for saddle-point opt...

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
We develop a new approach to obtaining high probability regret bounds fo...

A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret
Recently, model-free reinforcement learning has attracted research atten...

Federated Residual Learning
We study a new form of federated learning where the clients train person...

Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds
We revisit the problem of online learning with sleeping experts/bandits:...

Taking a hint: How to leverage loss predictors in contextual bandits?
We initiate the study of learning in contextual bandits with the help of...

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes
Model-free reinforcement learning is known to be memory and computation ...

Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator
We study the variance of the REINFORCE policy gradient estimator in envi...

Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case
We study the problem of efficient online multiclass linear classificatio...

A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free
We propose the first contextual bandit algorithm that is parameter-free,...

Improved Path-length Regret Bounds for Bandits
We study adaptive regret bounds in terms of the variation of the losses ...

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously
We develop the first general semi-bandit algorithm that simultaneously a...

Efficient Online Portfolio with Logarithmic Regret
We study the decades-old problem of online portfolio management and prop...

More Adaptive Algorithms for Adversarial Bandits
We develop a novel and generic algorithm for the adversarial multi-armed...
Chen-Yu Wei