Chen-Yu Wei

research

∙ 09/02/2023

Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits

We consider the adversarial linear contextual bandit problem, where the ...

0 Haolin Liu, et al. ∙

research

∙ 06/20/2023

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs

We study the problem of computing an optimal policy of an infinite-horiz...

0 Dongsheng Ding, et al. ∙

research

∙ 05/27/2023

No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions

Existing online learning algorithms for adversarial Markov Decision Proc...

0 Tiancheng Jin, et al. ∙

research

∙ 05/01/2023

First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

We consider the adversarial linear contextual bandit setting, which allo...

0 Julia Olkhovskaya, et al. ∙

research

∙ 03/05/2023

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games

We revisit the problem of learning in two-player zero-sum Markov games, ...

0 Yang Cai, et al. ∙

research

∙ 02/20/2023

A Blackbox Approach to Best of Both Worlds in Bandits and Beyond

Best-of-both-worlds algorithms for online learning which achieve near-op...

0 Christoph Dann, et al. ∙

research

∙ 02/18/2023

Best of Both Worlds Policy Optimization

Policy optimization methods are popular reinforcement learning algorithm...

0 Christoph Dann, et al. ∙

research

∙ 10/17/2022

A Unified Algorithm for Stochastic Path Problems

We study reinforcement learning in stochastic path (SP) problems. The go...

0 Christoph Dann, et al. ∙

research

∙ 02/10/2022

Personalization Improves Privacy-Accuracy Tradeoffs in Federated Optimization

Large-scale machine learning systems often involve data distributed acro...

11 Alberto Bietti, et al. ∙

research

∙ 02/08/2022

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

We examine global non-asymptotic convergence properties of policy gradie...

0 Dongsheng Ding, et al. ∙

research

∙ 11/01/2021

Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure

Multi-agent reinforcement learning (MARL) problems are challenging due t...

0 Hsu Kao, et al. ∙

research

∙ 10/07/2021

A Model Selection Approach for Corruption Robust Reinforcement Learning

We develop a model selection approach to tackle reinforcement learning w...

0 Chen-Yu Wei, et al. ∙

research

∙ 07/18/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Policy optimization is a widely-used method in reinforcement learning. D...

0 Haipeng Luo, et al. ∙

research

∙ 02/11/2021

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously

In this work, we develop linear bandit algorithms that automatically ada...

0 Chung-Wei Lee, et al. ∙

research

∙ 02/10/2021

Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach

We propose a black-box reduction that turns a certain reinforcement lear...

0 Chen-Yu Wei, et al. ∙

research

∙ 02/08/2021

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

We study infinite-horizon discounted two-player zero-sum Markov games, a...

0 Chen-Yu Wei, et al. ∙

research

∙ 02/01/2021

Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications

We resolve the long-standing "impossible tuning" issue for the classic e...

0 Liyu Chen, et al. ∙

research

∙ 12/07/2020

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

We study the stochastic shortest path problem with adversarial costs and...

0 Liyu Chen, et al. ∙

research

∙ 07/23/2020

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

We develop several new algorithms for learning Markov Decision Processes...

12 Chen-Yu Wei, et al. ∙

research

∙ 06/16/2020

Linear Last-iterate Convergence for Matrix Games and Stochastic Games

Optimistic Gradient Descent Ascent (OGDA) algorithm for saddle-point opt...

0 Chung-Wei Lee, et al. ∙

research

∙ 06/14/2020

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

We develop a new approach to obtaining high probability regret bounds fo...

0 Chung-Wei Lee, et al. ∙

research

∙ 06/08/2020

A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret

Recently, model-free reinforcement learning has attracted research atten...

12 Mehdi Jafarnia-Jahromi, et al. ∙

research

∙ 03/28/2020

Federated Residual Learning

We study a new form of federated learning where the clients train person...

5 Alekh Agarwal, et al. ∙

research

∙ 03/07/2020

Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds

We revisit the problem of online learning with sleeping experts/bandits:...

0 Ehsan Emamjomeh-Zadeh, et al. ∙

research

∙ 03/04/2020

Taking a hint: How to leverage loss predictors in contextual bandits?

We initiate the study of learning in contextual bandits with the help of...

0 Chen-Yu Wei, et al. ∙

research

∙ 10/15/2019

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Model-free reinforcement learning is known to be memory and computation ...

0 Chen-Yu Wei, et al. ∙

research

∙ 10/02/2019

Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator

We study the variance of the REINFORCE policy gradient estimator in envi...

0 James A. Preiss, et al. ∙

research

∙ 02/06/2019

Bandit Multiclass Linear Classification: Efficient Algorithms for the Separable Case

We study the problem of efficient online multiclass linear classificatio...

0 Alina Beygelzimer, et al. ∙

research

∙ 02/03/2019

A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free

We propose the first contextual bandit algorithm that is parameter-free,...

0 Yifang Chen, et al. ∙

research

∙ 01/29/2019

Improved Path-length Regret Bounds for Bandits

We study adaptive regret bounds in terms of the variation of the losses ...

8 Sébastien Bubeck, et al. ∙

research

∙ 01/25/2019

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

We develop the first general semi-bandit algorithm that simultaneously a...

0 Julian Zimmert, et al. ∙

research

∙ 05/18/2018

Efficient Online Portfolio with Logarithmic Regret

We study the decades-old problem of online portfolio management and prop...

0 Haipeng Luo, et al. ∙

research

∙ 01/10/2018

More Adaptive Algorithms for Adversarial Bandits

We develop a novel and generic algorithm for the adversarial multi-armed...

0 Chen-Yu Wei, et al. ∙
