Haipeng Luo

research

∙ 08/18/2023

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Large language models (LLMs), such as GPT-4, have shown remarkable perfo...

0 Haipeng Luo, et al. ∙

research

∙ 05/27/2023

No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions

Existing online learning algorithms for adversarial Markov Decision Proc...

0 Tiancheng Jin, et al. ∙

research

∙ 05/24/2023

Regret Matching+: (In)Stability and Fast Convergence in Games

Regret Matching+ (RM+) and its variants are important algorithms for sol...

0 Gabriele Farina, et al. ∙

research

∙ 03/05/2023

Uncoupled and Convergent Learning in Two-Player Zero-Sum Markov Games

We revisit the problem of learning in two-player zero-sum Markov games, ...

0 Yang Cai, et al. ∙

research

∙ 02/27/2023

Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms

We study the problem of designing adaptive multi-armed bandit algorithms...

0 Tiancheng Jin, et al. ∙

research

∙ 02/02/2023

Average-Constrained Policy Optimization

Reinforcement Learning (RL) with constraints is becoming an increasingly...

0 Akhil Agnihotri, et al. ∙

research

∙ 10/23/2022

No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution

Supply chain management (SCM) has been recognized as an important discip...

0 Mengxiao Zhang, et al. ∙

research

∙ 10/04/2022

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

We study high-probability regret bounds for adversarial K-armed bandits ...

0 Haipeng Luo, et al. ∙

research

∙ 08/31/2022

Clairvoyant Regret Minimization: Equivalence with Nemirovski's Conceptual Prox Method and Extension to General Convex Games

A recent paper by Piliouras et al. [2021, 2022] introduces an uncoupled ...

0 Gabriele Farina, et al. ∙

research

∙ 06/17/2022

Near-Optimal No-Regret Learning for General Convex Games

A recent line of work has established uncoupled learning dynamics such t...

0 Gabriele Farina, et al. ∙

research

∙ 05/26/2022

Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback

We consider regret minimization for Adversarial Markov Decision Processe...

0 Yan Dai, et al. ∙

research

∙ 05/25/2022

Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments

We initiate the study of dynamic regret minimization for goal-oriented r...

0 Liyu Chen, et al. ∙

research

∙ 04/25/2022

Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games

In this paper we establish efficient and uncoupled learning dynamics so ...

0 Ioannis Anagnostides, et al. ∙

research

∙ 02/12/2022

Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits

We consider the problem of combining and learning over a set of adversar...

0 Haipeng Luo, et al. ∙

research

∙ 02/12/2022

Adaptive Bandit Convex Optimization with Heterogeneous Curvature

We consider the problem of adversarial bandit convex optimization, that ...

0 Haipeng Luo, et al. ∙

research

∙ 02/07/2022

Policy Optimization for Stochastic Shortest Path

Policy optimization is among the most popular and successful reinforceme...

0 Liyu Chen, et al. ∙

research

∙ 02/01/2022

Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games

While extensive-form games (EFGs) can be converted into normal-form game...

0 Gabriele Farina, et al. ∙

research

∙ 01/31/2022

Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints

We study regret minimization for infinite-horizon average-reward Markov ...

0 Liyu Chen, et al. ∙

research

∙ 01/31/2022

Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

The standard assumption in reinforcement learning (RL) is that agents ob...

0 Tiancheng Jin, et al. ∙

research

∙ 01/30/2022

No-Regret Learning in Time-Varying Zero-Sum Games

Learning from repeated play in a fixed two-player zero-sum game is a cla...

8 Mengxiao Zhang, et al. ∙

research

∙ 12/18/2021

Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP

We introduce two new no-regret algorithms for the stochastic shortest pa...

0 Liyu Chen, et al. ∙

research

∙ 07/18/2021

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

Policy optimization is a widely-used method in reinforcement learning. D...

0 Haipeng Luo, et al. ∙

research

∙ 06/27/2021

Last-iterate Convergence in Extensive-Form Games

Regret-based algorithms are highly efficient at finding approximate Nash...

0 Chung-Wei Lee, et al. ∙

research

∙ 06/15/2021

Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path

We introduce a generic template for developing regret minimization algor...

0 Liyu Chen, et al. ∙

research

∙ 06/09/2021

Online Learning for Stochastic Shortest Path Model via Posterior Sampling

We consider the problem of online reinforcement learning for the Stochas...

0 Mehdi Jafarnia-Jahromi, et al. ∙

research

∙ 06/08/2021

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

We consider the best-of-both-worlds problem for learning an episodic Mar...

0 Tiancheng Jin, et al. ∙

research

∙ 02/11/2021

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously

In this work, we develop linear bandit algorithms that automatically ada...

0 Chung-Wei Lee, et al. ∙

research

∙ 02/10/2021

Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach

We propose a black-box reduction that turns a certain reinforcement lear...

0 Chen-Yu Wei, et al. ∙

research

∙ 02/10/2021

Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case

We make significant progress toward the stochastic shortest path problem...

0 Liyu Chen, et al. ∙

research

∙ 02/08/2021

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

We study infinite-horizon discounted two-player zero-sum Markov games, a...

0 Chen-Yu Wei, et al. ∙

research

∙ 02/01/2021

Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications

We resolve the long-standing "impossible tuning" issue for the classic e...

0 Liyu Chen, et al. ∙

research

∙ 12/07/2020

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

We study the stochastic shortest path problem with adversarial costs and...

0 Liyu Chen, et al. ∙

research

∙ 07/23/2020

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

We develop several new algorithms for learning Markov Decision Processes...

12 Chen-Yu Wei, et al. ∙

research

∙ 07/16/2020

Comparator-adaptive Convex Bandits

We study bandit convex optimization methods that adapt to the norm of th...

8 Dirk van der Hoeven, et al. ∙

research

∙ 06/25/2020

Active Online Domain Adaptation

Online machine learning systems need to adapt to domain shifts. Meanwhil...

9 Yining Chen, et al. ∙

research

∙ 06/19/2020

Open Problem: Model Selection for Contextual Bandits

In statistical learning, algorithms for model selection allow the learne...

0 Dylan J. Foster, et al. ∙

research

∙ 06/16/2020

Linear Last-iterate Convergence for Matrix Games and Stochastic Games

Optimistic Gradient Descent Ascent (OGDA) algorithm for saddle-point opt...

0 Chung-Wei Lee, et al. ∙

research

∙ 06/14/2020

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

We develop a new approach to obtaining high probability regret bounds fo...

0 Chung-Wei Lee, et al. ∙

research

∙ 06/10/2020

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

This work studies the problem of learning episodic Markov Decision Proce...

0 Tiancheng Jin, et al. ∙

research

∙ 06/08/2020

A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret

Recently, model-free reinforcement learning has attracted research atten...

12 Mehdi Jafarnia-Jahromi, et al. ∙

research

∙ 03/07/2020

Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds

We revisit the problem of online learning with sleeping experts/bandits:...

0 Ehsan Emamjomeh-Zadeh, et al. ∙

research

∙ 03/04/2020

Taking a hint: How to leverage loss predictors in contextual bandits?

We initiate the study of learning in contextual bandits with the help of...

0 Chen-Yu Wei, et al. ∙

research

∙ 02/02/2020

A Closer Look at Small-loss Bounds for Bandits with Graph Feedback

We study small-loss bounds for the adversarial multi-armed bandits probl...

0 Chung-Wei Lee, et al. ∙

research

∙ 12/13/2019

Fair Contextual Multi-Armed Bandits: Theory and Experiments

When an AI system interacts with multiple users, it frequently needs to ...

0 Yifang Chen, et al. ∙

research

∙ 12/03/2019

Learning Adversarial MDPs with Bandit Feedback and Unknown Transition

We consider the problem of learning in episodic finite-horizon Markov de...

0 Tiancheng Jin, et al. ∙

research

∙ 10/15/2019

Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes

Model-free reinforcement learning is known to be memory and computation ...

0 Chen-Yu Wei, et al. ∙

research

∙ 06/03/2019

Model selection for contextual bandits

We introduce the problem of model selection for contextual bandits, wher...

0 Dylan J. Foster, et al. ∙

research

∙ 05/30/2019

Equipping Experts/Bandits with Long-term Memory

We propose the first reduction-based approach to obtaining long-term mem...

0 Kai Zheng, et al. ∙

research

∙ 04/09/2019

Hypothesis Set Stability and Generalization

We present an extensive study of generalization for data-dependent hypot...

0 Dylan J. Foster, et al. ∙

research

∙ 02/03/2019

A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free

We propose the first contextual bandit algorithm that is parameter-free,...

0 Yifang Chen, et al. ∙
