Qinbo Bai

09/05/2023

Regret Analysis of Policy Gradient Algorithm for Infinite Horizon Average Reward Markov Decision Processes

In this paper, we consider an infinite horizon average reward Markov Dec...

Qinbo Bai, et al.

06/12/2022

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual Algorithm

We consider the problem of constrained Markov decision process (CMDP) in...

Qinbo Bai, et al.

09/13/2021

Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach

Reinforcement learning is widely used in applications where one needs to...

Qinbo Bai, et al.

09/12/2021

Concave Utility Reinforcement Learning with Zero-Constraint Violations

We consider the problem of tabular infinite horizon concave utility rein...

Mridul Agarwal, et al.

06/12/2021

Markov Decision Processes with Long-Term Average Constraints

We consider the problem of constrained Markov Decision Process (CMDP) wh...

Mridul Agarwal, et al.

05/28/2021

Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

Many engineering problems have multiple objectives, and the overall aim ...

Qinbo Bai, et al.

06/10/2020

Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

In the optimization of dynamical systems, the variables typically have c...

Qinbo Bai, et al.

03/11/2020

Model-Free Algorithm and Regret Analysis for MDPs with Peak Constraints

In the optimization of dynamic systems, the variables typically have con...

Qinbo Bai, et al.

10/03/2019

Escaping Saddle Points for Zeroth-order Nonconvex Optimization using Estimated Gradient Descent

Gradient descent and its variants are widely used in machine learning. H...

Qinbo Bai, et al.
