
A Unified Off-Policy Evaluation Approach for General Value Function
General Value Function (GVF) is a powerful tool to represent both the pr...
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Designing off-policy reinforcement learning algorithms is typically a ve...
Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry
The gradient descent-ascent (GDA) algorithm has been widely applied to s...
A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis
Safe reinforcement learning (SRL) problems are typically modeled as cons...
Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms
Two timescale stochastic approximation (SA) has been widely used in valu...
When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence
Generative adversarial imitation learning (GAIL) is a popular inverse re...
Enhanced First and Zeroth Order Variance Reduced Algorithms for Min-Max Optimization
Min-max optimization captures many important machine learning problems s...
Nonasymptotic Convergence Analysis of Two Timescale (Natural) Actor-Critic Algorithms
As an important type of reinforcement learning algorithms, actor-critic ...
Improving Sample Complexity Bounds for Actor-Critic Algorithms
The actor-critic (AC) algorithm is a popular method to find an optimal p...
Nonasymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling
Despite the wide applications of Adam in reinforcement learning (RL), th...
Reanalysis of Variance Reduced Temporal Difference Learning
Temporal difference (TD) learning is a popular algorithm for policy eval...
Two Timescale Off-Policy TD Learning: Nonasymptotic Analysis over Markovian Samples
Gradient-based temporal difference (GTD) algorithms are widely used in o...
Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation
Though the convergence of major reinforcement learning algorithms has be...
Convergence of SGD in Learning ReLU Models with Separable Data
We consider the binary classification problem in which the objective fun...
