
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Designing off-policy reinforcement learning algorithms is typically a ve...

Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
In offline reinforcement learning (RL) an optimal policy is learnt solel...

A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization
This paper proposes a new algorithm – the Momentum-assisted Single-times...

Is Pessimism Provably Efficient for Offline RL?
We study offline reinforcement learning (RL), which aims to learn an opt...

Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy
While deep reinforcement learning has achieved tremendous successes in v...

Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization
We consider the optimization problem of minimizing a functional defined ...

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
The classical theory of reinforcement learning (RL) has focused on tabul...

Provable Fictitious Play for General Mean-Field Games
We propose a reinforcement learning algorithm for stationary mean-field ...

Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning
Temporal-Difference (TD) learning with nonlinear smooth function approxi...

Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time
Reinforcement learning is a powerful tool to learn the optimal policy of...

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
We study the global convergence and global optimality of actor-critic, o...

Understanding Implicit Regularization in Over-Parameterized Nonlinear Statistical Model
We study the implicit regularization phenomenon induced by simple optimi...

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
This paper analyzes a two-timescale stochastic algorithm for a class of ...

Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach
Structural equation models (SEMs) are widely used in sciences, ranging f...

Dynamic Regret of Policy Optimization in Nonstationary Environments
We consider reinforcement learning (RL) in episodic MDPs with adversaria...

On the Global Optimality of Model-Agnostic Meta-Learning
Model-agnostic meta-learning (MAML) formulates meta-learning as a bileve...

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
We study risk-sensitive reinforcement learning in episodic Markov decisi...

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data
Empowered by expressive function approximators such as neural networks, ...

Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning
Multi-agent reinforcement learning (MARL) achieves significant empirical...

Neural Certificates for Safe Control Policies
This paper develops an approach to learn a policy of a dynamical system ...

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Temporal-difference and Q-learning play a key role in deep reinforcement...

An efficient Gehan-type estimation for the accelerated failure time model with clustered and censored data
In medical studies, the collected covariates usually contain underlying ...

Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate
Generative adversarial imitation learning (GAIL) demonstrates tremendous...

Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees
Graph representation learning is a ubiquitous task in machine learning w...

Upper Confidence Primal-Dual Optimization: Stochastically Constrained Markov Decision Processes with Adversarial Losses and Unknown Transitions
We consider online learning for episodic Markov decision processes (MDPs...

Provably Efficient Safe Exploration via Primal-Dual Policy Optimization
We study the Safe Reinforcement Learning (SRL) problem using the Constra...

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
We develop provably efficient reinforcement learning algorithms for two...

On Computation and Generalization of Generative Adversarial Imitation Learning
Generative Adversarial Imitation Learning (GAIL) is a powerful and pract...

Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework
This paper develops a Pontryagin differentiable programming (PDP) method...

Natural Actor-Critic Converges Globally for Hierarchical Linear Quadratic Regulator
Multi-agent reinforcement learning has been successfully applied to a nu...

Provably Efficient Exploration in Policy Optimization
While policy-based reinforcement learning (RL) achieves tremendous succe...

Decentralized Multi-Agent Reinforcement Learning with Networked Agents: Recent Advances
Multi-agent reinforcement learning (MARL) has long been a significant an...

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms
Recent years have witnessed significant advances in reinforcement learni...

Convergent Policy Optimization for Safe Reinforcement Learning
We study the safe reinforcement learning problem with nonlinear function...

Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games
We study discrete-time mean-field Markov games with infinite numbers of ...

Credible Sample Elicitation by Deep Learning, for Deep Learning
It is important to collect credible training samples (x,y) for building ...

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence
Policy gradient methods with actor-critic schemes demonstrate tremendous...

Robust One-Bit Recovery via ReLU Generative Networks: Improved Statistical Rates and Global Landscape Analysis
We study the robust one-bit compressed sensing problem whose goal is to ...

Fast multi-agent temporal-difference learning via homotopy stochastic primal-dual optimization
We consider a distributed multi-agent policy evaluation problem in reinf...

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning
We consider the weakly supervised binary classification problem where th...

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
Despite the empirical success of the actor-critic algorithm, its theoret...

Stochastic Convergence Results for Regularized Actor-Critic Methods
In this paper, we present a stochastic convergence proof, under suitable...

Provably Efficient Reinforcement Learning with Linear Function Approximation
Modern Reinforcement Learning (RL) is commonly applied to practical prob...

A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning
This paper considers a distributed reinforcement learning problem in whi...

Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
Proximal policy optimization and trust region policy optimization (PPO a...

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games
We study the global convergence of policy optimization for finding the N...

Neural Temporal-Difference Learning Converges to Global Optima
Temporal-difference learning (TD), coupled with neural networks, is amon...

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
This paper extends off-policy reinforcement learning to the multi-agent ...

Finite-Sample Analyses for Fully Decentralized Multi-Agent Reinforcement Learning
Despite the increasing interest in multi-agent reinforcement learning (M...

Provable Gaussian Embedding with One Observation
The success of machine learning methods heavily relies on having an appr...