
Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach
Multi-agent reinforcement learning (MARL) becomes more challenging in th...

Principled Exploration via Optimistic Bootstrapping and Backward Induction
One principled approach for provably efficient exploration is incorporat...

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
Designing off-policy reinforcement learning algorithms is typically a ve...

Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning
In offline reinforcement learning (RL) an optimal policy is learnt solel...

A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization
This paper proposes a new algorithm – the Momentum-assisted Single-times...

Provably Training Neural Network Classifiers under Fairness Constraints
Training a classifier under fairness constraints has gotten increasing a...

Is Pessimism Provably Efficient for Offline RL?
We study offline reinforcement learning (RL), which aims to learn an opt...

Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy
While deep reinforcement learning has achieved tremendous successes in v...

Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization
We consider the optimization problem of minimizing a functional defined ...

On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces
The classical theory of reinforcement learning (RL) has focused on tabul...

End-to-End Learning and Intervention in Games
In a social system, the self-interest of agents can be detrimental to th...

Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning
Efficient exploration remains a challenging problem in reinforcement lea...

Provable Fictitious Play for General Mean-Field Games
We propose a reinforcement learning algorithm for stationary mean-field ...

Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
We consider the stochastic contextual bandit problem under the high dime...

Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning
Temporal-Difference (TD) learning with nonlinear smooth function approxi...

Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time
Reinforcement learning is a powerful tool to learn the optimal policy of...

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
We study the global convergence and global optimality of actor-critic, o...

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
This paper analyzes a two-timescale stochastic algorithm for a class of ...

Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion
Langevin diffusion is a powerful method for nonconvex optimization, whic...

Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach
Structural equation models (SEMs) are widely used in sciences, ranging f...

Dynamic Regret of Policy Optimization in Nonstationary Environments
We consider reinforcement learning (RL) in episodic MDPs with adversaria...

On the Global Optimality of Model-Agnostic Meta-Learning
Model-agnostic meta-learning (MAML) formulates meta-learning as a bileve...

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
We study risk-sensitive reinforcement learning in episodic Markov decisi...

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data
Empowered by expressive function approximators such as neural networks, ...

Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning
Multi-agent reinforcement learning (MARL) achieves significant empirical...

Neural Certificates for Safe Control Policies
This paper develops an approach to learn a policy of a dynamical system ...

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Temporal-difference and Q-learning play a key role in deep reinforcement...

Deep Reinforcement Learning with Smooth Policy
Deep neural networks have been widely adopted in modern reinforcement le...

Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate
Generative adversarial imitation learning (GAIL) demonstrates tremendous...

Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees
Graph representation learning is a ubiquitous task in machine learning w...

Upper Confidence Primal-Dual Optimization: Stochastically Constrained Markov Decision Processes with Adversarial Losses and Unknown Transitions
We consider online learning for episodic Markov decision processes (MDPs...

Provably Efficient Safe Exploration via Primal-Dual Policy Optimization
We study the Safe Reinforcement Learning (SRL) problem using the Constra...

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
We develop provably efficient reinforcement learning algorithms for two...

On Computation and Generalization of Generative Adversarial Imitation Learning
Generative Adversarial Imitation Learning (GAIL) is a powerful and pract...

Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework
This paper develops a Pontryagin differentiable programming (PDP) method...

Natural Actor-Critic Converges Globally for Hierarchical Linear Quadratic Regulator
Multi-agent reinforcement learning has been successfully applied to a nu...

Provably Efficient Exploration in Policy Optimization
While policy-based reinforcement learning (RL) achieves tremendous succe...

Convergent Policy Optimization for Safe Reinforcement Learning
We study the safe reinforcement learning problem with nonlinear function...

Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games
We study discrete-time mean-field Markov games with infinite numbers of ...

Credible Sample Elicitation by Deep Learning, for Deep Learning
It is important to collect credible training samples (x,y) for building ...

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence
Policy gradient methods with actor-critic schemes demonstrate tremendous...

Fast multi-agent temporal-difference learning via homotopy stochastic primal-dual optimization
We consider a distributed multi-agent policy evaluation problem in reinf...

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning
We consider the weakly supervised binary classification problem where th...

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
Despite the empirical success of the actor-critic algorithm, its theoret...

Provably Efficient Reinforcement Learning with Linear Function Approximation
Modern Reinforcement Learning (RL) is commonly applied to practical prob...

A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning
This paper considers a distributed reinforcement learning problem in whi...

Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
Proximal policy optimization and trust region policy optimization (PPO a...

Neural Temporal-Difference Learning Converges to Global Optima
Temporal-difference learning (TD), coupled with neural networks, is amon...

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
This paper extends off-policy reinforcement learning to the multi-agent ...

On the Global Convergence of Imitation Learning: A Case for Linear Quadratic Regulator
We study the global convergence of generative adversarial imitation lear...
Zhaoran Wang
Graduate student in the Department of Operations Research and Financial Engineering at Princeton University