
Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning
Efficient exploration remains a challenging problem in reinforcement lea...

Provable Fictitious Play for General Mean-Field Games
We propose a reinforcement learning algorithm for stationary mean-field ...

Nearly Dimension-Independent Sparse Linear Bandit over Small Action Spaces via Best Subset Selection
We consider the stochastic contextual bandit problem under the high dime...

Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning
Temporal-Difference (TD) learning with nonlinear smooth function approxi...

Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time
Reinforcement learning is a powerful tool to learn the optimal policy of...

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
We study the global convergence and global optimality of actor-critic, o...

A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic
This paper analyzes a two-timescale stochastic algorithm for a class of ...

Accelerating Nonconvex Learning via Replica Exchange Langevin Diffusion
Langevin diffusion is a powerful method for nonconvex optimization, whic...

Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach
Structural equation models (SEMs) are widely used in sciences, ranging f...

Dynamic Regret of Policy Optimization in Non-stationary Environments
We consider reinforcement learning (RL) in episodic MDPs with adversaria...

On the Global Optimality of Model-Agnostic Meta-Learning
Model-agnostic meta-learning (MAML) formulates meta-learning as a bileve...

Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret
We study risk-sensitive reinforcement learning in episodic Markov decisi...

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data
Empowered by expressive function approximators such as neural networks, ...

Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning
Multi-agent reinforcement learning (MARL) achieves significant empirical...

Neural Certificates for Safe Control Policies
This paper develops an approach to learn a policy of a dynamical system ...

Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory
Temporal-difference and Q-learning play a key role in deep reinforcement...

Deep Reinforcement Learning with Smooth Policy
Deep neural networks have been widely adopted in modern reinforcement le...

Generative Adversarial Imitation Learning with Neural Networks: Global Optimality and Convergence Rate
Generative adversarial imitation learning (GAIL) demonstrates tremendous...

Semiparametric Nonlinear Bipartite Graph Representation Learning with Provable Guarantees
Graph representation learning is a ubiquitous task in machine learning w...

Upper Confidence Primal-Dual Optimization: Stochastically Constrained Markov Decision Processes with Adversarial Losses and Unknown Transitions
We consider online learning for episodic Markov decision processes (MDPs...

Provably Efficient Safe Exploration via Primal-Dual Policy Optimization
We study the Safe Reinforcement Learning (SRL) problem using the Constra...

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
We develop provably efficient reinforcement learning algorithms for two...

On Computation and Generalization of Generative Adversarial Imitation Learning
Generative Adversarial Imitation Learning (GAIL) is a powerful and pract...

Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework
This paper develops a Pontryagin differentiable programming (PDP) method...

Natural Actor-Critic Converges Globally for Hierarchical Linear Quadratic Regulator
Multi-agent reinforcement learning has been successfully applied to a nu...

Provably Efficient Exploration in Policy Optimization
While policy-based reinforcement learning (RL) achieves tremendous succe...

Convergent Policy Optimization for Safe Reinforcement Learning
We study the safe reinforcement learning problem with nonlinear function...

Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games
We study discrete-time mean-field Markov games with infinite numbers of ...

Credible Sample Elicitation by Deep Learning, for Deep Learning
It is important to collect credible training samples (x,y) for building ...

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence
Policy gradient methods with actor-critic schemes demonstrate tremendous...

Fast multi-agent temporal-difference learning via homotopy stochastic primal-dual optimization
We consider a distributed multi-agent policy evaluation problem in reinf...

More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning
We consider the weakly supervised binary classification problem where th...

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost
Despite the empirical success of the actor-critic algorithm, its theoret...

Provably Efficient Reinforcement Learning with Linear Function Approximation
Modern Reinforcement Learning (RL) is commonly applied to practical prob...

A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning
This paper considers a distributed reinforcement learning problem in whi...

Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
Proximal policy optimization and trust region policy optimization (PPO a...

Neural Temporal-Difference Learning Converges to Global Optima
Temporal-difference learning (TD), coupled with neural networks, is amon...

A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
This paper extends off-policy reinforcement learning to the multi-agent ...

On the Global Convergence of Imitation Learning: A Case for Linear Quadratic Regulator
We study the global convergence of generative adversarial imitation lear...

A Theoretical Analysis of Deep Q-Learning
Despite the great empirical success of deep reinforcement learning, its ...

Provable Gaussian Embedding with One Observation
The success of machine learning methods heavily relies on having an appr...

High-dimensional Varying Index Coefficient Models via Stein's Identity
We study the parameter estimation problem for a single-index varying coe...

A convex formulation for high-dimensional sparse sliced inverse regression
Sliced inverse regression is a popular tool for sufficient dimension red...

Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes
Solving statistical learning problems often involves nonconvex optimizat...

Curse of Heterogeneity: Computational Barriers in Sparse Mixture Models and Phase Retrieval
We study the fundamental tradeoffs between statistical accuracy and comp...

Off-Policy Evaluation and Learning from Logged Bandit Feedback: Error Reduction via Surrogate Policy
When learning from a batch of logged bandit feedback, the discrepancy be...

On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond
Our paper proposes a generalization error bound for a general family of ...

Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization
Despite the success of single-agent reinforcement learning, multi-agent ...

Detecting Nonlinear Causality in Multivariate Time Series with Sparse Additive Models
We propose a nonparametric method for detecting nonlinear causal relatio...

Recovery of simultaneous low rank and two-way sparse coefficient matrices, a nonconvex approach
We study the problem of recovery of matrices that are simultaneously low...
Zhaoran Wang
Graduate student in the Department of Operations Research and Financial Engineering at Princeton University