
Beyond Lazy Training for Overparameterized Tensor Decomposition
Overparametrization is an important technique in training neural networ...

How Important is the Train-Validation Split in Meta-Learning?
Meta-learning aims to perform fast adaptation on a new task through lear...

Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot
Network pruning is a method for reducing test-time computational resourc...

Generalized Leverage Score Sampling for Neural Networks
Leverage score sampling is a powerful technique that originates from the...

Predicting What You Already Know Helps: Provable Self-Supervised Learning
Self-supervised representation learning solves auxiliary prediction task...

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
We provide a detailed asymptotic study of gradient flow trajectories and...

Towards Understanding Hierarchical Learning: Benefits of Neural Representations
Deep neural networks can empirically perform efficient hierarchical lear...

Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters
Although model-agnostic meta-learning (MAML) is a very successful algori...

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
The noise in stochastic gradient descent (SGD) provides a crucial implic...

Distributed Estimation for Principal Component Analysis: a Gap-free Approach
The growing size of modern data sets brings many challenges to the exist...

Steepest Descent Neural Architecture Optimization: Escaping Local Optimum with Signed Neural Splitting
We propose signed splitting steepest descent (S3D), which progressively ...

Few-Shot Learning via Learning the Representation, Provably
This paper studies few-shot learning via representation learning, where ...

Kernel and Rich Regimes in Overparametrized Models
A recent line of work studies overparametrized neural networks in the "k...

Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity
The current paper studies the problem of agnostic Q-learning with functi...

When Does Non-Orthogonal Tensor Decomposition Have No Spurious Local Minima?
We study the optimization problem for decomposing d-dimensional fourth-o...

SGD Learns One-Layer Networks in WGANs
Generative adversarial networks (GANs) are a widely used framework for l...

Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
Recent theoretical work has established connections between overparamet...

Optimal transport mapping via input convex neural networks
In this paper, we present a novel and principled approach to learn the o...

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes
Policy gradient methods are among the most effective methods in challeng...

Incremental Methods for Weakly Convex Optimization
We consider incremental algorithms for solving weakly convex optimizatio...

Convergence of Adversarial Training in Overparametrized Networks
Neural networks are vulnerable to adversarial examples, i.e. inputs that...

Neural Temporal-Difference Learning Converges to Global Optima
Temporal-difference learning (TD), coupled with neural networks, is amon...

Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
With an eye toward understanding complexity control in deep learning, we...

Solving Non-Convex Non-Concave Min-Max Games Under Polyak-Łojasiewicz Condition
In this short note, we consider the problem of solving a min-max zero-su...

Gradient Descent Finds Global Minima of Deep Neural Networks
Gradient descent finds a global minimum in training deep neural networks...
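A minimal numerical sketch of the phenomenon in the entry above, not the paper's proof: on a tiny random dataset, full-batch gradient descent on a sufficiently wide two-layer ReLU network (fixed second layer here) typically drives the squared loss toward zero. All hyperparameters below are assumptions chosen for the demo.

```python
import numpy as np

# Illustrative sketch: gradient descent on a wide two-layer ReLU network
# memorizing a tiny random dataset. Width, learning rate, and step count
# are assumptions for the demo, not values from the paper.
rng = np.random.default_rng(0)
n, d, width = 5, 2, 200

X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
W = rng.standard_normal((width, d))                   # trained first layer
a = rng.choice([-1.0, 1.0], width) / np.sqrt(width)   # fixed second layer

def forward(W):
    return np.maximum(X @ W.T, 0.0) @ a               # network predictions

def loss(W):
    return 0.5 * np.sum((forward(W) - y) ** 2)

initial_loss = loss(W)
lr = 0.1
for _ in range(5000):
    pre = X @ W.T                                     # (n, width) pre-activations
    err = forward(W) - y                              # (n,) residuals
    # dL/dW_r = sum_i err_i * a_r * 1[pre_{i,r} > 0] * x_i
    grad = ((err[:, None] * (pre > 0)) * a).T @ X
    W -= lr * grad

final_loss = loss(W)
print(initial_loss, final_loss)
```

The wide-network setting keeps the training dynamics close to the lazy/NTK regime, which is why a plain gradient step suffices here.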

On the Margin Theory of Feedforward Neural Networks
Past works have shown that, somewhat surprisingly, overparametrization ...

Provably Correct Automatic Subdifferentiation for Qualified Programs
The Cheap Gradient Principle (Griewank 2008): the computational cost ...

Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
We study the implicit regularization imposed by gradient descent for lea...

Adding One Neuron Can Eliminate All Bad Local Minima
One of the main difficulties in analyzing neural networks is the non-con...

Stochastic subgradient method converges on tame functions
This work considers the question: what convergence guarantees does the s...

On the Power of Overparametrization in Neural Networks with Quadratic Activation
We provide new theoretical insights on why overparametrization is effec...

Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization
In this work, we study two first-order primal-dual based algorithms, the...

Solving Approximate Wasserstein GANs to Stationarity
Generative Adversarial Networks (GANs) are one of the most practical str...

Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima
We consider the problem of learning a one-hidden-layer neural network wi...

Learning One-hidden-layer Neural Networks with Landscape Design
We consider the problem of learning a one-hidden-layer neural network: w...

First-order Methods Almost Always Avoid Saddle Points
We establish that first-order methods avoid saddle points for almost all...
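The claim in the entry above can be illustrated on a toy function (a hedged sketch, not the paper's measure-theoretic argument): f(x, y) = (x^2 - 1)^2 + y^2 has a strict saddle at (0, 0) and minima at (+-1, 0), and gradient descent from random initialization reliably lands at a minimum. The step size and initialization range below are demo assumptions.

```python
import numpy as np

# Illustrative sketch: gradient descent on f(x, y) = (x^2 - 1)^2 + y^2,
# which has a saddle at (0, 0) and global minima at (+-1, 0).
rng = np.random.default_rng(1)

def grad(p):
    x, y = p
    return np.array([4.0 * x * (x**2 - 1.0), 2.0 * y])

finals = []
for _ in range(20):
    p = rng.uniform(-1.5, 1.5, size=2)   # random init (avoids the measure-zero stable manifold x = 0)
    for _ in range(500):
        p = p - 0.05 * grad(p)
    finals.append(p)

# Every run ends near (+1, 0) or (-1, 0); none gets stuck at the saddle.
dists_to_minima = [min(abs(p[0] - 1.0), abs(p[0] + 1.0)) + abs(p[1]) for p in finals]
print(max(dists_to_minima))
```

Initializing exactly on the line x = 0 would converge to the saddle, but that set has measure zero, matching the "almost all initializations" statement.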

When is a Convolutional Filter Easy To Learn?
We analyze the convergence of (stochastic) gradient descent algorithm fo...

An inexact subsampled proximal Newton-type method for large-scale machine learning
We propose a fast proximal Newton-type algorithm for minimizing regulari...

Theoretical insights into the optimization landscape of overparameterized shallow neural networks
In this paper we study the problem of learning a shallow artificial neur...

Gradient Descent Can Take Exponential Time to Escape Saddle Points
Although gradient descent (GD) almost always escapes saddle points asymp...

A Flexible Framework for Hypothesis Testing in High-dimensions
Hypothesis testing in the linear regression model is a fundamental stati...

Statistical Inference for Model Parameters in Stochastic Gradient Descent
The stochastic gradient descent (SGD) algorithm has been widely used in ...

Blackbox Importance Sampling
Importance sampling is widely used in machine learning and statistics, b...

Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data
Sketching techniques have become popular for scaling up machine learning...

Communication-Efficient Distributed Statistical Inference
We present a Communication-efficient Surrogate Likelihood (CSL) framewor...

Matrix Completion has No Spurious Local Minimum
Matrix completion is a basic machine learning problem that has wide appl...
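The benign landscape described in the entry above can be probed with a small experiment (an assumed setup for illustration, not the paper's algorithm verbatim): gradient descent on the factorized objective f(X) = ||Omega * (X X^T - M)||_F^2 recovers a low-rank symmetric matrix from a random subset of its entries. Dimensions, observation rate, and step size are demo assumptions.

```python
import numpy as np

# Illustrative sketch: matrix completion by gradient descent on the
# Burer-Monteiro factorization. All sizes/rates below are demo assumptions.
rng = np.random.default_rng(0)
d, r = 10, 2

U = rng.standard_normal((d, r)) / np.sqrt(d)
M = U @ U.T                               # ground-truth low-rank matrix
mask = rng.random((d, d)) < 0.5
mask = mask | mask.T                      # symmetric observation pattern

def loss(X):
    R = mask * (X @ X.T - M)
    return np.sum(R ** 2)

X = 0.1 * rng.standard_normal((d, r))     # small random initialization
initial_loss = loss(X)
for _ in range(5000):
    R = mask * (X @ X.T - M)
    X -= 0.01 * 4.0 * R @ X               # gradient of f at X is 4 (Omega*(XX^T - M)) X
final_loss = loss(X)
print(initial_loss, final_loss)
```

With no spurious local minima in the landscape, plain gradient descent from random initialization suffices; no spectral initialization is used here.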

Gradient Descent Converges to Minimizers
We show that gradient descent converges to a local minimizer, almost sur...

A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation
We derive a new discrepancy statistic for measuring differences between ...
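A one-dimensional sketch of a kernelized Stein discrepancy of the kind described above, using an RBF kernel and a standard normal model (whose score function is s(x) = -x). The U-statistic form is standard; the bandwidth and sample sizes are assumptions for the demo.

```python
import numpy as np

# Illustrative 1-D kernelized Stein discrepancy with an RBF kernel,
# testing samples against a N(0, 1) model. Bandwidth h is a demo assumption.
def ksd(samples, h=1.0):
    x = np.asarray(samples, dtype=float)
    n = len(x)
    diff = x[:, None] - x[None, :]
    k = np.exp(-diff**2 / (2 * h**2))        # RBF kernel k(x, x')
    s = -x                                   # score of N(0, 1): d/dx log p(x)
    dk_dx = -diff / h**2 * k                 # d k / d x
    dk_dxp = diff / h**2 * k                 # d k / d x'
    d2k = (1.0 / h**2 - diff**2 / h**4) * k  # d^2 k / (d x d x')
    # Stein kernel: u(x,x') = s(x)s(x')k + s(x) dk/dx' + s(x') dk/dx + d2k
    u = s[:, None] * s[None, :] * k + s[:, None] * dk_dxp + s[None, :] * dk_dx + d2k
    np.fill_diagonal(u, 0.0)                 # U-statistic: drop i == j terms
    return u.sum() / (n * (n - 1))

rng = np.random.default_rng(0)
good = ksd(rng.normal(0.0, 1.0, 300))        # samples match the model
bad = ksd(rng.normal(2.0, 1.0, 300))         # samples from a shifted distribution
print(good, bad)
```

The statistic is near zero when the samples come from the model and clearly positive under the shifted alternative, which is what makes it usable as a goodness-of-fit test.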

Selective Inference and Learning Mixed Graphical Models
This thesis studies two problems in modern statistics. First, we study s...

Communication-efficient sparse regression: a one-shot approach
We devise a one-shot approach to distributed sparse regression in the hi...
Jason D. Lee
Assistant Professor, Data Science and Operations Department, Marshall School of Business, University of Southern California.