
LoRA: LowRank Adaptation of Large Language Models
The dominant paradigm of natural language processing consists of larges...
read it

Forward SuperResolution: How Can GANs Learn Hierarchical Generative Models for RealWorld Distributions
Generative adversarial networks (GANs) are among the most successful mod...
read it

ByzantineResilient NonConvex Stochastic Gradient Descent
We study adversaryresilient stochastic distributed optimization, in whi...
read it

Towards Understanding Ensemble, Knowledge Distillation and SelfDistillation in Deep Learning
We formally study how Ensemble of deep learning models can improve test ...
read it

Feature Purification: How Adversarial Training Performs Robust Deep Learning
Despite the great empirical success of adversarial training to defend de...
read it

Backward Feature Correction: How Deep Learning Performs Deep Learning
How does a 110layer ResNet learn a highcomplexity classifier using rel...
read it

What Can ResNet Learn Efficiently, Going Beyond Kernels?
How can neural networks such as ResNet efficiently learn CIFAR10 with t...
read it

Can SGD Learn Recurrent Neural Networks with Provable Generalization?
Recurrent Neural Networks (RNNs) are among the most popular models in se...
read it

The Lingering of Gradients: How to Reuse Gradients over Time
Classically, the time complexity of a firstorder method is estimated by...
read it

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Neural networks have great success in many machine learning applications...
read it

A Convergence Theory for Deep Learning via OverParameterization
Deep neural networks (DNNs) have demonstrated dominating performance in ...
read it

On the Convergence Rate of Training Recurrent Neural Networks
Despite the huge success of deep learning, our understanding to how the ...
read it

Is Qlearning Provably Efficient?
Modelfree reinforcement learning (RL) algorithms, such as Qlearning, d...
read it

Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing
We propose a new secondorder method for geodesically convex optimizatio...
read it

Byzantine Stochastic Gradient Descent
This paper studies the problem of distributed stochastic optimization in...
read it

Katyusha X: Practical Momentum Method for Stochastic SumofNonconvex Optimization
The problem of minimizing sumofnonconvex functions (i.e., convex funct...
read it

Make the Minority Great Again: FirstOrder Regret Bound for Contextual Bandits
Regret bounds in online learning compare the player's performance to L^*...
read it

How To Make the Gradients Small Stochastically
In convex stochastic optimization, convergence rates in terms of minimiz...
read it

Neon2: Finding Local Minima via FirstOrder Oracles
We propose a reduction for nonconvex optimization that can (1) turn a s...
read it

NearOptimal Discrete Optimization for Experimental Design: A Regret Minimization Approach
The experimental design problem concerns the selection of k points from ...
read it

Natasha 2: Faster NonConvex Optimization Than SGD
We design a stochastic algorithm to train any smooth neural network to ε...
read it

Linear Convergence of a FrankWolfe Type Algorithm over TraceNorm Balls
We propose a rankk variant of the classical FrankWolfe algorithm to so...
read it

Natasha: Faster NonConvex Stochastic Optimization Via Strongly NonConvex Parameter
Given a nonconvex function f(x) that is an average of n smooth functions...
read it

Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU
The online problem of computing the top eigenvector is fundamental to ma...
read it

Finding Approximate Local Minima Faster than Gradient Descent
We design a nonconvex secondorder optimization algorithm that is guara...
read it

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation
We solve principal component regression (PCR), up to a multiplicative ac...
read it

First Efficient Convergence for Streaming kPCA: a Global, GapFree, and NearOptimal Rate
We study streaming principal component analysis (PCA), that is to find, ...
read it

Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition
We study kGenEV, the problem of finding the top k generalized eigenvect...
read it

LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain
We study kSVD that is to obtain the first k singular vectors of a matri...
read it

Katyusha: The First Direct Acceleration of Stochastic Gradient Methods
Nesterov's momentum trick is famously known for accelerating gradient de...
read it

Variance Reduction for Faster NonConvex Optimization
We consider the fundamental problem in nonconvex optimization of effici...
read it

Optimal BlackBox Reductions Between Optimization Objectives
The diverse world of machine learning applications has given rise to a p...
read it

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
The amount of data available in the world is growing faster than our abi...
read it

Even Faster Accelerated Coordinate Descent Using NonUniform Sampling
Accelerated coordinate descent is widely used in optimization due to its...
read it

Spectral Sparsification and Regret Minimization Beyond Matrix Multiplicative Updates
In this paper, we provide a novel construction of the linearsized spect...
read it

Improved SVRG for NonStronglyConvex or SumofNonConvex Objectives
Many classical algorithms are found until several years later to outlive...
read it

Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent
Firstorder methods play a central role in largescale machine learning....
read it

FlowBased Algorithms for Local Graph Clustering
Given a subset S of vertices of an undirected graph G, the cutimproveme...
read it

Local Graph Clustering Beyond Cheeger's Inequality
Motivated by applications of largescale graph clustering, we study rand...
read it
Zeyuan AllenZhu
is this you? claim profile
Researcher in Machine Learning and Optimization group at Massachusetts Institute of Technology, Researcherat Microsoft