Zeyuan Allen-Zhu

research

∙ 06/20/2023

SALSA VERDE: a machine learning attack on Learning With Errors with sparse small secrets

Learning with Errors (LWE) is a hard math problem used in post-quantum c...

0 Cathy Li, et al. ∙

research

∙ 05/23/2023

Physics of Language Models: Part 1, Context-Free Grammar

We design experiments to study how generative language models, like GPT,...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 06/17/2021

LoRA: Low-Rank Adaptation of Large Language Models

The dominant paradigm of natural language processing consists of large-s...

13 Edward J. Hu, et al. ∙

research

∙ 06/04/2021

Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions

Generative adversarial networks (GANs) are among the most successful mod...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 12/28/2020

Byzantine-Resilient Non-Convex Stochastic Gradient Descent

We study adversary-resilient stochastic distributed optimization, in whi...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 12/17/2020

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning

We formally study how Ensemble of deep learning models can improve test ...

78 Zeyuan Allen-Zhu, et al. ∙

research

∙ 05/20/2020

Feature Purification: How Adversarial Training Performs Robust Deep Learning

Despite the great empirical success of adversarial training to defend de...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 01/13/2020

Backward Feature Correction: How Deep Learning Performs Deep Learning

How does a 110-layer ResNet learn a high-complexity classifier using rel...

5 Zeyuan Allen-Zhu, et al. ∙

research

∙ 05/24/2019

What Can ResNet Learn Efficiently, Going Beyond Kernels?

How can neural networks such as ResNet efficiently learn CIFAR-10 with t...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 02/04/2019

Can SGD Learn Recurrent Neural Networks with Provable Generalization?

Recurrent Neural Networks (RNNs) are among the most popular models in se...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 01/09/2019

The Lingering of Gradients: How to Reuse Gradients over Time

Classically, the time complexity of a first-order method is estimated by...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 11/12/2018

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

Neural networks have great success in many machine learning applications...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 11/09/2018

A Convergence Theory for Deep Learning via Over-Parameterization

Deep neural networks (DNNs) have demonstrated dominating performance in ...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 10/29/2018

On the Convergence Rate of Training Recurrent Neural Networks

Despite the huge success of deep learning, our understanding of how the ...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 07/10/2018

Is Q-learning Provably Efficient?

Model-free reinforcement learning (RL) algorithms, such as Q-learning, d...

0 Chi Jin, et al. ∙

research

∙ 04/03/2018

Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing

We propose a new second-order method for geodesically convex optimizatio...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 03/23/2018

Byzantine Stochastic Gradient Descent

This paper studies the problem of distributed stochastic optimization in...

0 Dan Alistarh, et al. ∙

research

∙ 02/12/2018

Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization

The problem of minimizing sum-of-nonconvex functions (i.e., convex funct...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 02/09/2018

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits

Regret bounds in online learning compare the player's performance to L^*...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 01/08/2018

How To Make the Gradients Small Stochastically

In convex stochastic optimization, convergence rates in terms of minimiz...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 11/17/2017

Neon2: Finding Local Minima via First-Order Oracles

We propose a reduction for non-convex optimization that can (1) turn a s...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 11/14/2017

Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach

The experimental design problem concerns the selection of k points from ...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 08/29/2017

Natasha 2: Faster Non-Convex Optimization Than SGD

We design a stochastic algorithm to train any smooth neural network to ε...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 08/07/2017

Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls

We propose a rank-k variant of the classical Frank-Wolfe algorithm to so...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 02/02/2017

Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter

Given a nonconvex function f(x) that is an average of n smooth functions...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 01/06/2017

Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU

The online problem of computing the top eigenvector is fundamental to ma...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 11/03/2016

Finding Approximate Local Minima Faster than Gradient Descent

We design a non-convex second-order optimization algorithm that is guara...

0 Naman Agarwal, et al. ∙

research

∙ 08/16/2016

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation

We solve principal component regression (PCR), up to a multiplicative ac...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 07/26/2016

First Efficient Convergence for Streaming k-PCA: a Global, Gap-Free, and Near-Optimal Rate

We study streaming principal component analysis (PCA), that is to find, ...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 07/20/2016

Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition

We study k-GenEV, the problem of finding the top k generalized eigenvect...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 07/12/2016

LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain

We study k-SVD that is to obtain the first k singular vectors of a matri...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 03/18/2016

Katyusha: The First Direct Acceleration of Stochastic Gradient Methods

Nesterov's momentum trick is famously known for accelerating gradient de...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 03/17/2016

Variance Reduction for Faster Non-Convex Optimization

We consider the fundamental problem in non-convex optimization of effici...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 03/17/2016

Optimal Black-Box Reductions Between Optimization Objectives

The diverse world of machine learning applications has given rise to a p...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 02/05/2016

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters

The amount of data available in the world is growing faster than our abi...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 12/30/2015

Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling

Accelerated coordinate descent is widely used in optimization due to its...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 06/16/2015

Spectral Sparsification and Regret Minimization Beyond Matrix Multiplicative Updates

In this paper, we provide a novel construction of the linear-sized spect...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 06/05/2015

Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives

Many classical algorithms are found until several years later to outlive...

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 07/06/2014

Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent

First-order methods play a central role in large-scale machine learning....

0 Zeyuan Allen-Zhu, et al. ∙

research

∙ 07/10/2013

Flow-Based Algorithms for Local Graph Clustering

Given a subset S of vertices of an undirected graph G, the cut-improveme...

0 Lorenzo Orecchia, et al. ∙

research

∙ 04/30/2013

Local Graph Clustering Beyond Cheeger's Inequality

Motivated by applications of large-scale graph clustering, we study rand...

0 Zeyuan Allen-Zhu, et al. ∙
