
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent
  We study adversary-resilient stochastic distributed optimization, in whi...
- Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
  We formally study how ensembles of deep learning models can improve test ...
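As a concrete reference point, here is a minimal NumPy sketch of the standard knowledge-distillation objective (Hinton et al.) whose benefits this line of work analyzes; the temperature `T` and mixing weight `w` are illustrative choices, not values from the paper.

```python
import numpy as np

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, w=0.9):
    """Standard knowledge-distillation objective: a temperature-softened
    cross-entropy toward the teacher plus the usual label cross-entropy.
    T and w are illustrative, not taken from the paper."""
    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    p_teacher = softmax(teacher_logits / T)            # softened teacher targets
    log_p_student = np.log(softmax(student_logits / T))
    # Cross-entropy to the teacher (equals KL up to the constant teacher-entropy
    # term), scaled by T^2 to keep gradients comparable across temperatures.
    kd = -(p_teacher * log_p_student).sum(axis=1).mean() * T * T
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return w * kd + (1 - w) * ce
```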
- Feature Purification: How Adversarial Training Performs Robust Deep Learning
  Despite the great empirical success of adversarial training to defend de...
- Backward Feature Correction: How Deep Learning Performs Deep Learning
  How does a 110-layer ResNet learn a high-complexity classifier using rel...
- What Can ResNet Learn Efficiently, Going Beyond Kernels?
  How can neural networks such as ResNet efficiently learn CIFAR-10 with t...
- Can SGD Learn Recurrent Neural Networks with Provable Generalization?
  Recurrent Neural Networks (RNNs) are among the most popular models in se...
- The Lingering of Gradients: How to Reuse Gradients over Time
  Classically, the time complexity of a first-order method is estimated by...
- Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
  Neural networks have great success in many machine learning applications...
- A Convergence Theory for Deep Learning via Over-Parameterization
  Deep neural networks (DNNs) have demonstrated dominating performance in ...
- On the Convergence Rate of Training Recurrent Neural Networks
  Despite the huge success of deep learning, our understanding of how the ...
- Is Q-learning Provably Efficient?
  Model-free reinforcement learning (RL) algorithms, such as Q-learning, d...
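The algorithm in question is episodic tabular Q-learning with an upper-confidence exploration bonus. A simplified sketch follows; the `env.reset`/`env.step` interface and the bonus constant `c` are assumptions of the sketch.

```python
import numpy as np

def q_learning_ucb(env, S, A, H, K, c=1.0, p=0.05):
    """Episodic tabular Q-learning with a Hoeffding-style UCB bonus
    (simplified sketch). `env` is assumed to expose reset() -> s and
    step(a) -> (s_next, r); the constant c is illustrative."""
    Q = np.full((H, S, A), float(H))    # optimistic initialization
    N = np.zeros((H, S, A), dtype=int)  # visit counts
    iota = np.log(S * A * H * K / p)    # log factor from the analysis
    for _ in range(K):
        s = env.reset()
        for h in range(H):
            a = int(np.argmax(Q[h, s]))            # act greedily on optimistic Q
            s_next, r = env.step(a)
            N[h, s, a] += 1
            t = N[h, s, a]
            alpha = (H + 1.0) / (H + t)            # the paper's learning rate
            bonus = c * np.sqrt(H**3 * iota / t)   # exploration bonus
            v_next = 0.0 if h == H - 1 else min(H, Q[h + 1, s_next].max())
            Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * (r + v_next + bonus)
            s = s_next
    return Q
```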
- Operator Scaling via Geodesically Convex Optimization, Invariant Theory and Polynomial Identity Testing
  We propose a new second-order method for geodesically convex optimizatio...
- Byzantine Stochastic Gradient Descent
  This paper studies the problem of distributed stochastic optimization in...
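For intuition, here is a generic Byzantine-robust aggregation step using a coordinate-wise median; this is a standard robustness device, not the specific algorithm of the paper.

```python
import numpy as np

def byzantine_robust_sgd_step(x, worker_grads, lr=0.1):
    """One distributed SGD step with coordinate-wise median aggregation.
    The median of each coordinate is unaffected when fewer than half of
    the workers report adversarial gradients. A generic sketch, not the
    paper's algorithm."""
    G = np.stack(worker_grads)          # shape: (num_workers, dim)
    robust_grad = np.median(G, axis=0)  # coordinate-wise median
    return x - lr * robust_grad
```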
- Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization
  The problem of minimizing sum-of-nonconvex functions (i.e., convex funct...
- Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits
  Regret bounds in online learning compare the player's performance to L^*...
- How To Make the Gradients Small Stochastically
  In convex stochastic optimization, convergence rates in terms of minimiz...
- Neon2: Finding Local Minima via First-Order Oracles
  We propose a reduction for non-convex optimization that can (1) turn a s...
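A minimal sketch of the first-order-oracle idea this kind of reduction rests on: Hessian-vector products are approximated by finite differences of gradients, and power iteration on a shifted operator extracts a negative-curvature direction. The `shift` (an assumed bound on the Hessian norm), `delta`, and iteration count are illustrative assumptions, not the paper's exact routine.

```python
import numpy as np

def negative_curvature_direction(grad, x, delta=1e-4, iters=50, shift=10.0):
    """Search for a negative-curvature direction using only gradients.
    Hv is approximated by (grad(x + delta*v) - grad(x)) / delta, and
    power iteration is run on (shift*I - H); `shift` is an assumed
    upper bound on ||H||. Illustrative sketch only."""
    g0 = grad(x)
    v = np.random.randn(x.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        Hv = (grad(x + delta * v) - g0) / delta   # finite-difference H @ v
        v = shift * v - Hv                        # power step on shift*I - H
        v /= np.linalg.norm(v)
    Hv = (grad(x + delta * v) - g0) / delta
    return v, v @ Hv      # second output < 0 signals negative curvature
```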
- Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach
  The experimental design problem concerns the selection of k points from ...
- Natasha 2: Faster Non-Convex Optimization Than SGD
  We design a stochastic algorithm to train any smooth neural network to ε...
- Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
  We propose a rank-k variant of the classical Frank-Wolfe algorithm to so...
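For context, the classical rank-1 Frank-Wolfe baseline over the trace-norm ball {X : ||X||_* <= theta}, whose linear minimization oracle is a 1-SVD of the gradient; the paper's rank-k variant replaces that oracle with a k-SVD. This sketch is the textbook method, not the paper's algorithm.

```python
import numpy as np
from scipy.sparse.linalg import svds

def frank_wolfe_trace_norm(grad, X0, theta, iters):
    """Classical Frank-Wolfe over {X : ||X||_* <= theta}. Each linear
    minimization oracle is the top singular pair of the gradient; the
    step size is the standard 2/(t+2) schedule."""
    X = X0.copy()
    for t in range(iters):
        G = grad(X)
        u, _, vt = svds(G, k=1)                  # top singular pair of G
        S = -theta * np.outer(u[:, 0], vt[0])    # LMO solution on the ball
        gamma = 2.0 / (t + 2.0)
        X = (1 - gamma) * X + gamma * S
    return X
```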
- Natasha: Faster Non-Convex Stochastic Optimization Via Strongly Non-Convex Parameter
  Given a nonconvex function f(x) that is an average of n smooth functions...
- Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU
  The online problem of computing the top eigenvector is fundamental to ma...
- Finding Approximate Local Minima Faster than Gradient Descent
  We design a non-convex second-order optimization algorithm that is guara...
- Faster Principal Component Regression and Stable Matrix Chebyshev Approximation
  We solve principal component regression (PCR), up to a multiplicative ac...
- First Efficient Convergence for Streaming k-PCA: a Global, Gap-Free, and Near-Optimal Rate
  We study streaming principal component analysis (PCA), that is, to find, ...
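The streaming method at stake is Oja's algorithm. A minimal k-vector sketch follows; the constant step size `eta` is a simplification, whereas the analysis prescribes a specific schedule.

```python
import numpy as np

def oja_streaming_kpca(stream, d, k, eta):
    """Oja's algorithm for streaming k-PCA: a stochastic power update on
    each incoming sample, followed by re-orthonormalization. `stream`
    yields vectors x with E[x x^T] = Sigma; eta is a constant step size
    for simplicity."""
    Q, _ = np.linalg.qr(np.random.randn(d, k))   # random orthonormal start
    for x in stream:
        Q = Q + eta * np.outer(x, x @ Q)         # stochastic power update
        Q, _ = np.linalg.qr(Q)                   # re-orthonormalize
    return Q          # columns approximate the top-k eigenvectors of Sigma
```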
- Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition
  We study k-GenEV, the problem of finding the top k generalized eigenvect...
- LazySVD: Even Faster SVD Decomposition Yet Without Agonizing Pain
  We study k-SVD, the problem of obtaining the first k singular vectors of a matri...
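A sketch of the one-vector-at-a-time template behind LazySVD: find an approximate top singular vector, deflate, repeat. Plain power iteration stands in here for the fast approximate 1-SVD subroutine the actual algorithm uses.

```python
import numpy as np

def lazy_svd(A, k, power_iters=100):
    """k-SVD via repeated 1-SVD plus deflation (LazySVD template).
    Power iteration on A^T A is a stand-in for the accelerated 1-SVD
    solver used in the real algorithm."""
    A = A.astype(float).copy()
    V = []
    for _ in range(k):
        v = np.random.randn(A.shape[1])
        for _ in range(power_iters):
            v = A.T @ (A @ v)            # power iteration on A^T A
            v /= np.linalg.norm(v)
        V.append(v)
        A = A - np.outer(A @ v, v)       # deflate the recovered direction
    return np.stack(V, axis=1)           # approximate top-k right singular vectors
```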
- Katyusha: The First Direct Acceleration of Stochastic Gradient Methods
  Nesterov's momentum trick is famously known for accelerating gradient de...
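A simplified sketch of the Katyusha update in the strongly convex, Euclidean setting, assuming each component is L-smooth and the sum is sigma-strongly convex; the snapshot below is a plain average of the epoch iterates where the paper uses a particular weighted average, so treat the details as illustrative.

```python
import numpy as np

def katyusha(grad_i, full_grad, x0, n, L, sigma, epochs):
    """Katyusha-style accelerated SVRG (simplified sketch). grad_i(x, i)
    is the i-th component gradient, full_grad(x) the full gradient.
    The snapshot uses a plain average (the paper uses a weighted one)."""
    m = 2 * n                                        # epoch length
    tau1 = min(np.sqrt(m * sigma / (3.0 * L)), 0.5)  # momentum weights
    tau2 = 0.5
    alpha = 1.0 / (3.0 * tau1 * L)
    x_tilde, y, z = x0.copy(), x0.copy(), x0.copy()
    for _ in range(epochs):
        mu = full_grad(x_tilde)                      # snapshot full gradient
        ys = []
        for _ in range(m):
            i = np.random.randint(n)
            x = tau1 * z + tau2 * x_tilde + (1 - tau1 - tau2) * y
            g = mu + grad_i(x, i) - grad_i(x_tilde, i)  # variance-reduced gradient
            z = z - alpha * g                        # mirror step (Euclidean case)
            y = x - g / (3.0 * L)                    # gradient step
            ys.append(y)
        x_tilde = np.mean(ys, axis=0)
    return x_tilde
```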
- Variance Reduction for Faster Non-Convex Optimization
  We consider the fundamental problem in non-convex optimization of effici...
- Optimal Black-Box Reductions Between Optimization Objectives
  The diverse world of machine learning applications has given rise to a p...
- Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
  The amount of data available in the world is growing faster than our abi...
- Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling
  Accelerated coordinate descent is widely used in optimization due to its...
- Spectral Sparsification and Regret Minimization Beyond Matrix Multiplicative Updates
  In this paper, we provide a novel construction of the linear-sized spect...
- Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives
  Many classical algorithms are found, only several years later, to outlive...
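For reference, the vanilla SVRG template that work in this vein modifies and analyzes; the oracle interfaces `grad_i`/`full_grad` and the step size `eta` are assumptions of the sketch.

```python
import numpy as np

def svrg(grad_i, full_grad, x0, n, eta, epochs, m):
    """Vanilla SVRG: each epoch computes one full gradient at a snapshot,
    then runs m cheap steps whose stochastic gradients are unbiased and
    have variance shrinking as the iterates approach the snapshot."""
    x_tilde = x0.copy()
    for _ in range(epochs):
        mu = full_grad(x_tilde)                  # full gradient at the snapshot
        x = x_tilde.copy()
        for _ in range(m):
            i = np.random.randint(n)
            g = grad_i(x, i) - grad_i(x_tilde, i) + mu   # variance-reduced gradient
            x = x - eta * g
        x_tilde = x                              # new snapshot
    return x_tilde
```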
- Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent
  First-order methods play a central role in large-scale machine learning....
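A minimal sketch of the linear-coupling construction in the Euclidean case: a gradient step and a mirror step, each taken from a linear combination of the two running iterates. With the schedule alpha = (k+2)/(2L), this recovers the accelerated O(1/k^2) rate for L-smooth convex f.

```python
import numpy as np

def linear_coupling(grad, x0, L, iters):
    """Accelerated gradient via linear coupling of gradient descent and
    (Euclidean) mirror descent, following the paper's template."""
    y, z = x0.copy(), x0.copy()
    for k in range(iters):
        alpha = (k + 2) / (2.0 * L)
        tau = 1.0 / (alpha * L)          # coupling weight
        x = tau * z + (1 - tau) * y      # linearly couple the two iterates
        g = grad(x)
        y = x - g / L                    # gradient (primal) step
        z = z - alpha * g                # mirror (dual) step
    return y
```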
- Flow-Based Algorithms for Local Graph Clustering
  Given a subset S of vertices of an undirected graph G, the cut-improveme...
- Local Graph Clustering Beyond Cheeger's Inequality
  Motivated by applications of large-scale graph clustering, we study rand...