
BN-invariant sharpness regularizes the training model to better generalization
It is arguably believed that flatter minima can generalize better. Howev...
Well-Conditioned Methods for Ill-Conditioned Systems: Linear Regression with Semi-Random Noise
Classical iterative algorithms for linear system solving and regression ...
Membership Inference with Privately Augmented Data Endorses the Benign while Suppresses the Adversary
Membership inference (MI) in machine learning decides whether a given ex...
Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia
Adaptive Momentum Estimation (Adam), which combines Adaptive Learning Ra...
On Layer Normalization in the Transformer Architecture
The Transformer is widely used in natural language processing tasks. To ...
Gradient Perturbation is Underrated for Differentially Private Convex Optimization
Gradient perturbation, widely used for differentially private optimizati...
Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data
Stochastic variance reduced methods have gained a lot of interest recent...
Training Overparameterized Deep ResNet Is almost as Easy as Training a Two-layer Network
It has been proved that gradient descent converges linearly to the globa...
SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Stochastic gradient descent (SGD) has been found to be surprisingly effe...
Capacity Control of ReLU Neural Networks by Basis-path Norm
Recently, path norm was proposed as a new capacity measure for neural ne...
Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation
Stochastic gradient descent (SGD) has achieved great success in training...
Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization
The success of deep learning has led to a rising interest in the general...
Block-diagonal Hessian-free Optimization for Training Neural Networks
Second-order methods for neural network optimization have several advant...
Nonconvex Low-Rank Matrix Recovery with Arbitrary Outliers via Median-Truncated Gradient Descent
Recent work has demonstrated the effectiveness of gradient descent for d...
Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations
We study the phase retrieval problem, which solves quadratic system of e...
Median-Truncated Nonconvex Approach for Phase Retrieval with Outliers
This paper investigates the phase retrieval problem, which aims to recov...
Huishuai Zhang
