
BN-invariant sharpness regularizes the training model to better generalization
It is arguably believed that flatter minima can generalize better. Howev...

Well-Conditioned Methods for Ill-Conditioned Systems: Linear Regression with Semi-Random Noise
Classical iterative algorithms for linear system solving and regression ...

Membership Inference with Privately Augmented Data Endorses the Benign while Suppresses the Adversary
Membership inference (MI) in machine learning decides whether a given ex...

Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia
Adaptive Momentum Estimation (Adam), which combines Adaptive Learning Ra...

On Layer Normalization in the Transformer Architecture
The Transformer is widely used in natural language processing tasks. To ...

Gradient Perturbation is Underrated for Differentially Private Convex Optimization
Gradient perturbation, widely used for differentially private optimizati...

Convergence of Distributed Stochastic Variance Reduced Methods without Sampling Extra Data
Stochastic variance reduced methods have gained a lot of interest recent...

Training Over-parameterized Deep ResNet Is almost as Easy as Training a Two-layer Network
It has been proved that gradient descent converges linearly to the globa...

SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Stochastic gradient descent (SGD) has been found to be surprisingly effe...

Capacity Control of ReLU Neural Networks by Basis-path Norm
Recently, path norm was proposed as a new capacity measure for neural ne...

Train Feedfoward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation
Stochastic gradient descent (SGD) has achieved great success in training...

Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization
The success of deep learning has led to a rising interest in the general...

Block-diagonal Hessian-free Optimization for Training Neural Networks
Second-order methods for neural network optimization have several advant...

Nonconvex Low-Rank Matrix Recovery with Arbitrary Outliers via Median-Truncated Gradient Descent
Recent work has demonstrated the effectiveness of gradient descent for d...

Reshaped Wirtinger Flow and Incremental Algorithm for Solving Quadratic System of Equations
We study the phase retrieval problem, which solves quadratic system of e...

Median-Truncated Nonconvex Approach for Phase Retrieval with Outliers
This paper investigates the phase retrieval problem, which aims to recov...
Huishuai Zhang