Understanding Implicit Regularization in Over-Parameterized Nonlinear Statistical Model

by Jianqing Fan et al.

We study the implicit regularization phenomenon induced by simple optimization algorithms in over-parameterized nonlinear statistical models. Specifically, we study both vector and matrix single index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To gain a better understanding of the role of implicit regularization in nonlinear models without excess technicality, we assume that the distribution of the covariates is known a priori. For both the vector and matrix settings, we construct an over-parameterized least-squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data. We propose to estimate the true parameter by applying regularization-free gradient descent to the loss function. When the initialization is close to the origin and the stepsize is sufficiently small, we prove that the obtained solution achieves minimax optimal statistical rates of convergence in both the vector and matrix cases. In particular, for the vector single index model with Gaussian covariates, our proposed estimator is shown to enjoy the oracle statistical rate. Our results capture the implicit regularization phenomenon in over-parameterized nonlinear and noisy statistical models with possibly heavy-tailed data.
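The core mechanism described above — plain gradient descent on an over-parameterized least-squares loss, started near the origin with a small stepsize, recovering a sparse signal with no explicit penalty — can be illustrated with a minimal numerical sketch. The sketch below uses a *linear* model and the standard Hadamard over-parameterization beta = u ⊙ u − v ⊙ v; the paper's actual setting (nonlinear unknown link, score function transform, robust truncation for heavy tails) is omitted, and all dimensions, stepsizes, and iteration counts are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: y = X @ beta_star + noise with a sparse
# beta_star. The paper's nonlinear single index model and score
# transform are replaced here by a plain linear model for brevity.
n, d, s = 200, 400, 5
X = rng.standard_normal((n, d)) / np.sqrt(n)   # columns have roughly unit norm
beta_star = np.zeros(d)
beta_star[:s] = 1.0
y = X @ beta_star + 0.01 * rng.standard_normal(n)

# Over-parameterize beta = u*u - v*v and run regularization-free
# gradient descent from a small initialization (near the origin),
# as in the implicit regularization setup.
alpha, eta, T = 1e-3, 0.1, 500
u = alpha * np.ones(d)
v = alpha * np.ones(d)
for _ in range(T):
    r = X @ (u * u - v * v) - y    # residual
    g = X.T @ r                    # gradient w.r.t. beta
    u -= eta * 2.0 * u * g         # chain rule through u*u
    v += eta * 2.0 * v * g         # chain rule through v*v

beta_hat = u * u - v * v
err = np.linalg.norm(beta_hat - beta_star)
```

Because the initialization is small, coordinates on the true support grow multiplicatively and fit the signal quickly, while off-support coordinates stay near zero for many iterations; stopping after a moderate number of steps therefore yields a near-sparse estimate even though the loss itself contains no sparsity penalty.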




