Convergence of SGD in Learning ReLU Models with Separable Data

06/12/2018
by Tengyu Xu, et al.

We consider the binary classification problem in which the objective function is the exponential loss with a ReLU model, and we study the convergence of the stochastic gradient descent (SGD) algorithm on linearly separable data. We show that the gradient descent (GD) algorithm does not always learn desirable model parameters due to the nonlinearity of the ReLU model. We then identify a condition on the data samples under which SGD learns a proper classifier with an implicit bias. Specifically, we establish a sub-linear convergence rate of the function value generated by SGD to the global minimum. We further show that SGD in fact converges in expectation to the maximum margin classifier, with respect to the samples with label +1, under the ReLU model at the rate O(1/ln t). We also extend our study to the case of multiple ReLU neurons, and show that SGD converges to a certain nonlinear maximum margin classifier for a class of non-linearly separable data.
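To make the setting concrete, here is a minimal sketch of SGD on the per-sample exponential loss exp(-y_i * relu(w^T x_i)) for a single-ReLU model. All names (sgd_exp_relu, lr, epochs) are hypothetical, and the step size, initialization, and stopping rule are placeholders; this is a generic SGD loop under the stated loss, not the authors' exact algorithm or schedule.

```python
import numpy as np

def sgd_exp_relu(X, y, lr=0.1, epochs=100, seed=0):
    """SGD on the per-sample exponential loss exp(-y_i * relu(w @ x_i)).

    X: (n, d) feature matrix; y: (n,) labels in {-1, +1}.
    Hypothetical sketch: step size and epoch count are placeholders.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Small random init: at w = 0 the ReLU is inactive for every sample
    # and the (sub)gradient vanishes, so SGD would never move.
    w = 0.01 * rng.standard_normal(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            z = X[i] @ w
            # ReLU subgradient: zero when the unit is inactive (z <= 0),
            # so only samples with positive activation update w.
            if z > 0:
                w += lr * y[i] * np.exp(-y[i] * z) * X[i]
    return w

# Toy linearly separable data with labels in {-1, +1}
rng = np.random.default_rng(1)
X_pos = rng.normal(loc=+2.0, size=(50, 2))
X_neg = rng.normal(loc=-2.0, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(50), -np.ones(50)])
w = sgd_exp_relu(X, y)
print("learned direction:", w / np.linalg.norm(w))
```

Two features of this loop echo the abstract: a degenerate start (here, w = 0) leaves the ReLU inactive everywhere and halts progress, illustrating how the nonlinearity can prevent GD/SGD from learning desirable parameters; and once negative samples fall in the inactive region they stop contributing gradients, which is consistent with the margin being measured with respect to the samples with label +1.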


Related research

06/05/2018 - Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Stochastic Gradient Descent (SGD) is a central tool in machine learning...

08/14/2018 - Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization
Neural networks with ReLU activations have achieved great empirical succ...

10/20/2018 - Condition Number Analysis of Logistic Regression, and its Implications for Standard First-Order Solution Methods
Logistic regression is one of the most popular methods in binary classif...

05/12/2023 - Online Learning Under A Separable Stochastic Approximation Framework
We propose an online learning algorithm for a class of machine learning ...

07/07/2020 - Understanding the Impact of Model Incoherence on Convergence of Incremental SGD with Random Reshuffle
Although SGD with random reshuffle has been widely-used in machine learn...

08/15/2021 - Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data
Bregman proximal point algorithm (BPPA), as one of the centerpieces in t...

09/09/2018 - Stochastic Gradient Descent Learns State Equations with Nonlinear Activations
We study discrete time dynamical systems governed by the state equation ...
