A proof of convergence of multi-class logistic regression network

03/29/2019
by Marek Rychlik, et al.

This paper revisits a special type of neural network known under two names. In the statistics and machine learning community it is known as multi-class logistic regression; in the neural network community, it is simply the soft-max layer. Its importance is underscored by its role in deep learning, where it serves as the last layer, whose output is the classification of the input patterns, such as images. Our exposition focuses on a mathematically rigorous derivation of the key equation expressing the gradient. A fringe benefit of our approach is a fully vectorized expression, which forms the basis of an efficient implementation. The second result of this paper is the positivity of the second derivative of the cross-entropy loss function as a function of the weights. This result proves that optimization methods based on convexity may be used to train this network. As a corollary, we demonstrate that no L^2-regularizer is needed to guarantee convergence of gradient descent.
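The vectorized gradient the abstract refers to coincides with a standard result for soft-max cross-entropy: with inputs X, one-hot labels Y, and class probabilities P = softmax(XW), the gradient of the average loss in the weights W is X^T (P - Y)/n. The following NumPy sketch (function and variable names are our own, not the paper's) illustrates this formula together with plain gradient descent, which the paper's convexity result justifies:

```python
import numpy as np

def softmax(Z):
    # Row-wise soft-max, with max-subtraction for numerical stability.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def loss_and_grad(W, X, Y):
    """Average cross-entropy loss and its fully vectorized gradient.

    W: (d, c) weights, X: (n, d) inputs, Y: (n, c) one-hot labels.
    """
    n = X.shape[0]
    P = softmax(X @ W)                 # (n, c) class probabilities
    loss = -np.sum(Y * np.log(P)) / n  # average cross-entropy
    grad = X.T @ (P - Y) / n           # (d, c) vectorized gradient
    return loss, grad

def train(X, Y, lr=0.5, steps=200):
    # Plain gradient descent; by the paper's result, convexity of the
    # loss in W means no L^2-regularizer is needed for convergence.
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(steps):
        _, g = loss_and_grad(W, X, Y)
        W -= lr * g
    return W
```

The gradient formula can be verified against a finite-difference approximation of the loss, which is a useful sanity check for any hand-derived gradient.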
