Fast generalization error bound of deep learning without scale invariance of activation functions

07/25/2019
by Yoshikazu Terada, et al.

In the theoretical analysis of deep learning, identifying which features of deep learning lead to good performance is an important task. In this paper, using the framework for analyzing the generalization error developed in Suzuki (2018), we derive a fast learning rate for deep neural networks with more general activation functions. In Suzuki (2018), a tight generalization error bound for deep learning was derived under the assumption that the activation functions are scale invariant, and the authors note that this scale invariance is essential to their derivation. Whereas the rectified linear unit (ReLU; Nair and Hinton, 2010) satisfies scale invariance, other widely used activation functions, including the sigmoid, the hyperbolic tangent, and the exponential linear unit (ELU; Clevert et al., 2016), do not. The existing analysis therefore leaves open the possibility that deep learning with non-scale-invariant activations converges only at the slower rate of O(1/√n), while deep learning with scale-invariant activations can achieve a rate faster than O(1/√n). In this paper, without assuming the scale invariance of activation functions, we derive a tight generalization error bound that is essentially the same as that of Suzuki (2018). This result shows that, at least within the framework of Suzuki (2018), the scale invariance of the activation functions is not essential for obtaining the fast convergence rate. It also shows that the theoretical framework proposed by Suzuki (2018) can be applied broadly to the analysis of deep learning with general activation functions.
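Here, scale invariance refers to positive homogeneity: an activation σ satisfies σ(c·x) = c·σ(x) for every c > 0, as ReLU does, while the sigmoid, hyperbolic tangent, and ELU do not. The following minimal NumPy sketch (our own illustration, not code from the paper) checks this property numerically for ReLU and ELU:

```python
import numpy as np

def relu(x):
    # ReLU: max(x, 0); positively homogeneous, so relu(c*x) == c*relu(x) for c > 0.
    return np.maximum(x, 0.0)

def elu(x, alpha=1.0):
    # ELU: x for x > 0, alpha*(exp(x) - 1) otherwise; not positively homogeneous.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.linspace(-3.0, 3.0, 7)
c = 2.5  # any positive scaling factor

print(np.allclose(relu(c * x), c * relu(x)))  # True: ReLU is scale invariant
print(np.allclose(elu(c * x), c * elu(x)))    # False: ELU is not
```

The check is purely illustrative: it demonstrates the property that Suzuki (2018) assumes and that the present paper dispenses with, and is not part of either paper's analysis.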


