On the Stability and Generalization of Learning with Kernel Activation Functions

03/28/2019
by   Michele Cirillo, et al.

In this brief we investigate the generalization properties of a recently proposed class of non-parametric activation functions, the kernel activation functions (KAFs). KAFs introduce additional parameters into the learning process in order to adapt nonlinearities individually on a per-neuron basis, exploiting a cheap kernel expansion of every activation value. While this increase in flexibility has been shown to provide significant improvements in practice, a theoretical analysis of their generalization capability has not yet been provided in the literature. Here, we leverage recent results on the stability properties of non-convex models trained via stochastic gradient descent (SGD). By indirectly proving two key smoothness properties of the models under consideration, we show that neural networks endowed with KAFs generalize well when trained with SGD for a finite number of steps. Interestingly, our analysis also provides a guideline for selecting one of the hyper-parameters of the model, the bandwidth of the scalar Gaussian kernel. A short experimental evaluation validates the proof.
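The KAF mechanism described above can be sketched in a few lines: each neuron's nonlinearity is a kernel expansion of its activation value over a small fixed dictionary, with learnable mixing coefficients and a Gaussian bandwidth hyper-parameter. The following is a minimal, illustrative sketch (function names, the dictionary layout, and the bandwidth rule are assumptions for illustration, not the authors' code):

```python
import numpy as np

def make_dictionary(num_elements=20, boundary=3.0):
    """Fixed dictionary of points sampled uniformly around zero (illustrative choice)."""
    return np.linspace(-boundary, boundary, num_elements)

def gaussian_kernel(s, dictionary, gamma):
    """Scalar Gaussian kernel between activation s and each dictionary point."""
    return np.exp(-gamma * (s[..., None] - dictionary) ** 2)

def kaf(s, dictionary, alpha, gamma):
    """KAF output: f(s) = sum_i alpha_i * exp(-gamma * (s - d_i)^2)."""
    return gaussian_kernel(s, dictionary, gamma) @ alpha

# Example: initialize the mixing coefficients so the KAF roughly mimics tanh
# (a common initialization strategy; the exact scheme here is illustrative).
d = make_dictionary()
gamma = 1.0 / (2.0 * (d[1] - d[0]) ** 2)  # bandwidth tied to the dictionary step

# Ridge-regression fit of alpha to tanh evaluated on the dictionary points
K = gaussian_kernel(d, d, gamma)
alpha = np.linalg.solve(K + 1e-4 * np.eye(len(d)), np.tanh(d))

s = np.array([-1.0, 0.0, 1.0])
print(kaf(s, d, alpha, gamma))  # close to tanh at these points
```

During training, `alpha` would be updated by SGD alongside the network weights, while the dictionary and bandwidth stay fixed; the bandwidth `gamma` is the hyper-parameter for which the paper's analysis provides a selection guideline.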


Related research

- 07/13/2017, Kafnets: kernel-based non-parametric activation functions for neural networks
- 02/06/2019, Widely Linear Kernels for Complex-Valued Kernel Activation Functions
- 02/22/2018, Complex-valued Neural Networks with Non-parametric Activation Functions
- 01/29/2019, Multikernel activation functions: formulation and a case study
- 11/29/2017, Gaussian Process Neurons Learn Stochastic Activation Functions
- 02/14/2019, A Broad Class of Discrete-Time Hypercomplex-Valued Hopfield Neural Networks
- 11/21/2020, Central and Non-central Limit Theorems arising from the Scattering Transform and its Neural Activation Generalization
