An Over-parameterized Exponential Regression

03/29/2023
by Yeqi Gao, et al.

Over the past few years, a significant body of research has studied the ReLU activation function, with the aim of proving neural-network convergence through over-parametrization. However, recent developments in Large Language Models (LLMs) have sparked interest in exponential activation functions, which appear in the attention mechanism. Mathematically, we define a neural function F: ℝ^{d×m} × ℝ^d → ℝ with an exponential activation. Given a set of labeled data points {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} ⊂ ℝ^d × ℝ, where n denotes the number of data points, F(W(t), x) is expressed as F(W(t), x) := ∑_{r=1}^m a_r exp(⟨w_r(t), x⟩), where m is the number of neurons and the w_r(t) are the hidden weights at time t. Following the standard setting in the literature, the outer weights a_r are fixed and never change during training. We initialize the weights W(0) ∈ ℝ^{d×m} with random Gaussians, so that w_r(0) ∼ 𝒩(0, I_d), and draw each a_r from the random sign distribution for r ∈ [m]. Using gradient descent, we can find weights W(T) such that ‖F(W(T), X) − y‖_2 ≤ ϵ holds with probability 1 − δ, where ϵ ∈ (0, 0.1) and m = Ω(n^{2+o(1)} log(n/δ)). To optimize the over-parameterization bound on m, we employ several tight analysis techniques from previous studies [Song and Yang, arXiv 2019; Munteanu, Omlor, Song and Woodruff, ICML 2022].


