Linear Oscillation: The Aesthetics of Confusion for Vision Transformer

08/25/2023
by   Juyoung Yun, et al.

Activation functions are the linchpins of deep learning, profoundly influencing both the representational capacity and training dynamics of neural networks. They shape not only the nature of learned representations but also the convergence rate and generalization potential of the network. Appreciating this critical role, we present the Linear Oscillation (LoC) activation function, defined as f(x) = x · sin(αx + β). Distinct from conventional activation functions, which primarily introduce non-linearity, LoC seamlessly blends linear trajectories with oscillatory deviations. The name "Linear Oscillation" is a nod to its unique attribute of infusing linear activations with harmonious oscillations, capturing the essence of the "Importance of Confusion". This concept of "controlled confusion" within network activations is posited to foster more robust learning, particularly in contexts that require discerning subtle patterns. Our empirical studies show that, when integrated into diverse neural architectures, the LoC activation function consistently outperforms established counterparts such as ReLU and Sigmoid. The strong performance of a Vision Transformer model equipped with LoC further supports its efficacy. This study highlights the benefits of LoC over other prominent activation functions and champions the notion that deliberately introducing moderate complexity, or "confusion", during training can spur deeper and more nuanced learning, underscoring the pivotal role of judiciously chosen activation functions in neural network training.
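The abstract only specifies the functional form f(x) = x · sin(αx + β), so the sketch below is a minimal PyTorch rendering of that formula as a drop-in activation module. Whether α and β are fixed hyperparameters or learnable per-layer parameters is not stated in the abstract; treating them as optionally learnable scalars here is an assumption, and the class name `LinearOscillation` is ours.

```python
# Minimal sketch of the Linear Oscillation (LoC) activation, f(x) = x * sin(alpha*x + beta).
# Assumption: alpha and beta are scalar parameters, optionally learnable; the paper's abstract
# gives only the functional form, not how these parameters are chosen or trained.
import torch
import torch.nn as nn


class LinearOscillation(nn.Module):
    """LoC activation: a linear trajectory modulated by a sinusoidal oscillation."""

    def __init__(self, alpha: float = 1.0, beta: float = 0.0, learnable: bool = True):
        super().__init__()
        alpha_t = torch.tensor(float(alpha))
        beta_t = torch.tensor(float(beta))
        if learnable:
            self.alpha = nn.Parameter(alpha_t)
            self.beta = nn.Parameter(beta_t)
        else:
            self.register_buffer("alpha", alpha_t)
            self.register_buffer("beta", beta_t)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # f(x) = x * sin(alpha * x + beta), applied element-wise
        return x * torch.sin(self.alpha * x + self.beta)


if __name__ == "__main__":
    act = LinearOscillation()
    x = torch.linspace(-3.0, 3.0, steps=7)
    print(act(x))  # element-wise x * sin(x) with the default alpha=1, beta=0
```

In a Vision Transformer, a module like this would simply replace the existing activation (e.g., GELU) inside the MLP blocks; that substitution pattern is an illustration, not a description of the authors' exact setup.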

