Learning specialized activation functions with the Piecewise Linear Unit

04/08/2021
by Yucong Zhou, et al.

The choice of activation function is crucial for modern deep neural networks. Popular hand-designed activation functions such as the Rectified Linear Unit (ReLU) and its variants show promising performance across various tasks and models. Swish, an automatically discovered activation function, has been proposed and outperforms ReLU on many challenging datasets. However, the search that produced it has two main drawbacks. First, the tree-based search space is highly discrete and restricted, which makes searching difficult. Second, the sample-based search method is inefficient, making it infeasible to find specialized activation functions for each dataset or neural architecture. To tackle these drawbacks, we propose a new activation function called the Piecewise Linear Unit (PWLU), which incorporates a carefully designed formulation and learning method. It can learn specialized activation functions and achieves state-of-the-art performance on large-scale datasets such as ImageNet and COCO. For example, on the ImageNet classification dataset, PWLU improves top-1 accuracy over Swish by 0.9% on ResNet-18, with consistent gains on ResNet-50/MobileNet-V2/MobileNet-V3/EfficientNet-B0. PWLU is also easy to implement and efficient at inference, so it can be widely applied in real-world applications.
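Since the abstract describes PWLU only at a high level, the following is a minimal PyTorch sketch of the underlying idea: a learnable piecewise linear function defined by equal-width intervals over a bounded input region, with linear extrapolation outside it. The region bounds, interval count, channel-shared parameterization, and ReLU initialization here are illustrative assumptions, not the paper's exact formulation.

import torch
import torch.nn as nn

class PWLU(nn.Module):
    # Minimal sketch of a piecewise linear activation in the spirit of PWLU.
    # Assumptions not taken from the paper: a fixed input region [left, right],
    # a single channel-shared function, and ReLU initialization.
    def __init__(self, num_intervals=16, left=-8.0, right=8.0):
        super().__init__()
        self.left = left
        self.num_intervals = num_intervals
        self.width = (right - left) / num_intervals
        # Learnable function values at the N+1 breakpoints, initialized to ReLU.
        xs = torch.linspace(left, right, num_intervals + 1)
        self.points = nn.Parameter(torch.relu(xs))

    def forward(self, x):
        # Continuous interval coordinate; the floor gives the interval index.
        t = (x - self.left) / self.width
        k = t.floor().clamp(0, self.num_intervals - 1).long()
        y0, y1 = self.points[k], self.points[k + 1]
        # Linear interpolation inside the region; inputs outside the region
        # extrapolate linearly with the slope of the boundary interval.
        return y0 + (t - k.float()) * (y1 - y0)

act = PWLU(num_intervals=8)
y = act(torch.randn(4, 16))  # output has the same shape as the input

Because each output depends on only two breakpoint values, inference reduces to one index computation and one interpolation per element, which is consistent with the abstract's claim of inference efficiency.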


