r-softmax: Generalized Softmax with Controllable Sparsity Rate

04/11/2023
by Klaudia Bałazy, et al.

Nowadays, artificial neural network models achieve remarkable results in many disciplines. Functions mapping the representation provided by the model to a probability distribution are an inseparable part of deep learning solutions. Although softmax is a commonly accepted probability mapping function in the machine learning community, it cannot return sparse outputs and always spreads positive probability over all positions. In this paper, we propose r-softmax, a modification of softmax that outputs a sparse probability distribution with a controllable sparsity rate. In contrast to existing sparse probability mapping functions, we provide an intuitive mechanism for controlling the output sparsity level. We show on several multi-label datasets that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax. We also apply r-softmax to the self-attention module of a pre-trained transformer language model and demonstrate that it leads to improved performance when fine-tuning the model on different natural language processing tasks.
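The abstract does not state the r-softmax formula, so the short NumPy sketch below only illustrates the contrast it draws: standard softmax assigns strictly positive probability to every position, while a sparsity-rate parameter r can force a chosen fraction of positions to exactly zero. The sparse_map helper is a hypothetical stand-in for illustration, not the paper's construction.

```python
import numpy as np

def softmax(x):
    # Standard softmax: exponentials are strictly positive, so every
    # position always receives some probability mass (no exact zeros).
    z = np.exp(x - x.max())
    return z / z.sum()

def sparse_map(x, r):
    # Hypothetical stand-in (NOT the paper's r-softmax formula, which the
    # abstract does not give): zero out the smallest-probability positions
    # so that a fraction r of the outputs are exactly zero, then renormalize.
    p = softmax(x)
    k = int(np.floor(r * len(x)))      # number of positions forced to zero
    if k > 0:
        cutoff = np.sort(p)[k - 1]
        p = np.where(p <= cutoff, 0.0, p)
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
print(softmax(logits))           # every entry is strictly positive
print(sparse_map(logits, 0.4))   # two of the five entries are exactly zero
```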


Related research

10/29/2018 | On Controllable Sparse Alternatives to Softmax
Converting an n-dimensional vector to a probability distribution over n ...

11/23/2020 | Exploring Alternatives to Softmax Function
Softmax function is widely used in artificial neural networks for multic...

02/05/2016 | From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification
We propose sparsemax, a new activation function similar to the tradition...

11/23/2020 | Effectiveness of MPC-friendly Softmax Replacement
Softmax is widely used in deep learning to map some representation to a ...

11/12/2021 | Speeding Up Entmax
Softmax is the de facto standard in modern neural networks for language ...

10/06/2022 | To Softmax, or not to Softmax: that is the question when applying Active Learning for Transformer Models
Despite achieving state-of-the-art results in nearly all Natural Languag...

03/15/2020 | Analysis of Softmax Approximation for Deep Classifiers under Input-Dependent Label Noise
Modelling uncertainty arising from input-dependent label noise is an inc...
