On Controllable Sparse Alternatives to Softmax

10/29/2018
by Anirban Laha, et al.

Converting an n-dimensional vector to a probability distribution over n objects is a commonly used component in many machine learning tasks such as multiclass classification, multilabel classification, and attention mechanisms. Several probability mapping functions have been proposed and employed in the literature for this purpose, including softmax, sum-normalization, spherical softmax, and sparsemax, but there is very little understanding of how they relate to each other. Further, none of the above formulations offers explicit control over the degree of sparsity. To address this, we develop a unified framework that encompasses all these formulations as special cases. This framework ensures simple closed-form solutions and the existence of sub-gradients suitable for learning via backpropagation. Within this framework, we propose two novel sparse formulations, sparsegen-lin and sparsehourglass, that seek to provide control over the degree of desired sparsity. We further develop novel convex loss functions that help induce the behavior of the aforementioned formulations in the multilabel classification setting, showing improved performance. We also demonstrate empirically that the proposed formulations, when used to compute attention weights, achieve better or comparable performance on standard seq2seq tasks like neural machine translation and abstractive summarization.
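To make the contrast between these probability mapping functions concrete, below is a minimal NumPy sketch of softmax (always dense) and sparsemax (Euclidean projection onto the probability simplex, which can produce exact zeros), plus an illustrative controllable variant. The lambda-rescaling wrapper shown here is only an assumption meant to convey the idea of a sparsity knob; the precise sparsegen-lin and sparsehourglass formulations are defined in the paper itself.

# Probability mapping functions: softmax vs. sparsemax, with an illustrative
# sparsity-controlled variant. Only softmax and sparsemax follow standard
# published definitions; sparsegen_lin_like is a hypothetical sketch.
import numpy as np

def softmax(z):
    # Standard softmax: output is always fully dense (all entries > 0).
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sparsemax(z):
    # Sparsemax (Martins & Astudillo, 2016): Euclidean projection of z onto
    # the probability simplex; low-scoring entries get exactly zero mass.
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = z_sorted + (1.0 - cumsum) / k > 0      # support condition
    k_z = k[support][-1]                             # size of the support
    tau = (cumsum[k_z - 1] - 1.0) / k_z              # threshold
    return np.maximum(z - tau, 0.0)

def sparsegen_lin_like(z, lam=0.5):
    # Hypothetical illustration of a controllable formulation: rescaling the
    # logits by 1/(1 - lam) before projection makes the output sparser as
    # lam -> 1 and recovers sparsemax at lam = 0.
    assert lam < 1.0
    return sparsemax(z / (1.0 - lam))

z = np.array([1.5, 0.7, 0.2, -0.4])
print(softmax(z))                  # dense: all four entries positive
print(sparsemax(z))                # [0.9, 0.1, 0.0, 0.0] -- exact zeros
print(sparsegen_lin_like(z, 0.7))  # sparser still as lam grows

Both mappings are differentiable almost everywhere and admit sub-gradients, which is what makes them usable as drop-in replacements for softmax in attention layers trained by backpropagation.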


