Speeding Up Entmax

11/12/2021
by Maxat Tezekbayev, et al.

Softmax is the de facto standard for normalizing logits in modern neural networks for language processing. However, because it produces a dense probability distribution, every token in the vocabulary has a nonzero chance of being selected at each generation step, which leads to a variety of reported problems in text generation. The α-entmax of Peters et al. (2019, arXiv:1905.05702) solves this problem, but it is considerably slower than softmax. In this paper, we propose an alternative to α-entmax that keeps its virtuous characteristics, is as fast as optimized softmax, and achieves on-par or better performance on the machine translation task.
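To illustrate the contrast the abstract draws, here is a minimal NumPy sketch comparing dense softmax with sparsemax, the α=2 special case of α-entmax, using the standard sort-and-threshold formulation. This is only background for the problem setting, not the faster alternative proposed in the paper; the function names and example logits are illustrative.

```python
import numpy as np

def softmax(z):
    """Dense normalization: every entry receives a nonzero probability."""
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """alpha=2 entmax (sparsemax): Euclidean projection of z onto the simplex.
    Entries whose logits fall below the threshold tau get exactly zero probability."""
    z_sorted = np.sort(z)[::-1]                # logits in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum        # sorted entries that stay in the support
    k_z = k[support][-1]                       # support size
    tau = (cumsum[support][-1] - 1) / k_z      # threshold so the output sums to 1
    return np.maximum(z - tau, 0.0)

logits = np.array([3.0, 1.5, 1.4, -2.0])
print(softmax(logits))    # all four tokens keep probability > 0
print(sparsemax(logits))  # low-scoring tokens receive exactly 0
```

With these logits, softmax assigns positive mass to every token, while sparsemax returns [1.0, 0.0, 0.0, 0.0]: the sparsity that motivates entmax-style normalization in text generation.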

Related research

09/20/2020 · F^2-Softmax: Diversifying Neural Text Generation via Frequency Factorized Softmax
Despite recent advances in neural text generation, encoding the rich div...

11/01/2019 · Kernelized Bayesian Softmax for Text Generation
Neural models for text generation require a softmax layer with proper to...

01/01/2021 · A Graph Total Variation Regularized Softmax for Text Generation
The softmax operator is one of the most important functions in machine l...

05/08/2021 · Neural Text Generation with Part-of-Speech Guided Softmax
Neural text generation models are likely to suffer from the low-diversit...

12/10/2018 · Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs
The Softmax function is used in the final layer of nearly all existing s...

11/23/2020 · Effectiveness of MPC-friendly Softmax Replacement
Softmax is widely used in deep learning to map some representation to a ...

04/11/2023 · r-softmax: Generalized Softmax with Controllable Sparsity Rate
Nowadays artificial neural network models achieve remarkable results in ...
