Effectiveness of MPC-friendly Softmax Replacement

11/23/2020
by Marcel Keller, et al.

Softmax is widely used in deep learning to map a vector of real values to a probability distribution. Because it is based on exp/log functions, which are relatively expensive in multi-party computation, Mohassel and Zhang (2017) proposed a simpler ReLU-based replacement for use in secure computation. However, we could not reproduce the accuracy they reported for training on MNIST with three fully connected layers. Later works (e.g., Wagh et al., 2019 and 2021) used the softmax replacement not for computing the output probability distribution but for approximating the gradient in back-propagation. In this work, we analyze both uses of the replacement and compare them to softmax, in terms of both accuracy and cost in multi-party computation. We found that the replacement provides a significant speed-up only for a one-layer network, while it always reduces accuracy, sometimes significantly. We therefore conclude that its usefulness is limited and that the original softmax function should be used instead.
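To fix ideas, here is a minimal NumPy sketch contrasting standard softmax with a ReLU-based replacement in the spirit of Mohassel and Zhang (2017), which normalizes rectified values instead of exponentials; the uniform fallback when all inputs are non-positive is our assumption for illustration, not necessarily the behavior of their protocol.

```python
import numpy as np

def softmax(u):
    """Standard softmax: normalized exponentials."""
    e = np.exp(u - np.max(u))  # subtract the max for numerical stability
    return e / e.sum()

def relu_softmax_replacement(u):
    """ReLU-based replacement (after Mohassel and Zhang, 2017):
    normalize ReLU(u) instead of exp(u), avoiding exp/log,
    which are relatively expensive in multi-party computation."""
    r = np.maximum(u, 0.0)
    s = r.sum()
    if s == 0.0:
        # Assumed fallback: uniform distribution when all inputs are non-positive.
        return np.full_like(u, 1.0 / len(u))
    return r / s

u = np.array([2.0, 1.0, -1.0])
print(softmax(u))                   # approx. [0.705, 0.259, 0.035]
print(relu_softmax_replacement(u))  # approx. [0.667, 0.333, 0.0]
```

As the example shows, the replacement assigns probability zero to any non-positive score, which is one way its output distribution differs from softmax.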

