Sampled Softmax with Random Fourier Features

07/24/2019
by Ankit Singh Rawat, et al.

The computational cost of training with the softmax cross-entropy loss grows linearly with the number of classes. In settings with a large number of classes, a common way to speed up training is to sample a subset of classes and use an estimate of the gradient based on those classes, known as the sampled softmax method. However, sampled softmax provides a biased estimate of the gradient unless the samples are drawn from the exact softmax distribution, which is itself expensive to compute. Therefore, a widely employed practical approach (without theoretical justification) is to sample from a simpler distribution in the hope of approximating the exact softmax distribution. In this paper, we develop the first theoretical understanding of the role that different sampling distributions play in determining the quality of sampled softmax. Motivated by our analysis and by prior work on kernel-based sampling, we propose the Random Fourier Softmax (RF-softmax) method, which uses powerful Random Fourier features to enable more efficient and accurate sampling from the (approximate) softmax distribution. We show that RF-softmax yields low estimation bias with respect to both the full softmax distribution and the full softmax gradient. Furthermore, the cost of RF-softmax scales only logarithmically with the number of classes.
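To make the mechanism concrete, below is a minimal numpy sketch of the two ingredients the abstract describes: a Random Fourier feature map approximating the Gaussian kernel, used to form a proposal distribution over classes, and a sampled softmax loss with the standard log-correction on the sampled logits. All sizes and names (`n`, `d`, `D`, `sigma`, `m`) are illustrative assumptions. For simplicity the proposal is normalized in O(n) with a positivity floor, whereas the paper obtains O(log n) sampling via a tree over the summed class features; this is a sketch of the general technique, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): n classes, d-dim embeddings, D random features.
n, d, D, sigma = 1000, 16, 64, 1.0

C = rng.normal(size=(n, d)) / np.sqrt(d)   # class embeddings
x = rng.normal(size=d)                     # one input embedding
y = 3                                      # true class index

# Random Fourier features for the Gaussian kernel
# k(u, v) = exp(-||u - v||^2 / (2 sigma^2)) ~= phi(u) . phi(v),
# with w ~ N(0, I / sigma^2) and b ~ Uniform(0, 2*pi).
W = rng.normal(scale=1.0 / sigma, size=(D, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(U):
    """RFF map; U has shape (..., d), output has shape (..., D)."""
    return np.sqrt(2.0 / D) * np.cos(U @ W.T + b)

# Proposal distribution q over all classes from the RFF kernel scores.
# NOTE: the O(n) normalization and the clipping floor are simplifications
# for illustration; the paper samples from q in O(log n) with a tree.
scores = phi(C) @ phi(x)
q = np.clip(scores, 1e-6, None)
q /= q.sum()

# Sampled softmax: draw m negatives from q (with replacement) and correct
# their logits by -log(m * q_i), so that summing exp of the corrected
# logits gives a near-unbiased estimate of the full partition function.
m = 20
neg = rng.choice(n, size=m, p=q)
neg = neg[neg != y]
logits = np.concatenate(([C[y] @ x], C[neg] @ x - np.log(m * q[neg])))
loss_sampled = -logits[0] + np.log(np.exp(logits).sum())

full = C @ x
loss_full = -full[y] + np.log(np.exp(full).sum())
print(f"sampled-softmax loss ~= {loss_sampled:.3f}, full softmax loss = {loss_full:.3f}")
```

The correction term is the point of contact with the paper's analysis: exp(o_i - log(m q_i)) averaged over samples is an unbiased estimate of the sum of exp(o_j) over classes only when q is close to the softmax distribution, which is why the choice of sampling distribution governs the bias that RF-softmax is designed to reduce.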


