Noise-contrastive estimation (NCE) is a sampling loss typically used to train classifiers with a large output vocabulary, where calculating the softmax over all possible classes is prohibitively expensive. Using NCE, we can reduce the problem to a binary classification problem: the classifier is trained to discriminate between samples drawn from the "real" data distribution and samples drawn from an artificially generated noise distribution.
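Below is a minimal sketch of this idea in PyTorch, not a definitive implementation. The function name `nce_loss` and its arguments (`hidden`, `output_emb`, `output_bias`, `noise_dist`, `k`) are illustrative assumptions. For each training example with true class `target`, it draws `k` noise classes from a noise distribution `q` and applies a binary cross-entropy loss that pushes the model to label the true class as "real" and the noise samples as "noise", using the standard NCE logit correction `s(w, c) - log(k * q(w))`.

```python
import torch
import torch.nn.functional as F

def nce_loss(hidden, target, output_emb, output_bias, noise_dist, k=10):
    """Sketch of an NCE-style loss (names are hypothetical, not from the text).

    hidden:      (B, d) context/hidden vectors
    target:      (B,)   indices of the true classes
    output_emb:  (V, d) output embedding matrix
    output_bias: (V,)   per-class bias
    noise_dist:  (V,)   noise distribution q over the vocabulary
    k:           number of noise samples per example
    """
    B = hidden.size(0)

    # Draw k noise classes per example from q.
    noise = torch.multinomial(noise_dist, B * k, replacement=True).view(B, k)

    # Unnormalized model scores s(w, c) = h . e_w + b_w for true and noise classes.
    true_score = (hidden * output_emb[target]).sum(-1) + output_bias[target]    # (B,)
    noise_score = torch.bmm(output_emb[noise], hidden.unsqueeze(-1)).squeeze(-1) \
                  + output_bias[noise]                                          # (B, k)

    # Binary-classification logits with the NCE correction term -log(k * q(w)).
    true_logit = true_score - torch.log(k * noise_dist[target] + 1e-10)
    noise_logit = noise_score - torch.log(k * noise_dist[noise] + 1e-10)

    # True samples should be classified as "real" (1), noise samples as "noise" (0).
    true_loss = F.binary_cross_entropy_with_logits(
        true_logit, torch.ones_like(true_logit), reduction="sum")
    noise_loss = F.binary_cross_entropy_with_logits(
        noise_logit, torch.zeros_like(noise_logit), reduction="sum")
    return (true_loss + noise_loss) / B
```

Note that the loss only ever touches the true class and the `k` sampled noise classes, so its cost is independent of the vocabulary size `V`, which is the whole point of avoiding the full softmax.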
