Softmax Layer

What is a Softmax Layer?

A softmax function is a type of squashing function. Squashing functions limit the output to the range 0 to 1, which allows the output to be interpreted directly as a probability. The softmax function is the multi-class counterpart of the sigmoid: it is used to determine the probabilities of multiple classes at once. Because the outputs of a softmax function can be interpreted as probabilities (i.e., they must sum to 1), a softmax layer is typically the final layer in a neural network. It is important to note that a softmax layer must have the same number of nodes as the output layer.
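Concretely, for a vector of raw scores z, softmax(z)_i = exp(z_i) / Σ_j exp(z_j). The following is a minimal sketch in Python with NumPy; the function name and the example scores are our own illustration, not taken from any particular library:

    import numpy as np

    def softmax(z):
        # Subtract the max score for numerical stability; this does not
        # change the result, because softmax is invariant to adding a
        # constant to every score.
        exp_z = np.exp(z - np.max(z))
        return exp_z / exp_z.sum()

    scores = np.array([2.0, 1.0, 0.1])  # raw network outputs ("logits")
    probs = softmax(scores)
    print(probs)        # [0.659 0.242 0.099]
    print(probs.sum())  # 1.0 -- the outputs form a probability distribution

Subtracting the maximum score before exponentiating is a standard trick to avoid overflow for large scores; the probabilities come out the same either way.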

Softmax Layers in Machine Learning

Consider a neural network that is attempting to determine whether there is a dog in an image. On its own, it can produce a probability that a dog is, or is not, in the image, but only as a single yes-or-no score. A softmax layer allows the neural network to handle the multi-class case: it can output the probability that the image contains a dog alongside the probabilities that it contains each of the other objects the network recognizes.
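Continuing the dog example, the sketch below shows how a softmax layer turns the network's raw scores into a single probability distribution over all classes; the class names and score values are purely illustrative assumptions:

    import numpy as np

    def softmax(z):
        exp_z = np.exp(z - np.max(z))
        return exp_z / exp_z.sum()

    classes = ["dog", "cat", "bird", "car"]
    logits = np.array([3.2, 1.3, 0.2, 0.8])  # hypothetical raw network scores

    for name, p in zip(classes, softmax(logits)):
        print(f"P({name} in image) = {p:.3f}")
    # The four probabilities sum to 1, so the layer distributes belief
    # across all classes rather than scoring each class in isolation.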


Softmax layers are great at determining multi-class probabilities, but there are limits. Softmax becomes computationally expensive as the number of classes grows. In those situations, candidate sampling can be an effective workaround: the softmax layer limits the scope of its calculations, computing probabilities for the positive labels plus only a random sample of the remaining classes rather than for every class. For example, when determining which fruits appear in an image of a fruit bowl, probabilities can be computed for the apples that are actually present plus a sample of the other fruit classes, instead of the full set. Additionally, a softmax layer assumes that each example belongs to exactly one class; in situations where an object belongs to multiple classes at once, a softmax layer will not work. In that case, the alternative is to use multiple logistic regressions instead, one per class.
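Here is a minimal sketch of that multi-label alternative, again with hypothetical class names and scores: each class gets its own independent sigmoid (logistic) output, so the probabilities no longer need to sum to 1 and several classes can be present at once:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    classes = ["apple", "banana", "orange"]
    logits = np.array([2.1, -0.5, 1.7])  # hypothetical raw network scores

    # One logistic regression per class: each probability is computed
    # independently, so an image can contain both apples and oranges.
    for name, p in zip(classes, sigmoid(logits)):
        print(f"P({name} present) = {p:.3f}")

Because each output is an independent logistic regression, P(apple) and P(orange) can both be high for the same image, which a softmax layer would not allow.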