Perceptual Loss Functions

What is a Perceptual Loss Function?

Perceptual loss functions compare two images in terms of high-level differences, such as content and style discrepancies, rather than individual pixel values. They address a weakness of per-pixel losses: two images that look nearly identical to a human, such as the same photo shifted by one pixel, can still produce a large per-pixel error. Both kinds of loss are used for training feed-forward neural networks for image transformation tasks, but the perceptual loss is the more commonly used component, as it often produces more accurate results for style transfer.
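A toy NumPy sketch (an illustration, not code from any paper) makes the one-pixel-shift problem concrete: a random image and a copy shifted by one pixel carry the same visual content, yet a per-pixel loss reports a large error.

```python
import numpy as np

# Toy illustration: a random 64x64 "image" and a copy shifted right by
# one pixel have identical content, yet a per-pixel loss between them
# is large.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
shifted = np.roll(image, shift=1, axis=1)  # same content, 1 px shift

# Per-pixel mean squared error treats the shift as a big discrepancy.
per_pixel_mse = np.mean((image - shifted) ** 2)

# An exact copy, by contrast, scores exactly zero.
identical_mse = np.mean((image - image.copy()) ** 2)
print(per_pixel_mse, identical_mse)
```

The per-pixel score is large for the shifted pair even though a human would call the two images the same picture, which is exactly the mismatch a perceptual loss is designed to avoid.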

How does a Perceptual Loss Function work?

In short, a perceptual loss function works by comparing the high-level feature representations of two images, as extracted by a pretrained network, typically as the mean squared error between the feature maps. This is in contrast to a per-pixel loss function, which directly sums the squared or absolute differences between corresponding pixel values. Johnson et al. (2016) argue that perceptual loss functions are not only more accurate in generating high-quality images, but, once the feed-forward network is trained, also run as much as three orders of magnitude faster than optimization-based approaches. The neural network model is trained on images, with the perceptual loss computed from high-level features extracted by networks that have already been trained for image classification.
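The feature reconstruction idea can be sketched in a few lines of NumPy. Here 4x4 average pooling stands in for the feature extractor; this is purely an assumption for illustration, as Johnson et al. use activations from a pretrained VGG-16 network, but it shows how a loss computed in feature space tolerates small shifts that a pixel loss punishes.

```python
import numpy as np

def phi(image):
    """Stand-in feature extractor: 4x4 average pooling of a 64x64 image.
    (A placeholder for the activations of a pretrained network.)"""
    return image.reshape(16, 4, 16, 4).mean(axis=(1, 3))

def feature_loss(y_hat, y):
    """Squared Euclidean distance between feature maps, normalized by size."""
    f1, f2 = phi(y_hat), phi(y)
    return np.sum((f1 - f2) ** 2) / f1.size

rng = np.random.default_rng(0)
image = rng.random((64, 64))
shifted = np.roll(image, shift=1, axis=1)  # perceptually the "same" image

pixel_loss = np.mean((image - shifted) ** 2)
feat_loss = feature_loss(image, shifted)

# The feature-space loss is far smaller than the pixel-space loss,
# reflecting its tolerance of small spatial shifts.
print(pixel_loss, feat_loss)
```

Because each pooled feature averages a 4x4 neighborhood, shifting the image by one pixel barely changes the features, so the feature-space loss stays small while the pixel-space loss stays large.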


The image above represents the neural network that is trained to transform input images into output images. A pre-trained loss network, originally trained for image classification, informs the loss functions: it defines the perceptual loss functions used to measure the perceptual differences in content and style between the images.
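One concrete ingredient of the style side of this setup, as described by Johnson et al. (2016), is the Gram matrix of a feature map, whose entries capture correlations between feature channels. The sketch below uses random arrays as stand-ins for the loss network's activations (an assumption for illustration; in practice they would come from a pretrained classification network such as VGG-16).

```python
import numpy as np

def gram_matrix(features):
    """features: (C, H, W) activation map -> (C, C) normalized Gram matrix."""
    c, h, w = features.shape
    psi = features.reshape(c, h * w)   # flatten the spatial dimensions
    return psi @ psi.T / (c * h * w)   # channel-by-channel correlations

rng = np.random.default_rng(0)
phi_style = rng.random((8, 16, 16))    # stand-in for style-image features
phi_output = rng.random((8, 16, 16))   # stand-in for generated-image features

# Style reconstruction loss: squared Frobenius distance between the
# Gram matrices of the generated image and the style target.
style_loss = np.sum((gram_matrix(phi_output) - gram_matrix(phi_style)) ** 2)
print(style_loss)
```

Because the Gram matrix discards spatial layout and keeps only which features co-occur, matching Gram matrices pushes the output toward the style target's textures without copying its content.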



A commonly seen example of image transformation using neural networks is the popular app Prisma. The app takes an input image and transforms it in such a way that the content keeps its meaning while the style changes. Training the neural network to apply a "target style" to "target content" relies on perceptual loss functions.