Contrastive Divergence

What is Contrastive Divergence?

Contrastive divergence is an approximate training technique for estimating the gradient: the slope that describes how a probabilistic model's objective changes as its parameters, such as a network's weights, change. Because most probabilistic learning algorithms try to maximize the log-likelihood of the training data, this gradient gives the direction in which the parameters should move during learning.
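As a concrete illustration, assume a binary Restricted Boltzmann Machine with visible units v, hidden units h, weights W and biases b and c (this model and notation are chosen for the example, not fixed by the text above). The log-likelihood gradient for one training vector splits into two expectations. The second comes from the log partition function Z and needs samples from the model itself; that is the term contrastive divergence approximates.

```latex
\begin{align*}
E(\mathbf{v}, \mathbf{h}) &= -\mathbf{b}^\top \mathbf{v} - \mathbf{c}^\top \mathbf{h} - \mathbf{v}^\top W \mathbf{h} \\
Z &= \sum_{\mathbf{v}, \mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})}, \qquad
p(\mathbf{v}) = \frac{1}{Z} \sum_{\mathbf{h}} e^{-E(\mathbf{v}, \mathbf{h})} \\
\frac{\partial \log p(\mathbf{v})}{\partial W_{ij}}
  &= \mathbb{E}_{p(\mathbf{h} \mid \mathbf{v})}\!\left[v_i h_j\right]
   - \mathbb{E}_{p(\mathbf{v}, \mathbf{h})}\!\left[v_i h_j\right] \\
\Delta W_{ij}^{\text{CD-}k} &\propto v_i^{(0)}\, p(h_j = 1 \mid \mathbf{v}^{(0)})
   - v_i^{(k)}\, p(h_j = 1 \mid \mathbf{v}^{(k)})
\end{align*}
```

Here v^{(k)} is the visible state reached after k steps of Gibbs sampling started at the training vector v^{(0)} = v.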

The method runs a short Markov chain, typically block Gibbs sampling, that starts at the training example being processed rather than at a random state. Running only k steps of this chain (often just one, written CD-1) yields a cheap, though biased, estimate of the gradient of the log partition function, avoiding the need to run a full Markov chain Monte Carlo sampler until it reaches equilibrium.
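A minimal sketch of the CD-k weight-gradient estimate, assuming a binary RBM with weight matrix `W` (visible × hidden), visible biases `b` and hidden biases `c`. These names, the pure-Python list representation and the helper functions are illustrative assumptions, not part of the original text. The chain starts at the data vector `v0`, alternates sampling the hidden and visible layers for `k` steps, and subtracts the end-of-chain statistics from the data statistics.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(v, W, c):
    # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j])
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    # p(v_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] * h_j)
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    # Draw a binary state for each unit from its Bernoulli probability.
    return [1 if rng.random() < p else 0 for p in probs]


def cd_k_weight_gradient(v0, W, b, c, k=1, rng=None):
    """Approximate d log p(v0) / dW using k steps of block Gibbs sampling."""
    if rng is None:
        rng = random.Random(0)
    # Positive phase: hidden probabilities driven by the training example.
    ph0 = hidden_probs(v0, W, c)
    # Negative phase: start the chain at the data, not at a random state.
    v = v0
    for _ in range(k):
        h = sample(hidden_probs(v, W, c), rng)
        v = sample(visible_probs(h, W, b), rng)
    phk = hidden_probs(v, W, c)
    # Data statistics minus end-of-chain statistics.
    return [[v0[i] * ph0[j] - v[i] * phk[j] for j in range(len(c))]
            for i in range(len(v0))]
```

A larger `k` lets the chain move closer to the model's own distribution, reducing bias at extra cost; `k=1` is a common default. Using hidden probabilities instead of sampled hidden states in the statistics is a frequent variance-reduction choice rather than a requirement.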

Where and Why is Contrastive Divergence used?

Whenever a model's probabilities cannot be evaluated directly, for example because its normalizing constant (the partition function) is intractable, some form of approximate inference is needed to estimate the learning gradient and decide which direction to move the parameters.
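To see why the partition function is the obstacle, the sketch below computes log Z exactly for a tiny binary RBM by brute force, assuming the energy function E(v, h) = -b·v - c·h - vᵀWh (the function name and list-based parameters are illustrative). The sum has 2^(n_visible + n_hidden) terms, so with only a few hundred units it is already far beyond any computer, which is what approximations such as contrastive divergence sidestep.

```python
import itertools
import math


def log_partition_brute_force(W, b, c):
    """Exact log Z for a tiny binary RBM by enumerating every (v, h) pair.

    The loops visit 2 ** (len(b) + len(c)) configurations, so this is only
    feasible for toy sizes.
    """
    n_v, n_h = len(b), len(c)
    neg_energies = []
    for v in itertools.product((0, 1), repeat=n_v):
        for h in itertools.product((0, 1), repeat=n_h):
            # -E(v, h) = b.v + c.h + v^T W h
            neg_e = (sum(b[i] * v[i] for i in range(n_v))
                     + sum(c[j] * h[j] for j in range(n_h))
                     + sum(v[i] * W[i][j] * h[j]
                           for i in range(n_v) for j in range(n_h)))
            neg_energies.append(neg_e)
    # Log-sum-exp, shifted by the maximum for numerical stability.
    m = max(neg_energies)
    return m + math.log(sum(math.exp(e - m) for e in neg_energies))
```

For example, with all weights and biases set to zero, every one of the 2^(n_visible + n_hidden) configurations contributes 1 to Z, so 3 visible and 2 hidden units give Z = 32.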

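Contrastive divergence is most often used to train Restricted Boltzmann Machines (RBMs), whose bipartite structure makes the required sampling cheap: the hidden units are conditionally independent given the visible units, and vice versa, so each layer can be sampled stochastically in a single step. The technique teaches an RBM how to activate its “hidden” nodes appropriately, then adjust its weights and biases based on how well it reconstructs the input, and repeat this process. When there are fewer hidden units than visible ones, the hidden activations serve as a reduced-dimensional representation of the input.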
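The training loop described above can be sketched as CD-1 for a small binary RBM in pure Python. The learning rate, epoch count, initialisation and function names (`train_rbm_cd1`, `reconstruct`) are illustrative assumptions; a practical implementation would use mini-batches and a numerical array library.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_probs(v, W, c):
    # p(h_j = 1 | v) for every hidden unit.
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    # p(v_i = 1 | h) for every visible unit.
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def train_rbm_cd1(data, n_hidden, epochs=500, lr=0.1, seed=0):
    """Fit a binary RBM to a list of 0/1 vectors with CD-1 (illustrative)."""
    rng = random.Random(seed)
    n_visible = len(data[0])
    # Small random weights and zero biases: a common starting point.
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
    b = [0.0] * n_visible  # visible biases
    c = [0.0] * n_hidden   # hidden biases
    for _ in range(epochs):
        for v0 in data:
            # 1. Activate the hidden units from the input.
            ph0 = hidden_probs(v0, W, c)
            h0 = [1 if rng.random() < p else 0 for p in ph0]
            # 2. Reconstruct the input from the sampled hidden states.
            v1 = [1 if rng.random() < p else 0 for p in visible_probs(h0, W, b)]
            # 3. Re-activate the hidden units from the reconstruction.
            ph1 = hidden_probs(v1, W, c)
            # 4. Move toward data statistics, away from reconstruction statistics.
            for i in range(n_visible):
                for j in range(n_hidden):
                    W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
                b[i] += lr * (v0[i] - v1[i])
            for j in range(n_hidden):
                c[j] += lr * (ph0[j] - ph1[j])
    return W, b, c


def reconstruct(v, W, b, c):
    # Mean-field pass v -> p(h | v) -> p(v | h), using probabilities throughout.
    return visible_probs(hidden_probs(v, W, c), W, b)
```

For example, `W, b, c = train_rbm_cd1([[1, 1, 0, 0], [0, 0, 1, 1]], n_hidden=2)` fits two toy patterns with a two-unit hidden layer, smaller than the four inputs; after training, `reconstruct([1, 1, 0, 0], W, b, c)` should typically return probabilities closer to that pattern than an untrained model would.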