What is Contrastive Divergence?
Contrastive divergence approximates the gradient of the log-likelihood by running a short Markov chain (typically Gibbs sampling) on the model, starting from a training example rather than from a random state. This seemingly simple trick yields a usable learning signal without ever computing the log partition function, which would otherwise require a full, expensive Monte Carlo estimate.
Where and Why is Contrastive Divergence used?
In any situation where you can’t evaluate the model’s probabilities directly (because the normalizing constant is intractable), some form of approximation is needed to estimate the learning gradient and decide which direction to move the parameters.
This is most often seen in Restricted Boltzmann Machines (RBMs), where contrastive divergence is cheap to compute stochastically. The technique is central to training RBMs: it teaches the “hidden” units when to activate, adjusts the weights and biases based on the input data, and repeats this process until the model has learned a compact, lower-dimensional representation of its inputs.
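To make this concrete, here is a minimal sketch of one-step contrastive divergence (CD-1) for a binary RBM in NumPy. The chain starts at a data vector, takes a single Gibbs step to get a “reconstruction,” and updates the weights by the difference between data-driven and reconstruction-driven statistics. The network sizes, learning rate, and toy patterns are illustrative assumptions, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_v, b_h, lr=0.1):
    """One CD-1 update; the Markov chain is started at the data vector v0."""
    # Positive phase: hidden activations driven by the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # stochastic activation
    # Negative phase: one Gibbs step back to a visible "reconstruction".
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Gradient estimate: data statistics minus reconstruction statistics.
    W = W + lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
    b_v = b_v + lr * (v0 - p_v1)
    b_h = b_h + lr * (p_h0 - p_h1)
    return W, b_v, b_h

# Hypothetical toy setup: learn to reconstruct two binary patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)
n_v, n_h = 6, 2
W = 0.01 * rng.standard_normal((n_v, n_h))
b_v, b_h = np.zeros(n_v), np.zeros(n_h)

for epoch in range(1000):
    for v in data:
        W, b_v, b_h = cd1_step(v, W, b_v, b_h)
```

After training, pushing a pattern through the hidden layer and back should reproduce it closely, even though the partition function was never evaluated.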