## 1 Introduction

The von Mises Fisher (VMF) distribution (also known as the Langevin distribution [Watamori96]) is a probability distribution on the $(d-1)$-dimensional hypersphere in $\mathbb{R}^d$ [Fisher53]. If $d = 2$ the distribution reduces to the von Mises distribution on the circle, and if $d = 3$ it reduces to the Fisher distribution on a sphere. It was introduced by [Fisher53] and has been studied extensively by [Mardia14, Mardia75]. The first Bayesian analysis was in [Mardia76], and recently it has been used for clustering on a hypersphere by [Banerjee05].

## 2 Preliminaries

### 2.1 Definitions

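We will use $\log(x)$ to denote the natural logarithm of $x$ throughout this article. Before continuing it will be useful to define the Gamma function $\Gamma(\cdot)$,

$$
\begin{align}
\Gamma(x) &= \int_0^\infty t^{x-1} e^{-t} \, dt \tag{1} \\
&= (x-1)! \quad \text{for positive integer } x, \tag{2}
\end{align}
$$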

and its relation, the (upper) incomplete Gamma function $\Gamma(\cdot, \cdot)$,

$$
\Gamma(x, y) = \int_y^\infty t^{x-1} e^{-t} \, dt, \tag{3}
$$

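and the Modified Bessel Function of the First Kind $I_v(x)$,

$$
I_v(x) = \sum_{m=0}^{\infty} \frac{1}{m! \, \Gamma(m + v + 1)} \left(\frac{x}{2}\right)^{2m + v}, \tag{4}
$$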

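which also has the following integral representations [Abramowitz72],

$$
\begin{align}
I_v(x) &= \frac{1}{\pi} \int_0^\pi e^{x \cos\theta} \cos(v\theta) \, d\theta - \frac{\sin(v\pi)}{\pi} \int_0^\infty e^{-x \cosh t - v t} \, dt \tag{5} \\
&= \frac{(x/2)^v}{\sqrt{\pi} \, \Gamma\left(v + \tfrac{1}{2}\right)} \int_0^\pi e^{x \cos\theta} \sin^{2v}\theta \, d\theta, \qquad v > -\tfrac{1}{2}. \tag{6}
\end{align}
$$

Also of interest is the logarithm of this quantity (using the second integral definition (6)),

$$
\log I_v(x) = v \log\left(\frac{x}{2}\right) - \log\left(\sqrt{\pi} \, \Gamma\left(v + \tfrac{1}{2}\right)\right) + \log \int_0^\pi e^{x \cos\theta} \sin^{2v}\theta \, d\theta. \tag{7}
$$

Note that the second term does not depend on $x$.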
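As a numerical sanity check on the two ways of writing the Bessel function above, the following is a minimal sketch that evaluates $\log I_v(x)$ once from the power series and once from the single-integral representation using Simpson's rule. The function names, the number of series terms, and the grid size are illustrative assumptions, not part of the original text, and the sketch assumes $x > 0$ and $v > -1/2$.

```python
import math


def log_bessel_i_series(v, x, terms=200):
    """log I_v(x) from the power series, summed in log space (assumes x > 0, v > -1)."""
    logs = [
        (2 * m + v) * math.log(x / 2.0) - math.lgamma(m + 1) - math.lgamma(m + v + 1)
        for m in range(terms)
    ]
    peak = max(logs)  # factor out the largest term to avoid overflow
    return peak + math.log(sum(math.exp(t - peak) for t in logs))


def log_bessel_i_integral(v, x, n=2000):
    """log I_v(x) from the integral over [0, pi], composite Simpson's rule (n even, v > -1/2)."""
    h = math.pi / n
    total = 0.0
    for i in range(n + 1):
        theta = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        # exp(x) is factored out of the integrand and added back in log space below
        total += weight * math.exp(x * (math.cos(theta) - 1.0)) * abs(math.sin(theta)) ** (2 * v)
    integral = total * h / 3.0
    # the two terms with log(pi) and lgamma do not depend on x
    return (v * math.log(x / 2.0) - 0.5 * math.log(math.pi) - math.lgamma(v + 0.5)
            + x + math.log(integral))


if __name__ == "__main__":
    print(log_bessel_i_series(1.5, 5.0), log_bessel_i_integral(1.5, 5.0))
```

Working in log space keeps both routines usable for moderately large $x$, where $I_v(x)$ itself grows roughly like $e^x$; the fixed number of series terms is only adequate while $x$ stays well below the truncation point.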

The Exponential Integral function is given by,

(8) |

An identity that will be useful is,

(9) |

### 2.2 The von Mises Fisher (VMF) distribution

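The probability density function (PDF) of the VMF distribution for a random $d$-dimensional unit vector $\mathbf{x} \in \mathbb{S}^{d-1}$, with mean direction $\boldsymbol{\mu}$ ($\|\boldsymbol{\mu}\| = 1$) and concentration $\kappa \ge 0$, is given by:

$$
f(\mathbf{x}; \boldsymbol{\mu}, \kappa) = C_d(\kappa) \exp\left(\kappa \boldsymbol{\mu}^\top \mathbf{x}\right), \tag{10}
$$

where the normalisation constant $C_d(\kappa)$ is given by,

$$
C_d(\kappa) = \frac{\kappa^{d/2 - 1}}{(2\pi)^{d/2} \, I_{d/2 - 1}(\kappa)}. \tag{11}
$$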
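The density and its normalisation constant can be sketched in a few lines of plain Python. This is an illustrative sketch only: the helper names (`log_bessel_i`, `log_vmf_normaliser`, `vmf_log_pdf`) are hypothetical, the Bessel function is evaluated from its power series with a fixed number of terms, and the code assumes $\kappa > 0$ of moderate size (roughly below 100).

```python
import math


def log_bessel_i(v, x, terms=200):
    """log I_v(x) from the power series, summed in log space (assumes x > 0, v > -1)."""
    logs = [
        (2 * m + v) * math.log(x / 2.0) - math.lgamma(m + 1) - math.lgamma(m + v + 1)
        for m in range(terms)
    ]
    peak = max(logs)
    return peak + math.log(sum(math.exp(t - peak) for t in logs))


def log_vmf_normaliser(d, kappa):
    """Log of the VMF normalisation constant for dimension d and concentration kappa > 0."""
    v = d / 2.0 - 1.0
    return v * math.log(kappa) - (d / 2.0) * math.log(2.0 * math.pi) - log_bessel_i(v, kappa)


def vmf_log_pdf(x, mu, kappa):
    """Log density at unit vector x for mean direction mu (both given as sequences of floats)."""
    dot = sum(a * b for a, b in zip(mu, x))
    return log_vmf_normaliser(len(x), kappa) + kappa * dot


if __name__ == "__main__":
    # d = 3: the normaliser should agree with kappa / (4 pi sinh(kappa))
    print(math.exp(log_vmf_normaliser(3, 2.0)), 2.0 / (4.0 * math.pi * math.sinh(2.0)))
```

For $d = 3$ the constant reduces to $\kappa / (4\pi \sinh\kappa)$, which gives a convenient closed form to compare against; for $d = 2$ the density can be integrated around the circle to confirm that it has unit mass.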

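The (non-symmetric) Kullback Leibler (KL)-Divergence from one probability distribution $q(\mathbf{x})$ to another probability distribution $p(\mathbf{x})$ is defined as,

$$
\begin{align}
\mathrm{KL}(q \,\|\, p) &= \int q(\mathbf{x}) \log \frac{q(\mathbf{x})}{p(\mathbf{x})} \, d\mathbf{x} \tag{12} \\
&= \mathbb{E}_{q}\left[\log q(\mathbf{x}) - \log p(\mathbf{x})\right]. \tag{13}
\end{align}
$$

Although this is general to any two distributions, we will assume that $p(\mathbf{x})$ is the “prior” distribution and $q(\mathbf{x})$ is the “posterior” distribution as commonly used in Bayesian analysis.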
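The lack of symmetry is easy to see on a small discrete analogue of the definition, where the integral becomes a sum. The sketch below is illustrative only: the function name and the two example distributions are assumptions chosen for the demonstration.

```python
import math


def kl_divergence(q, p):
    """Discrete KL(q || p) = sum_i q_i * log(q_i / p_i); terms with q_i = 0 contribute 0."""
    return sum(a * math.log(a / b) for a, b in zip(q, p) if a > 0.0)


prior = [0.5, 0.5]       # plays the role of p
posterior = [0.9, 0.1]   # plays the role of q

forward = kl_divergence(posterior, prior)   # KL(q || p)
reverse = kl_divergence(prior, posterior)   # KL(p || q)

if __name__ == "__main__":
    print(forward, reverse)  # the two directions give different values
```

Swapping the arguments changes which distribution weights the log-ratio, so the direction matters when one distribution is treated as the prior and the other as the posterior.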

## 3 KL-Divergence for the VMF Distribution

### 3.1 General Case

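We will assume that we have prior and posterior distributions defined over vectors $\mathbf{x} \in \mathbb{S}^{d-1}$ as follows,

$$
p(\mathbf{x}) = \mathrm{VMF}(\mathbf{x}; \boldsymbol{\mu}_p, \kappa_p), \qquad q(\mathbf{x}) = \mathrm{VMF}(\mathbf{x}; \boldsymbol{\mu}_q, \kappa_q). \tag{14}
$$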
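A numerical reference value is useful when checking any closed form for this divergence. The sketch below is not the derivation of this article: it covers only $d = 3$, where the normalisation constant is $\kappa / (4\pi \sinh\kappa)$, and it uses the standard identity $\mathbb{E}_q[\mathbf{x}] = (\coth\kappa_q - 1/\kappa_q)\,\boldsymbol{\mu}_q$ for the Fisher distribution, then compares the result with a brute-force midpoint-rule integration over the sphere. All function and parameter names are hypothetical.

```python
import math


def log_c3(kappa):
    """log of kappa / (4 pi sinh(kappa)), the d = 3 normaliser (assumes kappa > 0)."""
    # log sinh(k) = k + log(1 - exp(-2k)) - log 2, which stays stable for large k
    log_sinh = kappa + math.log1p(-math.exp(-2.0 * kappa)) - math.log(2.0)
    return math.log(kappa) - math.log(4.0 * math.pi) - log_sinh


def kl_vmf3(mu_q, kappa_q, mu_p, kappa_p):
    """KL(q || p) for d = 3 using the mean of q, E_q[x] = (coth(k) - 1/k) * mu_q."""
    mean_len = 1.0 / math.tanh(kappa_q) - 1.0 / kappa_q
    dot = sum(a * b for a, b in zip(mu_p, mu_q))
    return log_c3(kappa_q) - log_c3(kappa_p) + (kappa_q - kappa_p * dot) * mean_len


def kl_vmf3_quadrature(mu_q, kappa_q, mu_p, kappa_p, n=200):
    """Midpoint-rule estimate of the integral of q * log(q / p) over the unit sphere."""
    total = 0.0
    step = math.pi / n  # same step for the polar and azimuthal angles
    for i in range(n):
        polar = (i + 0.5) * step
        for j in range(2 * n):
            azim = (j + 0.5) * step
            x = (math.sin(polar) * math.cos(azim),
                 math.sin(polar) * math.sin(azim),
                 math.cos(polar))
            log_q = log_c3(kappa_q) + kappa_q * sum(a * b for a, b in zip(mu_q, x))
            log_p = log_c3(kappa_p) + kappa_p * sum(a * b for a, b in zip(mu_p, x))
            total += math.exp(log_q) * (log_q - log_p) * math.sin(polar)  # sin(polar): area element
    return total * step * step


if __name__ == "__main__":
    args = ((0.0, 0.0, 1.0), 5.0, (1.0, 0.0, 0.0), 2.0)
    print(kl_vmf3(*args), kl_vmf3_quadrature(*args))
```

The quadrature version is slow and only second-order accurate in the polar angle, so it is meant purely as an independent cross-check rather than a practical estimator.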

We will now derive the KL-Divergence for two VMF distributions. The main problem in doing so will be the normalisation constants of the prior and posterior distributions.

For prior and posterior distributions as defined above over $d$-dimensional vectors with $d$ odd (for even $d$ we can simply add a “null” dimension), we have

From (12), letting , , and , we have,
