    # A Note on the Kullback-Leibler Divergence for the von Mises-Fisher distribution

We present a derivation of the Kullback-Leibler (KL) divergence (also known as relative entropy) for the von Mises-Fisher (vMF) distribution in $d$ dimensions.


## 1 Introduction

The von Mises-Fisher (vMF) distribution (also known as the Langevin distribution [Watamori96]) is a probability distribution on the $(d-1)$-dimensional hypersphere in $\mathbb{R}^d$ [Fisher53]. If $d=2$ the distribution reduces to the von Mises distribution on the circle, and if $d=3$ it reduces to the Fisher distribution on the sphere. It was introduced by [Fisher53] and has been studied extensively by [Mardia14, Mardia75]. The first Bayesian analysis was in [Mardia76], and recently it has been used for clustering on a hypersphere by [Banerjee05].

Figure 1: Three sets of 1000 points sampled from three vMF distributions on the 3D sphere with $\kappa=1$ (blue), $\kappa=10$ (green) and $\kappa=100$ (red), respectively. The mean directions are indicated with arrows.
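For intuition about samples like those in Figure 1, the following sketch draws points from a vMF distribution on the 3D sphere. It uses the exact inverse-CDF method for the cosine of the polar angle, which is specific to $d=3$, and fixes the mean direction to $(0,0,1)$; this is an illustration, not the paper's own sampling procedure.

```python
import numpy as np

def sample_vmf_3d(kappa, n, rng=None):
    """Sample n points from a vMF distribution on the unit sphere S^2
    with mean direction (0, 0, 1). For d = 3 the cosine w of the polar
    angle has density proportional to exp(kappa * w) on [-1, 1], whose
    CDF can be inverted in closed form."""
    rng = np.random.default_rng(rng)
    u = rng.random(n)
    # Inverse CDF of w = cos(theta) under the vMF density on S^2.
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    phi = rng.uniform(0.0, 2.0 * np.pi, n)  # uniform azimuthal angle
    r = np.sqrt(np.clip(1.0 - w**2, 0.0, None))
    return np.column_stack([r * np.cos(phi), r * np.sin(phi), w])

pts = sample_vmf_3d(kappa=100.0, n=1000, rng=0)
```

For large $\kappa$ the samples concentrate tightly around the mean direction, matching the red cluster in Figure 1.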

## 2 Preliminaries

### 2.1 Definitions

We will use $\log(x)$ to denote the natural logarithm of $x$ throughout this article. Before continuing, it will be useful to define the Gamma function,

$$\Gamma(z) = \int_0^{\infty} t^{z-1} e^{-t}\,dt, \qquad z \in \mathbb{C},\ \operatorname{Re}(z) > 0, \tag{1}$$

$$\Gamma(z) = (z-1)!, \qquad z \in \mathbb{Z}^{+}, \tag{2}$$

and the related (upper) incomplete Gamma function,

$$\Gamma(s, z) = (s-1)!\, e^{-z} \sum_{m=0}^{s-1} \frac{z^m}{m!}, \qquad s \in \mathbb{Z}^{+}, \tag{3}$$
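The definitions above can be checked numerically. The sketch below, using scipy, verifies the factorial identity (2), the integral definition (1) at a non-integer argument, and the finite-sum form of the upper incomplete Gamma function for integer first argument:

```python
from math import exp, factorial, inf

from scipy.integrate import quad
from scipy.special import gamma, gammaincc

# Eq. (2): for positive integers, Gamma reduces to a shifted factorial.
for z in range(1, 8):
    assert abs(gamma(z) - factorial(z - 1)) < 1e-9

# Eq. (1): the integral definition, checked at z = 2.5.
val, _ = quad(lambda t: t**(2.5 - 1.0) * exp(-t), 0.0, inf)

def upper_gamma_series(s, z):
    """Upper incomplete Gamma function for integer s >= 1 as a finite sum."""
    return factorial(s - 1) * exp(-z) * sum(z**m / factorial(m) for m in range(s))
```

Here `scipy.special.gammaincc` is the *regularised* upper incomplete Gamma function, so it must be multiplied by `gamma(s)` to recover $\Gamma(s, z)$.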

and the Modified Bessel Function of the First Kind,

$$I_{\alpha}(z) = \sum_{m=0}^{\infty} \frac{(z/2)^{2m+\alpha}}{m!\,\Gamma(m+\alpha+1)}, \tag{4}$$

which also has the following integral representations [Abramowitz72],

$$I_{\alpha}(z) = \frac{(z/2)^{\alpha}}{\sqrt{\pi}\,\Gamma(\alpha+1/2)} \int_{0}^{\pi} e^{\pm z \cos\theta} \sin^{2\alpha}\theta \,d\theta, \qquad (\alpha \in \mathbb{R}) \tag{5}$$

$$\phantom{I_{\alpha}(z)} = \frac{(z/2)^{\alpha}}{\sqrt{\pi}\,\Gamma(\alpha+1/2)} \int_{-1}^{1} (1-t^2)^{\alpha-1/2} e^{\pm z t}\,dt. \qquad (\alpha \in \mathbb{R},\ \alpha > -\tfrac{1}{2}) \tag{6}$$
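As a sanity check, a partial sum of the series definition (4) can be compared against scipy's `iv`; the choice of $\alpha = 1/2$, $z = 3$ below is arbitrary:

```python
import numpy as np
from scipy.special import gamma, iv

def iv_series(alpha, z, terms=50):
    """Partial sum of the series definition of I_alpha(z): sum over m of
    (z/2)^(2m + alpha) / (m! * Gamma(m + alpha + 1))."""
    m = np.arange(terms)
    return np.sum((z / 2.0)**(2 * m + alpha) / (gamma(m + 1) * gamma(m + alpha + 1)))

approx = iv_series(0.5, 3.0)
exact = iv(0.5, 3.0)
```

Fifty terms are far more than needed here; the series converges rapidly for moderate $z$.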

Also of interest is the logarithm of this quantity (using the second integral definition (6)),

$$\log\left(I_{\alpha}(z)\right) = \log\left[\frac{(z/2)^{\alpha}}{\sqrt{\pi}\,\Gamma(\alpha+1/2)} \int_{-1}^{1}(1-t^2)^{\alpha-1/2} e^{\pm z t}\,dt\right]$$

$$= \log\,(z/2)^{\alpha} - \log\left(\sqrt{\pi}\,\Gamma(\alpha+1/2)\right) + \log\left[\int_{-1}^{1}(1-t^2)^{\alpha-1/2} e^{\pm z t}\,dt\right]$$

$$= \alpha \log\left(\frac{z}{2}\right) - \log\left(\sqrt{\pi}\,\Gamma(\alpha+1/2)\right) + \log\left[\int_{-1}^{1}(1-t^2)^{\alpha-1/2} e^{\pm z t}\,dt\right]. \tag{7}$$
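Working with $\log I_{\alpha}(z)$ rather than $I_{\alpha}(z)$ matters in practice: the Bessel function overflows double precision for large $z$. One common workaround (an implementation note, not part of the derivation above) is scipy's exponentially scaled Bessel function `ive`:

```python
import numpy as np
from scipy.special import iv, ive

def log_iv(alpha, z):
    """Numerically stable log(I_alpha(z)) for z > 0, using the scaled
    Bessel function ive(v, z) = iv(v, z) * exp(-|z|)."""
    return np.log(ive(alpha, z)) + z

small = log_iv(2.0, 10.0)     # agrees with log(iv(2, 10)) directly
big = log_iv(49.0, 1000.0)    # iv(49, 1000) itself overflows to inf
```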

Note that the second term does not depend on $z$.

The Exponential Integral function is given by,

$$E_{\alpha}(z) = \int_{1}^{\infty} \frac{e^{-zt}}{t^{\alpha}}\,dt = z^{\alpha-1}\,\Gamma(1-\alpha, z). \tag{8}$$
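The incomplete-Gamma form of (8) is convenient computationally, since scipy's `gammaincc` requires a positive first argument and $1-\alpha > 0$ holds exactly in the negative-order case $E_{-d}(z)$, $d > 0$, used below. A quick numerical check at $\alpha = -3$, $z = 2$:

```python
from math import exp, inf

from scipy.integrate import quad
from scipy.special import gamma, gammaincc

def E(alpha, z):
    """Eq. (8): E_alpha(z) = z^(alpha-1) * Gamma(1 - alpha, z),
    valid here for 1 - alpha > 0 so scipy's gammaincc applies."""
    a = 1.0 - alpha
    return z**(alpha - 1.0) * gammaincc(a, z) * gamma(a)

# Direct integral for alpha = -3: 1/t^alpha = t^3.
direct, _ = quad(lambda t: exp(-2.0 * t) * t**3, 1.0, inf)
```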

An identity that will be useful is,

$$\int_{-1}^{1} (1-t)^{d}\, e^{t\kappa}\,dt = \frac{d!\,e^{\kappa}}{\kappa^{d+1}} - 2^{d+1} E_{-d}(2\kappa)\, e^{\kappa}, \qquad d > 0. \tag{9}$$
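This identity follows from the substitution $u = (1-t)/2$ together with (8); as a sketch, it can be verified numerically for an arbitrary choice of $d$ and $\kappa$ (the values below are illustrative only):

```python
from math import exp, factorial, inf

from scipy.integrate import quad
from scipy.special import gamma, gammaincc

def E_neg(d, z):
    """E_{-d}(z) = z^(-d-1) * Gamma(d + 1, z), from eq. (8)."""
    return z**(-d - 1.0) * gammaincc(d + 1.0, z) * gamma(d + 1.0)

d, kappa = 3, 1.7  # arbitrary test values
lhs, _ = quad(lambda t: (1.0 - t)**d * exp(kappa * t), -1.0, 1.0)
rhs = (factorial(d) * exp(kappa) / kappa**(d + 1)
       - 2**(d + 1) * E_neg(d, 2.0 * kappa) * exp(kappa))
```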

### 2.2 The von Mises-Fisher (vMF) distribution

The probability density function (PDF) of the vMF distribution for a random $d$-dimensional unit vector $\mathbf{x}$ is given by:

$$\mathcal{M}_d(\boldsymbol{\mu}, \kappa) = c_d(\kappa)\, e^{\kappa \boldsymbol{\mu}' \mathbf{x}}, \qquad \mathbf{x} \in \mathbb{S}^{d-1}, \tag{10}$$

where the normalisation constant is given by,

$$c_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)}. \tag{11}$$
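For $d=2$ the vMF density is the von Mises density on the circle, so (11) can be checked by integrating the density over the angle; the sketch below does this for an arbitrary $\kappa$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import iv

def c_d(d, kappa):
    """Normalisation constant of the vMF distribution, eq. (11)."""
    return kappa**(d / 2.0 - 1.0) / ((2.0 * np.pi)**(d / 2.0) * iv(d / 2.0 - 1.0, kappa))

# For d = 2, x = (cos(theta), sin(theta)) and mu'x = cos(theta) when
# mu = (1, 0); the density over theta should integrate to 1.
kappa = 4.0
total, _ = quad(lambda theta: c_d(2, kappa) * np.exp(kappa * np.cos(theta)),
                0.0, 2.0 * np.pi)
```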

The (non-symmetric) Kullback-Leibler (KL) divergence from one probability distribution $q(\mathbf{x})$ to another probability distribution $p(\mathbf{x})$ is defined as,

$$\operatorname{KL}(q(\mathbf{x})\,\|\,p(\mathbf{x})) = \int_{\mathbf{x}} q(\mathbf{x}) \log \frac{q(\mathbf{x})}{p(\mathbf{x})}\,d\mathbf{x} \tag{12}$$

$$= \mathbb{E}_{q(\mathbf{x})}\left[\log \frac{q(\mathbf{x})}{p(\mathbf{x})}\right]. \tag{13}$$
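Because (13) is an expectation under $q$, the KL divergence can always be estimated by Monte Carlo when sampling from $q$ is possible. As an aside (not part of the vMF derivation), the sketch below checks such an estimator against the closed form for two univariate Gaussians:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
q = norm(loc=1.0, scale=1.0)  # "posterior"
p = norm(loc=0.0, scale=2.0)  # "prior"

# Eq. (13): average the log density ratio over samples from q.
x = q.rvs(size=200_000, random_state=rng)
kl_mc = np.mean(q.logpdf(x) - p.logpdf(x))

# Closed-form KL between two Gaussians, for comparison.
kl_exact = np.log(2.0 / 1.0) + (1.0**2 + (1.0 - 0.0)**2) / (2.0 * 2.0**2) - 0.5
```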

Although this is general to any two distributions, we will assume that $p(\mathbf{x})$ is the "prior" distribution and $q(\mathbf{x})$ is the "posterior" distribution, as commonly used in Bayesian analysis.

## 3 KL-Divergence for the vMF Distribution

### 3.1 General Case

We will assume that we have prior and posterior distributions defined over vectors $\mathbf{x} \in \mathbb{S}^{d-1}$ as follows,

$$p(\mathbf{x}) \sim \mathcal{M}_d(\boldsymbol{\mu}_p, \kappa_p), \qquad q(\mathbf{x}) \sim \mathcal{M}_d(\boldsymbol{\mu}_q, \kappa_q). \tag{14}$$

We will now derive the KL divergence for two vMF distributions. The main problem in doing so will be the normalisation constants $c_d(\kappa_p)$ and $c_d(\kappa_q)$. For prior and posterior distributions as defined above over vectors with $d$ odd (for even $d$ we can simply add a "null" dimension), we have

$$\operatorname{KL}(q(\mathbf{x})\,\|\,p(\mathbf{x})) \le \kappa_q - \kappa_p \boldsymbol{\mu}_p' \boldsymbol{\mu}_q + \bullet \log(\kappa_q) + \diamond \sum_{m=1} \frac{\kappa_q^m}{m!} - \left(\frac{d^2 - 2d + 1}{4}\right) \log(\kappa_p) + \diamond(\diamond + 1)\log\diamond - \diamond^2 + 1$$

where $\bullet$ and $\diamond$ are shorthands for quantities depending on $d$.

From (12), letting , , and , we have,