A Note on the Kullback-Leibler Divergence for the von Mises-Fisher distribution

02/25/2015 ∙ by Tom Diethe, et al.

We present a derivation of the Kullback-Leibler (KL) divergence (also known as relative entropy) for the von Mises-Fisher (VMF) distribution in $d$ dimensions.




1 Introduction

The von Mises-Fisher (VMF) distribution (also known as the Langevin distribution [Watamori96]) is a probability distribution on the $(d-1)$-dimensional hypersphere $\mathbb{S}^{d-1}$ in $\mathbb{R}^d$ [Fisher53]. If $d = 2$, the distribution reduces to the von Mises distribution on the circle, and if $d = 3$, it reduces to the Fisher distribution on the sphere. It was introduced by [Fisher53] and has been studied extensively by [Mardia14, Mardia75]. The first Bayesian analysis was in [Mardia76], and more recently it has been used for clustering on the hypersphere by [Banerjee05].

Figure 1: Three sets of 1000 points sampled from three VMF distributions on the 3D sphere, each with a different concentration parameter $\kappa$ (blue, green and red, respectively). The mean directions are indicated with arrows.

2 Preliminaries

2.1 Definitions

We will use $\log x$ to denote the natural logarithm of $x$ throughout this article. Before continuing, it will be useful to define the Gamma function $\Gamma(x)$,

$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt,$$

and its relative, the upper incomplete Gamma function $\Gamma(a, x)$,

$$\Gamma(a, x) = \int_x^{\infty} t^{a-1} e^{-t}\,dt,$$

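As a quick numerical sanity check of these two definitions, the Python sketch below evaluates $\Gamma(a, x)$ by Simpson's rule and compares it with the finite-sum identity $\Gamma(n, x) = (n-1)!\,e^{-x}\sum_{k=0}^{n-1} x^k/k!$, which holds for positive integer $n$. The function names, the truncation point of the integral and the step count are illustrative assumptions, not part of the original note.

```python
import math


def incomplete_gamma_int(n, x):
    """Upper incomplete Gamma(n, x) for a positive integer n, using the identity
    Gamma(n, x) = (n - 1)! * exp(-x) * sum_{k=0}^{n-1} x**k / k!."""
    return math.factorial(n - 1) * math.exp(-x) * sum(
        x**k / math.factorial(k) for k in range(n)
    )


def incomplete_gamma_quad(a, x, upper=60.0, steps=20000):
    """Upper incomplete Gamma(a, x) by composite Simpson's rule on [x, upper].
    The tail beyond `upper` is dropped, so this is only a rough check
    (steps must be even)."""
    h = (upper - x) / steps
    f = lambda t: t ** (a - 1) * math.exp(-t)
    total = f(x) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(x + i * h)
    return total * h / 3


# Gamma(a, 0) recovers the ordinary Gamma function, e.g. Gamma(4, 0) = 3! = 6.
print(incomplete_gamma_int(4, 0.0), math.gamma(4))
# Two evaluations of Gamma(3, 1) = 5/e, which should agree closely.
print(incomplete_gamma_int(3, 1.0), incomplete_gamma_quad(3, 1.0))
```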
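and the modified Bessel function of the first kind $I_v(x)$,

$$I_v(x) = \sum_{m=0}^{\infty} \frac{1}{m!\,\Gamma(m+v+1)} \left(\frac{x}{2}\right)^{2m+v},$$
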
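The series above translates directly into code. The sketch below (standard library only) sums it in log space so that the factorial and Gamma factors never overflow. The name `bessel_iv`, the fixed number of terms and the restriction to $v \ge 0$, $x > 0$ of moderate size are assumptions made for illustration; a dedicated special-function library would be the more robust choice in practice.

```python
import math


def bessel_iv(v, x, terms=200):
    """Modified Bessel function of the first kind I_v(x), for v >= 0 and
    moderate x > 0, from the truncated power series
    sum_m (x/2)^(2m+v) / (m! Gamma(m+v+1))."""
    log_half = math.log(x / 2.0)
    total = 0.0
    for m in range(terms):
        # Each term is formed as exp(log(term)); all terms are positive,
        # so summing them involves no cancellation.
        log_term = (2 * m + v) * log_half - math.lgamma(m + 1) - math.lgamma(m + v + 1)
        total += math.exp(log_term)
    return total


# Half-integer orders have closed forms, e.g. I_{1/2}(x) = sqrt(2/(pi x)) sinh(x).
x = 2.0
print(bessel_iv(0.5, x), math.sqrt(2 / (math.pi * x)) * math.sinh(x))
```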
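which also has the following integral representations [Abramowitz72],

$$I_v(x) = \frac{1}{\pi}\int_0^{\pi} e^{x\cos\theta}\cos(v\theta)\,d\theta \quad (v \in \mathbb{Z}), \tag{5}$$

$$I_v(x) = \frac{(x/2)^v}{\sqrt{\pi}\,\Gamma\left(v+\frac{1}{2}\right)}\int_0^{\pi} e^{x\cos\theta}\sin^{2v}\theta\,d\theta \quad \left(v > -\tfrac{1}{2}\right). \tag{6}$$
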
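Also of interest is the logarithm of this quantity, written for an argument $\kappa$ (using the second integral definition (6)),

$$\log I_v(\kappa) = v\log\frac{\kappa}{2} - \log\left(\sqrt{\pi}\,\Gamma\left(v+\tfrac{1}{2}\right)\right) + \log\int_0^{\pi} e^{\kappa\cos\theta}\sin^{2v}\theta\,d\theta. \tag{7}$$

Note that the second term does not depend on $\kappa$.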
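To see the second integral representation (6) and its logarithm at work, the sketch below evaluates $\log I_v(\kappa)$ in that three-term form, computing the remaining integral with Simpson's rule, and compares it with the logarithm of the power series. The helper names and quadrature settings are illustrative assumptions; the point is only that the $\kappa$-independent second term can be computed once per order $v$ and reused.

```python
import math


def bessel_iv_series(v, x, terms=200):
    """Reference value of I_v(x) (v >= 0, moderate x > 0) from the power series."""
    log_half = math.log(x / 2.0)
    return sum(
        math.exp((2 * m + v) * log_half - math.lgamma(m + 1) - math.lgamma(m + v + 1))
        for m in range(terms)
    )


def log_bessel_iv_integral(v, kappa, steps=4000):
    """log I_v(kappa) from the logarithm of integral representation (6);
    the integral over [0, pi] uses composite Simpson's rule (even steps)."""
    h = math.pi / steps
    f = lambda t: math.exp(kappa * math.cos(t)) * math.sin(t) ** (2 * v)
    total = f(0.0) + f(math.pi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    integral = total * h / 3
    # Second term: log(sqrt(pi) Gamma(v + 1/2)) depends only on v, not on kappa.
    const = 0.5 * math.log(math.pi) + math.lgamma(v + 0.5)
    return v * math.log(kappa / 2.0) - const + math.log(integral)


for v in (0.0, 0.5, 1.5):
    for kappa in (0.5, 3.0, 10.0):
        print(v, kappa, log_bessel_iv_integral(v, kappa), math.log(bessel_iv_series(v, kappa)))
```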

The exponential integral function $E_1(x)$ is given by,

$$E_1(x) = \int_x^{\infty} \frac{e^{-t}}{t}\,dt = \Gamma(0, x). \tag{8}$$

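For moderate positive $x$, $E_1$ can be evaluated from its standard convergent series $E_1(x) = -\gamma - \log x - \sum_{k \ge 1} (-x)^k / (k\,k!)$, where $\gamma$ is the Euler-Mascheroni constant. The sketch below is an illustration under that assumption only: the alternating series loses accuracy for large $x$, where an asymptotic expansion or a library routine would be preferable. The name `exp1` is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def exp1(x, terms=100):
    """Exponential integral E_1(x) for moderate x > 0 via the series
    E_1(x) = -gamma - log(x) - sum_{k>=1} (-x)^k / (k * k!)."""
    series = 0.0
    for k in range(1, terms):
        series += (-x) ** k / (k * math.factorial(k))
    return -EULER_GAMMA - math.log(x) - series


# E_1 is positive and decreasing on (0, infinity).
print(exp1(0.5), exp1(1.0), exp1(2.0))
```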
An identity that will be useful is,


2.2 The von Mises-Fisher (VMF) Distribution

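The probability density function (PDF) of the VMF distribution for a random $d$-dimensional unit vector $\mathbf{x} \in \mathbb{S}^{d-1}$ is given by:

$$f_d(\mathbf{x}; \boldsymbol{\mu}, \kappa) = C_d(\kappa)\exp\left(\kappa\,\boldsymbol{\mu}^\top\mathbf{x}\right), \tag{10}$$
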
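where $\boldsymbol{\mu}$ is the mean direction ($\|\boldsymbol{\mu}\| = 1$), $\kappa \ge 0$ is the concentration parameter, and the normalisation constant $C_d(\kappa)$ is given by,

$$C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)}. \tag{11}$$
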
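Putting the density and its normalisation constant together, a log-density can be computed as in the sketch below. Working in log space is the usual way to handle $C_d(\kappa)$, but the simple power series used here for $I_v$ is an assumption of this sketch and limits it to moderate $\kappa > 0$ and $d \ge 2$. The function names are hypothetical.

```python
import math


def bessel_iv(v, x, terms=200):
    """I_v(x) for v >= 0 and moderate x > 0, from the power series."""
    log_half = math.log(x / 2.0)
    return sum(
        math.exp((2 * m + v) * log_half - math.lgamma(m + 1) - math.lgamma(m + v + 1))
        for m in range(terms)
    )


def vmf_log_normaliser(d, kappa):
    """log C_d(kappa) = (d/2 - 1) log kappa - (d/2) log(2 pi) - log I_{d/2-1}(kappa)."""
    v = d / 2.0 - 1.0
    return v * math.log(kappa) - (d / 2.0) * math.log(2.0 * math.pi) - math.log(bessel_iv(v, kappa))


def vmf_log_pdf(x, mu, kappa):
    """log f_d(x; mu, kappa) = log C_d(kappa) + kappa * mu^T x, for unit vectors x and mu."""
    dot = sum(xi * mi for xi, mi in zip(x, mu))
    return vmf_log_normaliser(len(mu), kappa) + kappa * dot


# On the sphere (d = 3) the normaliser has the closed form kappa / (4 pi sinh kappa).
kappa = 2.0
print(math.exp(vmf_log_normaliser(3, kappa)), kappa / (4 * math.pi * math.sinh(kappa)))
```

For $d = 2$ the same code gives $C_2(\kappa) = 1/(2\pi I_0(\kappa))$, the von Mises normaliser on the circle.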
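The (non-symmetric) Kullback-Leibler (KL) divergence from one probability distribution $q$ to another probability distribution $p$ is defined as,

$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(\mathbf{x}) \log\frac{p(\mathbf{x})}{q(\mathbf{x})}\,d\mathbf{x}. \tag{12}$$
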
Although this definition applies to any two distributions, we will assume that $q$ is the “prior” distribution and $p$ is the “posterior” distribution, as is common in Bayesian analysis.
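The KL definition above can be checked numerically in the simplest case $d = 2$, where $\mathbf{x} = (\cos t, \sin t)$ and the integral reduces to one dimension in the angle $t$. The sketch below, an illustration rather than part of the original derivation, uses the trapezoidal rule (very accurate for smooth periodic integrands) and shows that the divergence vanishes for identical distributions, is positive otherwise, and is not symmetric in its arguments. All names and parameter values are hypothetical.

```python
import math


def bessel_i0(x, terms=200):
    """I_0(x) from the power series sum_m (x/2)^(2m) / (m!)^2 (moderate x > 0)."""
    return sum(math.exp(2 * m * math.log(x / 2.0) - 2 * math.lgamma(m + 1)) for m in range(terms))


def von_mises_log_pdf(theta, kappa):
    """Return t -> log density of the VMF distribution on the circle (d = 2)
    with mean direction at angle theta: kappa cos(t - theta) - log(2 pi I_0(kappa))."""
    log_norm = math.log(2.0 * math.pi * bessel_i0(kappa))
    return lambda t: kappa * math.cos(t - theta) - log_norm


def kl_on_circle(log_p, log_q, n=2000):
    """D_KL(p || q) = integral of p log(p / q) over the circle, by the trapezoidal rule."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        lp, lq = log_p(i * h), log_q(i * h)
        total += math.exp(lp) * (lp - lq)
    return total * h


p = von_mises_log_pdf(0.0, 5.0)  # "posterior"
q = von_mises_log_pdf(1.0, 1.0)  # "prior"
print(kl_on_circle(p, p), kl_on_circle(p, q), kl_on_circle(q, p))
```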

3 KL-Divergence for the VMF Distribution

3.1 General Case

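We will assume that we have prior and posterior distributions defined over unit vectors $\mathbf{x} \in \mathbb{S}^{d-1}$ as follows,

$$q(\mathbf{x}) = C_d(\kappa_0)\exp\left(\kappa_0\,\boldsymbol{\mu}_0^\top\mathbf{x}\right), \qquad p(\mathbf{x}) = C_d(\kappa_1)\exp\left(\kappa_1\,\boldsymbol{\mu}_1^\top\mathbf{x}\right).$$
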
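Before working through the odd-$d$ derivation, an independent reference value may be useful. A standard route, swapped in here and not necessarily the one this note follows, uses $\mathbb{E}_p[\mathbf{x}] = A_d(\kappa_1)\boldsymbol{\mu}_1$ with $A_d(\kappa) = I_{d/2}(\kappa)/I_{d/2-1}(\kappa)$. Writing the prior $q$ with parameters $(\boldsymbol{\mu}_0, \kappa_0)$ and the posterior $p$ with $(\boldsymbol{\mu}_1, \kappa_1)$, this gives

$$D_{\mathrm{KL}}(p \,\|\, q) = \log C_d(\kappa_1) - \log C_d(\kappa_0) + A_d(\kappa_1)\left(\kappa_1 - \kappa_0\,\boldsymbol{\mu}_0^\top\boldsymbol{\mu}_1\right).$$

The sketch below implements this formula and, for $d = 2$, compares it with direct quadrature of the KL definition. Function names are hypothetical, and the series used for $I_v$ restricts it to moderate $\kappa > 0$.

```python
import math


def bessel_iv(v, x, terms=200):
    """I_v(x) for v >= 0 and moderate x > 0, from the power series."""
    log_half = math.log(x / 2.0)
    return sum(
        math.exp((2 * m + v) * log_half - math.lgamma(m + 1) - math.lgamma(m + v + 1))
        for m in range(terms)
    )


def log_c(d, kappa):
    """log C_d(kappa), the log normalisation constant of the VMF density."""
    v = d / 2.0 - 1.0
    return v * math.log(kappa) - (d / 2.0) * math.log(2.0 * math.pi) - math.log(bessel_iv(v, kappa))


def kl_vmf(mu1, kappa1, mu0, kappa0):
    """Closed-form D_KL(p || q) for p = VMF(mu1, kappa1) and q = VMF(mu0, kappa0)."""
    d = len(mu1)
    # Mean resultant length A_d(kappa1), so that E_p[x] = A_d(kappa1) * mu1.
    a_d = bessel_iv(d / 2.0, kappa1) / bessel_iv(d / 2.0 - 1.0, kappa1)
    dot = sum(a * b for a, b in zip(mu0, mu1))
    return log_c(d, kappa1) - log_c(d, kappa0) + a_d * (kappa1 - kappa0 * dot)


def kl_circle_quadrature(theta1, kappa1, theta0, kappa0, n=2000):
    """The same divergence for d = 2 by trapezoidal quadrature of p log(p / q) over the angle."""
    h = 2.0 * math.pi / n
    lc1, lc0 = log_c(2, kappa1), log_c(2, kappa0)
    total = 0.0
    for i in range(n):
        t = i * h
        lp = lc1 + kappa1 * math.cos(t - theta1)
        lq = lc0 + kappa0 * math.cos(t - theta0)
        total += math.exp(lp) * (lp - lq)
    return total * h


mu1, mu0 = (1.0, 0.0), (math.cos(1.0), math.sin(1.0))
print(kl_vmf(mu1, 5.0, mu0, 1.0), kl_circle_quadrature(0.0, 5.0, 1.0, 1.0))
```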
We will now derive the KL divergence between two VMF distributions. The main difficulty in doing so lies in the normalisation constants of the prior and the posterior. For prior and posterior distributions as defined above over vectors $\mathbf{x}$ with $d$ odd (for even $d$ we can simply add a “null” dimension), we have

From (12), letting , , and , we have,