# A Note on the Kullback-Leibler Divergence for the von Mises-Fisher distribution

We present a derivation of the Kullback-Leibler (KL) divergence (also known as relative entropy) for the von Mises-Fisher (VMF) distribution in $d$ dimensions.


## 1 Introduction

The von Mises-Fisher (VMF) distribution (also known as the Langevin distribution [Watamori96]) is a probability distribution on the $(d-1)$-dimensional hypersphere in $\mathbb{R}^d$ [Fisher53]. If $d=2$ the distribution reduces to the von Mises distribution on the circle, and if $d=3$ it reduces to the Fisher distribution on the sphere. It was introduced by [Fisher53] and has been studied extensively by [Mardia14, Mardia75]. The first Bayesian analysis was in [Mardia76], and recently it has been used for clustering on a hypersphere by [Banerjee05].

## 2 Preliminaries

### 2.1 Definitions

We will use $\log$ to denote the natural logarithm throughout this article. Before continuing it will be useful to define the Gamma function $\Gamma(z)$,

$$\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt, \qquad z \in \mathbb{C},\ \mathrm{Re}(z) > 0, \qquad (1)$$

$$\Gamma(z) = (z-1)!, \qquad z \in \mathbb{Z}^+, \qquad (2)$$

and the related (upper) incomplete Gamma function $\Gamma(s,z)$, which for positive integer $s$ has the closed form,

$$\Gamma(s,z) = (s-1)!\, e^{-z} \sum_{m=0}^{s-1} \frac{z^m}{m!}, \qquad s \in \mathbb{Z}^+, \qquad (3)$$
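The factorial form (2) and the finite-sum form (3) are easy to spot-check numerically. A minimal sketch using SciPy (note that `scipy.special.gammaincc` is the *regularised* upper incomplete Gamma, so we multiply by $\Gamma(s)$ to recover $\Gamma(s,z)$):

```python
from math import exp, factorial

from scipy.special import gamma, gammaincc

# Eq. (2): Gamma(z) = (z-1)! for positive integers.
for z in range(1, 8):
    assert abs(gamma(z) - factorial(z - 1)) < 1e-9

# Eq. (3): Gamma(s, z) = (s-1)! e^{-z} sum_{m=0}^{s-1} z^m / m!  for integer s.
# gammaincc(s, z) returns Gamma(s, z) / Gamma(s), hence the multiplication.
s, z = 5, 2.3
lhs = gammaincc(s, z) * gamma(s)
rhs = factorial(s - 1) * exp(-z) * sum(z**m / factorial(m) for m in range(s))
assert abs(lhs - rhs) < 1e-9
```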

and the Modified Bessel Function of the First Kind $I_\alpha(z)$,

$$I_\alpha(z) = \sum_{m=0}^{\infty} \frac{(z/2)^{2m+\alpha}}{m!\,\Gamma(m+\alpha+1)}, \qquad (4)$$

which also has the following integral representations [Abramowitz72],

$$I_\alpha(z) = \frac{(z/2)^\alpha}{\sqrt{\pi}\,\Gamma(\alpha+1/2)} \int_0^\pi e^{\pm z\cos\theta} \sin^{2\alpha}\theta\,d\theta, \qquad (\alpha \in \mathbb{R}) \qquad (5)$$

$$\phantom{I_\alpha(z)} = \frac{(z/2)^\alpha}{\sqrt{\pi}\,\Gamma(\alpha+1/2)} \int_{-1}^{1} (1-t^2)^{\alpha-1/2}\, e^{\pm z t}\,dt. \qquad (\alpha \in \mathbb{R},\ \alpha > -1/2) \qquad (6)$$
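Both the series (4) and the integral representation (6) can be checked against SciPy's built-in `iv`; a minimal numerical sketch:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, iv


def iv_series(alpha, z, terms=60):
    """I_alpha(z) via the series definition (4), truncated at `terms` terms."""
    m = np.arange(terms)
    return np.sum((z / 2) ** (2 * m + alpha) / (gamma(m + 1) * gamma(m + alpha + 1)))


def iv_integral(alpha, z):
    """I_alpha(z) via the integral representation (6), valid for alpha > -1/2."""
    val, _ = quad(lambda t: (1 - t**2) ** (alpha - 0.5) * np.exp(z * t), -1, 1)
    return (z / 2) ** alpha / (np.sqrt(np.pi) * gamma(alpha + 0.5)) * val


alpha, z = 1.5, 2.0
assert abs(iv_series(alpha, z) - iv(alpha, z)) < 1e-10
assert abs(iv_integral(alpha, z) - iv(alpha, z)) < 1e-7
```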

Also of interest is the logarithm of this quantity (using the second integral representation (6)),

$$\begin{aligned} \log\left(I_\alpha(z)\right) &= \log\left[\frac{(z/2)^\alpha}{\sqrt{\pi}\,\Gamma(\alpha+1/2)} \int_{-1}^{1}(1-t^2)^{\alpha-1/2}\, e^{\pm z t}\,dt\right] \\ &= \log\frac{(z/2)^\alpha}{\sqrt{\pi}\,\Gamma(\alpha+1/2)} + \log\left[\int_{-1}^{1}(1-t^2)^{\alpha-1/2}\, e^{\pm z t}\,dt\right] \\ &= \alpha\log\left(\frac{z}{2}\right) - \log\left(\sqrt{\pi}\,\Gamma(\alpha+1/2)\right) + \log\left[\int_{-1}^{1}(1-t^2)^{\alpha-1/2}\, e^{\pm z t}\,dt\right]. \qquad (7)\end{aligned}$$

Note that the second term does not depend on $z$.

The Exponential Integral function $E_\alpha(z)$ is given by,

$$E_\alpha(z) = \int_1^\infty \frac{e^{-zt}}{t^\alpha}\,dt = z^{\alpha-1}\,\Gamma(1-\alpha, z). \qquad (8)$$
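Both equalities in (8) can be verified numerically for integer order; a minimal sketch comparing the integral definition, the incomplete-Gamma form, and SciPy's `expn`:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import expn

n, z = 3, 1.7

# Integral definition of E_n(z) from (8).
direct, _ = quad(lambda t: np.exp(-z * t) / t**n, 1, np.inf)
assert abs(direct - expn(n, z)) < 1e-10

# Relation E_n(z) = z^{n-1} Gamma(1-n, z): compute Gamma(1-n, z) directly,
# since scipy's gammaincc does not accept negative first arguments.
upper_gamma, _ = quad(lambda t: t ** (-n) * np.exp(-t), z, np.inf)
assert abs(z ** (n - 1) * upper_gamma - expn(n, z)) < 1e-10
```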

An identity that will be useful is,

$$\int_{-1}^{1} (1-t)^d\, e^{t\kappa}\,dt = e^{\kappa}\left[\frac{\Gamma(d+1)}{\kappa^{d+1}} - 2^{d+1}\, E_{-d}(2\kappa)\right], \qquad d > 0, \qquad (9)$$

which follows from the substitution $u = 1-t$ and the definition (8) of $E_\alpha$.
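A numerical spot-check of the identity $\int_{-1}^{1}(1-t)^d e^{t\kappa}\,dt = e^{\kappa}\left[\Gamma(d+1)/\kappa^{d+1} - 2^{d+1}E_{-d}(2\kappa)\right]$, with $E_{-d}(2\kappa)$ computed from its integral definition (SciPy's `expn` only accepts non-negative integer orders):

```python
import numpy as np
from math import factorial
from scipy.integrate import quad

d, kappa = 3, 1.2

# Left-hand side: the integral over [-1, 1].
lhs, _ = quad(lambda t: (1 - t) ** d * np.exp(t * kappa), -1, 1)

# E_{-d}(2 kappa) = \int_1^\infty e^{-2 kappa t} t^d dt, integrated directly.
e_neg_d, _ = quad(lambda t: np.exp(-2 * kappa * t) * t**d, 1, np.inf)

rhs = np.exp(kappa) * (factorial(d) / kappa ** (d + 1) - 2 ** (d + 1) * e_neg_d)
assert abs(lhs - rhs) < 1e-8
```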

### 2.2 The von Mises-Fisher (VMF) distribution

The probability density function (PDF) of the VMF distribution for a random $d$-dimensional unit vector $\mathbf{x}$ is given by:

$$\mathcal{M}_d(\mathbf{x};\boldsymbol{\mu},\kappa) = c_d(\kappa)\, e^{\kappa\boldsymbol{\mu}'\mathbf{x}}, \qquad \mathbf{x} \in \mathbb{S}^{d-1}, \qquad (10)$$

where the normalisation constant $c_d(\kappa)$ is given by,

$$c_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)}. \qquad (11)$$
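A quick sanity check that (11) normalises the density: for $d=2$ (von Mises on the circle) and $d=3$ (Fisher on the sphere) the PDF can be integrated in the angular parameterisation, taking $\boldsymbol{\mu}$ along the polar axis. A minimal sketch:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import iv


def c_d(d, kappa):
    """Normalisation constant of the d-dimensional VMF density, Eq. (11)."""
    return kappa ** (d / 2 - 1) / ((2 * np.pi) ** (d / 2) * iv(d / 2 - 1, kappa))


kappa = 2.5

# d = 2: integrate c_2(kappa) e^{kappa cos(theta)} over the circle.
total2, _ = quad(lambda th: c_d(2, kappa) * np.exp(kappa * np.cos(th)), 0, 2 * np.pi)
assert abs(total2 - 1) < 1e-10

# d = 3: integrate over the polar angle with area element 2*pi*sin(theta).
total3, _ = quad(
    lambda th: 2 * np.pi * c_d(3, kappa) * np.exp(kappa * np.cos(th)) * np.sin(th),
    0, np.pi,
)
assert abs(total3 - 1) < 1e-10
```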

The (non-symmetric) Kullback-Leibler (KL) divergence from one probability distribution $q(\mathbf{x})$ to another probability distribution $p(\mathbf{x})$ is defined as,

$$\mathrm{KL}(q(\mathbf{x})\,\|\,p(\mathbf{x})) = \int_{\mathbf{x}} q(\mathbf{x}) \log\frac{q(\mathbf{x})}{p(\mathbf{x})}\,d\mathbf{x} \qquad (12)$$

$$\phantom{\mathrm{KL}(q(\mathbf{x})\,\|\,p(\mathbf{x}))} = \mathbb{E}_{q}\left[\log\frac{q(\mathbf{x})}{p(\mathbf{x})}\right]. \qquad (13)$$

Although this definition is general to any two distributions, we will assume that $p(\mathbf{x})$ is the “prior” distribution and $q(\mathbf{x})$ is the “posterior” distribution, as is common in Bayesian analysis.
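As a numerical illustration of definition (12), the $d=2$ case (the von Mises distribution on the circle) can be integrated directly. The closed-form expression below is the standard von Mises KL divergence, included here purely as a cross-check, not as a result of this note:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import iv


def vm_pdf(theta, mu, kappa):
    """d=2 VMF (von Mises) density on the circle, parameterised by angle."""
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * iv(0, kappa))


def kl_numeric(mu_q, kq, mu_p, kp):
    """KL(q || p) by direct integration of (12) over the circle."""
    integrand = lambda th: vm_pdf(th, mu_q, kq) * (
        np.log(vm_pdf(th, mu_q, kq)) - np.log(vm_pdf(th, mu_p, kp))
    )
    val, _ = quad(integrand, 0, 2 * np.pi)
    return val


def kl_closed_form(mu_q, kq, mu_p, kp):
    """Standard closed-form KL between two von Mises distributions."""
    return (np.log(iv(0, kp) / iv(0, kq))
            + (kq - kp * np.cos(mu_q - mu_p)) * iv(1, kq) / iv(0, kq))


# KL is zero iff the distributions coincide, positive otherwise,
# and the numerical integral agrees with the closed form.
assert abs(kl_numeric(0.3, 2.0, 0.3, 2.0)) < 1e-10
assert kl_numeric(0.0, 2.0, 1.0, 0.5) > 0
assert abs(kl_numeric(0.0, 2.0, 1.0, 0.5) - kl_closed_form(0.0, 2.0, 1.0, 0.5)) < 1e-7
```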

## 3 KL-Divergence for the VMF Distribution

### 3.1 General Case

We will assume that we have prior and posterior distributions defined over vectors $\mathbf{x} \in \mathbb{S}^{d-1}$ as follows,

$$p(\mathbf{x}) \sim \mathcal{M}_d(\boldsymbol{\mu}_p, \kappa_p), \qquad q(\mathbf{x}) \sim \mathcal{M}_d(\boldsymbol{\mu}_q, \kappa_q). \qquad (14)$$

We will now derive the KL-divergence for two VMF distributions. The main difficulty in doing so will be the normalisation constants $c_d(\kappa_p)$ and $c_d(\kappa_q)$. For prior and posterior distributions as defined above over vectors $\mathbf{x} \in \mathbb{S}^{d-1}$ with $d$ odd (for even $d$ we can simply add a “null” dimension), we have

$$\begin{aligned} \mathrm{KL}(q(\mathbf{x})\,\|\,p(\mathbf{x})) \le{} & \kappa_q - \kappa_p\,\boldsymbol{\mu}_p'\boldsymbol{\mu}_q + \left(\frac{d}{2}-1\right)\log(\kappa_q) + \sum_{m=1}^{\frac{d-1}{2}} \frac{\kappa_q^m}{m!} \\ & - \left(\frac{d^2-2d+1}{4}\right)\log(\kappa_p) + \frac{d-1}{2}\left(\frac{d-1}{2}+1\right)\log\frac{d-1}{2} - \left(\frac{d-1}{2}\right)^2 + 1 \end{aligned}$$

From (12), letting , , and , we have,