# Universal Approximation on the Hypersphere

It is well known that any continuous probability density function on $\mathbb{R}^m$ can be approximated arbitrarily well by a finite mixture of normal distributions, provided that the number of mixture components is sufficiently large. The von Mises-Fisher distribution, defined on the unit hypersphere $S^m$ in $\mathbb{R}^{m+1}$, has properties that are analogous to those of the multivariate normal on $\mathbb{R}^{m+1}$. We prove that any continuous probability density function on $S^m$ can be approximated to an arbitrary degree of accuracy by a finite mixture of von Mises-Fisher distributions.


## 1 Introduction

Finite mixtures of distributions (McLachlan and Peel, 2000) are widely used in many fields for modelling random phenomena. In a finite mixture model, the distribution of the observations is modelled as a mixture of a finite number of component distributions with varying proportions. The finite mixture of normal distributions (Fraley and Raftery, 2002) is one of the most frequently used finite mixture models for continuous data taking values in Euclidean space, because of its flexibility in representing arbitrary distributions. Indeed, it has been shown that, given a sufficient number of mixture components, a finite mixture of normals can approximate any continuous probability density function to any desired level of accuracy (Bacharoglou, 2010; Nguyen and McLachlan, 2019).

Despite the success and popularity of finite mixtures of normal distributions in a wide range of applications, data frequently possess more structure, and representing them as Euclidean vectors may be inappropriate. An important case is when the data are normalized to have unit norm; such data are naturally represented as points on the unit hypersphere $S^m := \{x\in\mathbb{R}^{m+1} : \|x\| = 1\}$. For example, the direction of flight of a bird or the orientation of an animal can be represented as a point on the circle $S^1$ or the sphere $S^2$. Consequently, standard methods for analyzing univariate or multivariate data cannot be used, and distributions that take into account the directional nature of the data are required.

The von Mises-Fisher distribution (Fisher et al., 1993) is one of the most commonly used distributions for directional data on $S^m$, and it has properties analogous to those of the multivariate normal on $\mathbb{R}^{m+1}$. A unit-norm random vector $x\in S^m$ has a von Mises-Fisher distribution if it has density

$$f_{m+1}(x;\mu,\kappa) = c_{m+1}(\kappa)\exp(\kappa\langle x,\mu\rangle), \qquad x\in S^m,$$

where $\kappa\ge 0$ is the concentration parameter and the mean direction $\mu$ satisfies $\|\mu\| = 1$. As $\kappa$ increases, the distribution becomes increasingly concentrated at $\mu$. The normalizing constant is given by

$$c_{m+1}(\kappa) = \frac{\kappa^{\frac{m+1}{2}-1}}{(2\pi)^{\frac{m+1}{2}}\, I_{\frac{m+1}{2}-1}(\kappa)},$$

where $I_\nu$ denotes the modified Bessel function of the first kind of order $\nu$.
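As a numerical illustration (our own sketch, not code from the paper), the normalizing constant $c_{m+1}(\kappa)$ can be evaluated with only the Python standard library. The helper names `log_bessel_iv`, `log_vmf_normalizer` and `vmf_density` are hypothetical; the sketch assumes $\kappa>0$ and sums the power series $I_\nu(x)=\sum_{j\ge 0}(x/2)^{2j+\nu}/(j!\,\Gamma(j+\nu+1))$ in log space so that large $\kappa$ does not overflow.

```python
import math


def log_bessel_iv(nu, x):
    """log I_nu(x) for nu >= 0 and x > 0, from the power series
    I_nu(x) = sum_j (x/2)^(2j+nu) / (j! Gamma(j+nu+1)), summed in log space."""
    # The terms peak near j ~ x/2 and then decay geometrically, so 2x + 60 terms suffice.
    terms = int(2 * x) + 60
    log_half_x = math.log(x / 2)
    logs = [(2 * j + nu) * log_half_x - math.lgamma(j + 1) - math.lgamma(j + nu + 1)
            for j in range(terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def log_vmf_normalizer(kappa, m):
    """log c_{m+1}(kappa) for the von Mises-Fisher density on S^m, kappa > 0."""
    nu = (m + 1) / 2 - 1
    return nu * math.log(kappa) - (m + 1) / 2 * math.log(2 * math.pi) - log_bessel_iv(nu, kappa)


def vmf_density(x, mu, kappa):
    """f_{m+1}(x; mu, kappa) for unit vectors x, mu in R^{m+1}."""
    m = len(x) - 1
    dot = sum(a * b for a, b in zip(x, mu))
    return math.exp(log_vmf_normalizer(kappa, m) + kappa * dot)
```

For $m=2$ the Bessel function has the closed form $I_{1/2}(\kappa)=\sqrt{2/(\pi\kappa)}\sinh\kappa$, so $c_3(\kappa)=\kappa/(4\pi\sinh\kappa)$, which gives a convenient check. As $\kappa\to 0$, $c_{m+1}(\kappa)$ tends to $1/\omega_m(S^m)$, the density of the uniform distribution.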

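A finite mixture of von Mises-Fisher distributions on $S^m$ with $H$ components has density

$$f_{m+1}\big(x;\{\pi_h,\mu_h,\kappa_h\}_{h=1}^{H}\big) = \sum_{h=1}^{H}\pi_h\, f_{m+1}(x;\mu_h,\kappa_h). \tag{1}$$

The mixing proportions $\pi_h$ are non-negative and sum to 1 (i.e. $\sum_{h=1}^{H}\pi_h = 1$), and $\{\mu_h,\kappa_h\}_{h=1}^{H}$ are the parameters of the mixture components.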
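The mixture density (1) is straightforward to evaluate. The following sketch (hypothetical helper names, $m=2$ only) uses the closed form $c_3(\kappa)=\kappa/(4\pi\sinh\kappa)$, which follows from $I_{1/2}(\kappa)=\sqrt{2/(\pi\kappa)}\sinh\kappa$, and checks with a midpoint rule in spherical coordinates that the mixture integrates to 1 over $S^2$. The parameter values are arbitrary choices for illustration.

```python
import math


def vmf_s2(x, mu, kappa):
    """von Mises-Fisher density on S^2 (m = 2), using c_3(kappa) = kappa / (4 pi sinh kappa), kappa > 0."""
    dot = sum(a * b for a, b in zip(x, mu))
    # kappa / (4 pi sinh kappa) * e^{kappa dot}, rewritten so that large kappa does not overflow
    return kappa / (2 * math.pi * (1 - math.exp(-2 * kappa))) * math.exp(kappa * (dot - 1))


def mixture_s2(x, weights, mus, kappas):
    """Density (1): sum_h pi_h f_3(x; mu_h, kappa_h)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1) > 1e-12:
        raise ValueError("mixing proportions must be non-negative and sum to 1")
    return sum(w * vmf_s2(x, mu, k) for w, mu, k in zip(weights, mus, kappas))


def integrate_s2(func, n_theta=200, n_phi=400):
    """Midpoint rule in spherical coordinates, with d omega = sin(theta) d theta d phi."""
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        s, c = math.sin(theta), math.cos(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += func((s * math.cos(phi), s * math.sin(phi), c)) * s
    return total * d_theta * d_phi


weights, mus, kappas = [0.3, 0.7], [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)], [4.0, 10.0]
mass = integrate_s2(lambda y: mixture_s2(y, weights, mus, kappas))
```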

Finite mixtures of von Mises-Fisher distributions have found numerous applications, including the clustering of high-dimensional text data and gene expression data (Banerjee et al., 2005) and the clustering of online user behavior (Qin et al., 2016). A natural question is whether finite mixtures of von Mises-Fisher distributions can approximate any continuous probability density function on the hypersphere to any desired level of accuracy.

In this paper, we answer this question in the affirmative. We prove that any continuous probability density function on $S^m$ can be approximated in the uniform norm by a finite mixture of von Mises-Fisher distributions, provided that there are enough mixture components and each component is sufficiently concentrated at its mean direction. Our proof uses the theory of approximation by spherical convolution (Menegatto, 1997).

The paper is structured as follows. Section 2 provides the background needed for the proof of the main result. The main result is stated in Section 3 and proved in Section 4.

## 2 Background

This section provides the definitions of kernels, spherical convolution and eigenfunction expansions that are needed for the proof of the main result. We refer the interested reader to Menegatto (1997) for a detailed exposition of the theory.

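We denote the space of all continuous functions on the hypersphere $S^m$ by $C(S^m)$. Let $\omega_m$ be the surface measure on $S^m$; by a slight abuse of notation, we also write $\omega_m := \omega_m(S^m)$ for the total surface area. The uniform norm and the $L^p$ norm ($1\le p<\infty$) on $S^m$ are defined as

$$\|f\|_{m,\infty} := \sup_{x\in S^m}|f(x)|$$

and

$$\|f\|_{m,p} := \left(\frac{1}{\omega_m}\int_{S^m}|f(x)|^p\,d\omega_m(x)\right)^{1/p},$$

respectively. In particular, the space $L^p(S^m)$ contains all functions on $S^m$ that are $p$-integrable with respect to $\omega_m$. When no confusion arises, we let $V_m$ denote any of the spaces above with the corresponding norm $\|\cdot\|_m$ (i.e. $V_m=C(S^m)$ with $\|\cdot\|_m=\|\cdot\|_{m,\infty}$, or $V_m=L^p(S^m)$ with $\|\cdot\|_m=\|\cdot\|_{m,p}$ for $1\le p<\infty$).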
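For $m=1$ (the circle, with $\omega_1=2\pi$) both norms can be approximated on an equispaced grid in the angle. The sketch below is illustrative only: `norms_s1` is a hypothetical helper, and replacing the supremum by a maximum over grid points is an approximation that is accurate for smooth $f$.

```python
import math


def norms_s1(f, p, steps=100000):
    """Approximate the uniform norm sup|f| and the normalized L^p norm
    ((1/omega_1) int |f|^p d omega_1)^(1/p) on S^1, where omega_1 = 2 pi,
    using midpoints of an equispaced grid in the angle theta."""
    h = 2 * math.pi / steps
    values = [abs(f((i + 0.5) * h)) for i in range(steps)]
    sup_norm = max(values)
    lp_norm = (sum(v ** p for v in values) * h / (2 * math.pi)) ** (1.0 / p)
    return sup_norm, lp_norm
```

For example, $f(\theta)=\cos\theta$ has uniform norm $1$, normalized $L^2$ norm $1/\sqrt{2}$ and normalized $L^1$ norm $2/\pi$; the factor $1/\omega_1$ is what makes the constant function $1$ have norm $1$ for every $p$.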

We define the space $L^{1,m}$, which consists of all measurable functions $K$ on $[-1,1]$ with

$$\|K\|_{1,m} := \frac{\omega_{m-1}}{\omega_m}\int_{-1}^{1}|K(t)|\,(1-t^2)^{(m-2)/2}\,dt < \infty.$$

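Functions in $L^{1,m}$ are called kernels. Let $\langle\cdot,\cdot\rangle$ be the inner product in $\mathbb{R}^{m+1}$. It is straightforward to show that for all $x\in S^m$ the following equality holds:

$$\|K\|_{1,m} = \frac{1}{\omega_m}\int_{S^m}|K(\langle x,y\rangle)|\,d\omega_m(y).$$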
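The identity for $\|K\|_{1,m}$ can be checked numerically. In this sketch (our illustration, with hypothetical helper names) the one-dimensional integral is computed after the substitution $t=\cos\theta$, which turns $(1-t^2)^{(m-2)/2}\,dt$ into $\sin^{m-1}\theta\,d\theta$, and the surface area is $\omega_m=2\pi^{(m+1)/2}/\Gamma((m+1)/2)$. The sphere integral is done for $m=2$ at an arbitrary point $x$, and the kernel $K(t)=t^3-1/2$ is an arbitrary sign-changing choice.

```python
import math


def sphere_area(m):
    """omega_m(S^m) = 2 pi^{(m+1)/2} / Gamma((m+1)/2)."""
    return 2 * math.pi ** ((m + 1) / 2) / math.gamma((m + 1) / 2)


def kernel_norm_1d(K, m, steps=20000):
    """||K||_{1,m} = (omega_{m-1}/omega_m) int_{-1}^1 |K(t)| (1-t^2)^{(m-2)/2} dt,
    computed with t = cos(theta), so the weight becomes sin(theta)^{m-1}."""
    h = math.pi / steps
    total = sum(abs(K(math.cos((i + 0.5) * h))) * math.sin((i + 0.5) * h) ** (m - 1)
                for i in range(steps)) * h
    return sphere_area(m - 1) / sphere_area(m) * total


def kernel_norm_s2(K, x, n_theta=200, n_phi=400):
    """(1/omega_2) int_{S^2} |K(<x, y>)| d omega_2(y) by the midpoint rule in spherical coordinates."""
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * d_theta
        s, c = math.sin(th), math.cos(th)
        for j in range(n_phi):
            ph = (j + 0.5) * d_phi
            dot = x[0] * s * math.cos(ph) + x[1] * s * math.sin(ph) + x[2] * c
            total += abs(K(dot)) * s
    return total * d_theta * d_phi / sphere_area(2)


def K(t):
    # an arbitrary sign-changing kernel, chosen only for illustration
    return t ** 3 - 0.5
```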

The spherical convolution of a kernel $K$ in $L^{1,m}$ with a function $f$ in $V_m$ is defined by

$$(K*f)(x) := \frac{1}{\omega_m}\int_{S^m}K(\langle x,y\rangle)f(y)\,d\omega_m(y), \qquad x\in S^m.$$

For a fixed kernel $K$, the mapping $f\mapsto K*f$ defined for $f\in V_m$ has range in $V_m$.

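A useful property of spherical convolution is the Funk-Hecke formula (Xu, 2000), which gives an eigenfunction expansion for any kernel $K\in L^{1,m}$. Let $\mathcal{H}_k^m$ be the space of all spherical harmonics of degree $k$ in $m+1$ variables, and let $d_k^m$ be its dimension (Reimer, 2012, Chapter 3). Let $Q_k^{(m-1)/2}$ be the Gegenbauer polynomial of degree $k$ and index $(m-1)/2$; the eigenvalue formula below involves only the ratio $Q_k^{(m-1)/2}(t)/Q_k^{(m-1)/2}(1)$, so it does not depend on how the polynomial is normalized. The Gegenbauer polynomials are special cases of the Jacobi polynomials and are conveniently defined through generating functions (Reimer, 2012, Chapter 2).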

The Funk-Hecke formula states that for a kernel $K\in L^{1,m}$ the following holds:

$$K*Y_k^m = a_k^m(K)\,Y_k^m, \qquad K\in L^{1,m},\ Y_k^m\in\mathcal{H}_k^m,\ k=0,1,\dots$$

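In particular, the spherical harmonics $Y_k^m\in\mathcal{H}_k^m$, $k=0,1,\dots$, are eigenfunctions of the convolution operator $f\mapsto K*f$ associated with the kernel $K$, and the corresponding eigenvalues can be expressed in terms of Gegenbauer polynomials:

$$a_k^m(K) = \frac{\omega_{m-1}}{\omega_m}\int_{-1}^{1}K(t)\,\frac{Q_k^{(m-1)/2}(t)}{Q_k^{(m-1)/2}(1)}\,(1-t^2)^{(m-2)/2}\,dt, \qquad k=0,1,\dots$$

In particular, since $Q_0^{(m-1)/2}$ is constant,

$$a_0^m(K) = \frac{\omega_{m-1}}{\omega_m}\int_{-1}^{1}K(t)\,(1-t^2)^{(m-2)/2}\,dt.$$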
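The Funk-Hecke formula is easy to test on $S^2$, where the Gegenbauer index $(m-1)/2$ equals $1/2$, the Gegenbauer polynomials reduce to the Legendre polynomials $P_k$ (so $Q_k^{1/2}(t)/Q_k^{1/2}(1)=P_k(t)$), and $\omega_1/\omega_2=1/2$. The sketch below, an illustration with hypothetical helper names rather than the paper's code, compares a numerically computed convolution $(K*Y)(x)$ with $a_k^2(K)\,Y(x)$ for the zonal harmonic $Y(y)=P_k(\langle y,e_3\rangle)$ of degree $k$; the kernel $K(t)=e^{2t}$ is an arbitrary choice.

```python
import math


def legendre(k, t):
    """Legendre polynomial P_k(t) via the three-term recurrence (P_k(1) = 1)."""
    if k == 0:
        return 1.0
    p_prev, p = 1.0, t
    for j in range(1, k):
        p_prev, p = p, ((2 * j + 1) * t * p - j * p_prev) / (j + 1)
    return p


def eigenvalue_s2(K, k, steps=20000):
    """a_k^2(K) = (omega_1/omega_2) int_{-1}^1 K(t) P_k(t) dt with omega_1/omega_2 = 1/2,
    computed after substituting t = cos(theta)."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        t = math.cos((i + 0.5) * h)
        total += K(t) * legendre(k, t) * math.sin((i + 0.5) * h)
    return 0.5 * total * h


def convolve_s2(K, Y, x, n_theta=200, n_phi=400):
    """(K * Y)(x) = (1/omega_2) int_{S^2} K(<x, y>) Y(y) d omega_2(y), with omega_2 = 4 pi."""
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * d_theta
        s, c = math.sin(th), math.cos(th)
        for j in range(n_phi):
            ph = (j + 0.5) * d_phi
            y = (s * math.cos(ph), s * math.sin(ph), c)
            dot = x[0] * y[0] + x[1] * y[1] + x[2] * y[2]
            total += K(dot) * Y(y) * s
    return total * d_theta * d_phi / (4 * math.pi)


def K(t):
    return math.exp(2 * t)  # a von Mises-Fisher-shaped kernel (unnormalized)


k = 2


def Y(y):
    return legendre(k, y[2])  # zonal spherical harmonic of degree k about the north pole


x = (0.6, 0.0, 0.8)
```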

Menegatto (1997) investigated necessary and sufficient conditions under which a sequence of kernels $\{K_n\}$ in $L^{1,m}$ has the property

$$\|K_n*f-f\|_m\to 0 \quad \text{for all } f\in V_m$$

as $n\to\infty$. For non-negative kernels, Theorem 3.4 of Menegatto (1997) provides sufficient conditions for the convergence of the spherical convolutions $K_n*f$; it is stated below.

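###### Lemma 1.

Let $\{K_n\}$ be a sequence of non-negative kernels in $L^{1,m}$. Suppose

1. $a_0^m(K_n)\to 1$ as $n\to\infty$;

2. $\lim_{n\to\infty}\int_{-1}^{\rho}K_n(t)\,(1-t^2)^{(m-2)/2}\,dt = 0$ for all $\rho\in(-1,1)$.

Then $\|K_n*f-f\|_m\to 0$ as $n\to\infty$ for all $f\in V_m$.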
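Condition 2 can be examined numerically for the exponential kernels $K(t)\propto e^{nt}$ that appear in Section 3. The sketch below is illustrative only: it assumes the kernel is rescaled so that $a_0^m(K)=1$, which makes condition 1 hold exactly, and it evaluates the integral in condition 2 after the substitution $t=\cos\theta$. The function names are hypothetical.

```python
import math


def sphere_area(m):
    """omega_m(S^m) = 2 pi^{(m+1)/2} / Gamma((m+1)/2)."""
    return 2 * math.pi ** ((m + 1) / 2) / math.gamma((m + 1) / 2)


def weighted_exp_integral(n, a, m, steps):
    """Midpoint rule for int_a^pi exp(n (cos th - 1)) sin(th)^(m-1) d th, which equals
    e^{-n} int_{-1}^{cos a} e^{n t} (1 - t^2)^{(m-2)/2} dt after t = cos(th)."""
    h = (math.pi - a) / steps
    total = 0.0
    for i in range(steps):
        th = a + (i + 0.5) * h
        total += math.exp(n * (math.cos(th) - 1)) * math.sin(th) ** (m - 1)
    return total * h


def condition2_integral(n, rho, m, steps=20000):
    """int_{-1}^{rho} K(t) (1 - t^2)^{(m-2)/2} dt for K(t) proportional to e^{n t},
    rescaled so that a_0^m(K) = 1 (assumption: this normalization is chosen here)."""
    tail = weighted_exp_integral(n, math.acos(rho), m, steps)
    full = weighted_exp_integral(n, 0.0, m, steps)
    # a_0^m(K) = 1 means int_{-1}^{1} K(t) (1-t^2)^{(m-2)/2} dt = omega_m / omega_{m-1}
    return sphere_area(m) / sphere_area(m - 1) * tail / full
```

For $m=2$ the weight is constant and the integral has the closed form $2(e^{n\rho}-e^{-n})/(e^{n}-e^{-n})$, which gives a convenient check; for any fixed $\rho<1$ it decays roughly like $e^{-n(1-\rho)}$.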

## 3 Main Result

We now state the main result concerning the approximation properties of finite mixtures of von Mises-Fisher distributions of the form (1). Recall that the probability density function of the von Mises-Fisher distribution for a random $(m+1)$-dimensional unit vector $x$ is given by

$$f_{m+1}(x;\mu,\kappa) = c_{m+1}(\kappa)\exp(\kappa\langle x,\mu\rangle),$$

where $\mu\in S^m$ is the mean direction and $\kappa\ge 0$ is the concentration parameter. We define a sequence of kernels $\{K_n\}$ in $L^{1,m}$ by

$$K_n(t) = c_{m+1}(n)\exp(nt), \qquad t\in[-1,1]. \tag{2}$$

In particular, for any fixed $y\in S^m$,

$$K_n(\langle x,y\rangle) = c_{m+1}(n)\exp(n\langle x,y\rangle), \qquad x\in S^m,$$

is the density function of the von Mises-Fisher distribution with mean direction $y$ and concentration parameter $n$. For a fixed $y$, $K_n(\langle\cdot,y\rangle)$ plays the role of a “bump function” that becomes increasingly concentrated at $y$ as $n$ increases.

We show that any continuous probability density function $f$ on $S^m$ can be approximated to any desired level of accuracy in the uniform norm by a mixture of von Mises-Fisher distributions in which each mixture component has the form $K_n(\langle\cdot,y_k\rangle)$.

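###### Theorem 1.

Let $f$ be a continuous probability density function on $S^m$. Then, given $\delta>0$, there exist integers $n$ and $N$, points $y_1,\dots,y_N$ in $S^m$, and coefficients $c_1,\dots,c_N$ in $\mathbb{R}$ with $c_k\ge 0$ and $\sum_{k=1}^{N}c_k=1$, such that

$$\max_{x\in S^m}\left|f(x)-\sum_{k=1}^{N}c_k K_n(\langle x,y_k\rangle)\right| < \delta.$$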

## 4 Proof of Theorem 1

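In this section, we first state and prove a few lemmas needed for the proof of Theorem 1. Recall that $V_m$ denotes either $C(S^m)$ with the uniform norm $\|\cdot\|_{m,\infty}$ or $L^p(S^m)$ with the $L^p$ norm $\|\cdot\|_{m,p}$. We first show that for any function $f\in V_m$ the spherical convolution $K_n*f$ converges to $f$ in norm.

###### Lemma 2.

Let $\{K_n\}$ be the sequence of kernels defined in (2). Then

$$\|K_n*f-f\|_m\to 0$$

as $n\to\infty$ for all $f\in V_m$.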

###### Proof.

It suffices to verify conditions 1 and 2 of Lemma 1. For condition 1, note that for a non-negative kernel $K$ and any fixed $x\in S^m$,

$$a_0^m(K) = \frac{\omega_{m-1}}{\omega_m}\int_{-1}^{1}K(t)\,(1-t^2)^{(m-2)/2}\,dt = \frac{1}{\omega_m}\int_{S^m}K(\langle x,y\rangle)\,d\omega_m(y).$$

The last expression equals 1 if $K$ is a probability density function.

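For condition 2, note that for any fixed $\rho\in(-1,1)$ and $x\in S^m$,

$$\int_{-1}^{\rho}|K_n(t)|\,(1-t^2)^{(m-2)/2}\,dt = \frac{\int_{-1}^{\rho}e^{nt}(1-t^2)^{(m-2)/2}\,dt}{\int_{-1}^{1}e^{nt}(1-t^2)^{(m-2)/2}\,dt} = \frac{\int_{\{y:\langle x,y\rangle\le\rho\}}e^{n\langle x,y\rangle}\,d\omega_m(y)}{\int_{S^m}e^{n\langle x,y\rangle}\,d\omega_m(y)},$$

where the second equality follows from the change of variable $t=\langle x,y\rangle$. Since $e^{n\langle x,y\rangle}\le e^{n\rho}$ whenever $\langle x,y\rangle\le\rho$, the numerator above is bounded above by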

$$\int_{\{y:\langle x,y\rangle\le\rho\}}e^{n\langle x,y\rangle}\,d\omega_m(y) \le \omega_m(\{y:\langle x,y\rangle\le\rho\})\,e^{n\rho}. \tag{3}$$

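To bound the denominator from below, we define the spherical cap $B_\delta(x):=\{y\in S^m:\langle x,y\rangle\ge 1-\delta\}$, where $0<\delta<1-\rho$. Consequently,

$$\begin{aligned}\int_{S^m}e^{n\langle x,y\rangle}\,d\omega_m(y) &\ge \int_{B_\delta(x)}e^{n\langle x,y\rangle}\,d\omega_m(y) &&(4)\\ &\ge e^{n(1-\delta)}\,\omega_m(B_\delta(x)). &&(5)\end{aligned}$$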

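Therefore, combining inequalities (3) and (5), we have

$$\int_{-1}^{\rho}|K_n(t)|\,(1-t^2)^{(m-2)/2}\,dt \le \frac{\omega_m(\{y:\langle x,y\rangle\le\rho\})\,e^{n\rho}}{e^{n(1-\delta)}\,\omega_m(B_\delta(x))} = \frac{\omega_m(\{y:\langle x,y\rangle\le\rho\})}{\omega_m(B_\delta(x))}\,e^{-n(1-\delta-\rho)}.$$

Since $\rho<1-\delta$, the right-hand side goes to 0 as $n\to\infty$. ∎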
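On $S^2$ every quantity in this bound is explicit: by Archimedes' theorem the area of a spherical zone is $2\pi$ times its height, so $\omega_2(\{y:\langle x,y\rangle\le\rho\})=2\pi(1+\rho)$ and $\omega_2(B_\delta(x))=2\pi\delta$, while the exact ratio of the two sphere integrals is $(e^{n\rho}-e^{-n})/(e^{n}-e^{-n})$. The sketch below (hypothetical helper names, $m=2$ only) compares the exact ratio with the bound.

```python
import math


def exact_tail_fraction_s2(n, rho):
    """Exact ratio of the two sphere integrals in the proof for m = 2:
    (e^{n rho} - e^{-n}) / (e^{n} - e^{-n}), rescaled by e^{-n} to avoid overflow."""
    return (math.exp(n * (rho - 1)) - math.exp(-2 * n)) / (1 - math.exp(-2 * n))


def bound_s2(n, rho, delta):
    """Right-hand side of the bound obtained from (3) and (5) for m = 2, using
    omega_2({y: <x,y> <= rho}) = 2 pi (1 + rho) and omega_2(B_delta(x)) = 2 pi delta."""
    if not 0 < delta < 1 - rho:
        raise ValueError("need 0 < delta < 1 - rho")
    return (1 + rho) / delta * math.exp(-n * (1 - delta - rho))
```

The bound is loose (it decays like $e^{-n(1-\delta-\rho)}$ instead of roughly $e^{-n(1-\rho)}$), but the proof only needs it to tend to zero.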

The following lemma, concerning the uniform approximation of integrals on $S^m$ by Riemann sums, is useful.

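###### Lemma 3.

Let $g:S^m\times S^m\to\mathbb{R}$ be a continuous function. Then for any $\delta>0$, there is a partition $\{U_1,\dots,U_N\}$ of $S^m$ such that the integral $\int_{S^m}g(x,y)\,d\omega_m(y)$ can be uniformly approximated on $S^m$ by Riemann sums:

$$\max_{x\in S^m}\left|\int_{S^m}g(x,y)\,d\omega_m(y)-\sum_{k=1}^{N}g(x,y_k)\,\omega_m(U_k)\right| < \delta$$

for any $y_k\in U_k$, $k=1,\dots,N$, where each $U_k$ is connected.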

###### Proof.

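For each $x'\in S^m$, the uniform continuity of $g$ on the compact set $S^m\times S^m$ gives a neighborhood $V_{x'}$ of $x'$ such that for all $x\in V_{x'}$,

$$\max_{y\in S^m}|g(x,y)-g(x',y)| < \frac{\delta}{3\omega_m}.$$

Thus, for any $x\in V_{x'}$, we have

$$\left|\int_{S^m}g(x,y)\,d\omega_m(y)-\int_{S^m}g(x',y)\,d\omega_m(y)\right| \le \int_{S^m}|g(x,y)-g(x',y)|\,d\omega_m(y) \le \max_{y\in S^m}|g(x,y)-g(x',y)|\int_{S^m}d\omega_m(y) < \frac{\delta}{3}.$$

There exists a partition $\{U_k\}_{k=1}^{N'}$ of $S^m$ into standard spherical-coordinate blocks such that $\int_{S^m}g(x',y)\,d\omega_m(y)$ is approximated by Riemann sums:

$$\left|\int_{S^m}g(x',y)\,d\omega_m(y)-\sum_{k=1}^{N'}g(x',y_k)\,\omega_m(U_k)\right| < \frac{\delta}{3}$$

for any $y_k\in U_k$. Now, for $x\in V_{x'}$, we have

$$\begin{aligned}\left|\int_{S^m}g(x,y)\,d\omega_m(y)-\sum_{k=1}^{N'}g(x,y_k)\,\omega_m(U_k)\right| \le{}& \left|\int_{S^m}g(x,y)\,d\omega_m(y)-\int_{S^m}g(x',y)\,d\omega_m(y)\right|\\ &+\left|\int_{S^m}g(x',y)\,d\omega_m(y)-\sum_{k=1}^{N'}g(x',y_k)\,\omega_m(U_k)\right|\\ &+\left|\sum_{k=1}^{N'}g(x',y_k)\,\omega_m(U_k)-\sum_{k=1}^{N'}g(x,y_k)\,\omega_m(U_k)\right| < \delta,\end{aligned}$$

where the last term is at most $\frac{\delta}{3\omega_m}\sum_{k}\omega_m(U_k)=\frac{\delta}{3}$.

Since $\{V_{x'}:x'\in S^m\}$ is an open cover of the compact set $S^m$, it has a finite subcover $\{V_{x'_1},\dots,V_{x'_J}\}$. We can then take a common refinement of the partitions used in the Riemann sums for $x'_1,\dots,x'_J$. The claimed result follows immediately. ∎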
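The Riemann sums of Lemma 3 can be illustrated on $S^2$ with spherical-coordinate blocks $U_k=[\theta_i,\theta_{i+1}]\times[\phi_j,\phi_{j+1}]$, whose areas are exactly $(\cos\theta_i-\cos\theta_{i+1})(\phi_{j+1}-\phi_j)$, and with $y_k$ taken at the block centres. The sketch uses the test function $g(x,y)=e^{2\langle x,y\rangle}$ (an arbitrary choice), for which $\int_{S^2}g(x,y)\,d\omega_2(y)=2\pi\sinh 2$ for every $x$, and reports the largest error over a few points $x$. It is an illustration with hypothetical helper names, not the construction used in the proof (which refines partitions taken from a finite subcover).

```python
import math


def riemann_sum_s2(g, x, n_theta, n_phi):
    """sum_k g(x, y_k) omega_2(U_k) over the spherical-coordinate blocks
    U_k = [th_i, th_{i+1}] x [ph_j, ph_{j+1}], with y_k at the block centre."""
    d_theta, d_phi = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        # exact block area: (cos th_i - cos th_{i+1}) * d_phi
        area = (math.cos(i * d_theta) - math.cos((i + 1) * d_theta)) * d_phi
        th = (i + 0.5) * d_theta
        s, c = math.sin(th), math.cos(th)
        for j in range(n_phi):
            ph = (j + 0.5) * d_phi
            total += g(x, (s * math.cos(ph), s * math.sin(ph), c)) * area
    return total


def g(x, y):
    """g(x, y) = exp(2 <x, y>), continuous on S^2 x S^2."""
    return math.exp(2 * (x[0] * y[0] + x[1] * y[1] + x[2] * y[2]))


# int_{S^2} e^{2<x,y>} d omega_2(y) = 2 pi int_{-1}^{1} e^{2t} dt = 2 pi sinh(2), for every x
EXACT = 2 * math.pi * math.sinh(2.0)
XS = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.6, 0.0, 0.8), (0.0, -0.8, 0.6)]


def max_error(n_theta, n_phi):
    """Largest Riemann-sum error over the sample points XS."""
    return max(abs(riemann_sum_s2(g, x, n_theta, n_phi) - EXACT) for x in XS)
```

Refining the blocks (for example from $15\times 30$ to $60\times 120$) reduces the largest error over the sample points, in line with the uniform approximation claimed by the lemma.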

The following result shows that any continuous function on $S^m$ can be uniformly approximated by linear combinations of $K_n(\langle\cdot,y\rangle)$ for $y$ in $S^m$.

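###### Lemma 4.

Let $f$ be a non-zero continuous function on $S^m$. Then, given $\delta>0$, there exist integers $n$ and $N$, points $y_1,\dots,y_N$ in $S^m$, and coefficients $c_1,\dots,c_N$ in $\mathbb{R}$ such that

$$\max_{x\in S^m}\left|f(x)-\sum_{k=1}^{N}c_k K_n(\langle x,y_k\rangle)\right| < \delta.$$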

###### Proof.

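By Lemma 2, there exists an integer $n$ such that

$$\max_{x\in S^m}\left|f(x)-\int_{S^m}K_n(\langle x,y\rangle)f(y)\,d\omega_m(y)\right| < \frac{\delta}{2}. \tag{6}$$

On the other hand, by Lemma 3 applied to $g(x,y)=K_n(\langle x,y\rangle)f(y)$, there exists a partition $\{U_1,\dots,U_N\}$ of $S^m$ into connected sets such that for any $x\in S^m$ and any $y_k\in U_k$,

$$\left|\int_{S^m}K_n(\langle x,y\rangle)f(y)\,d\omega_m(y)-\sum_{k=1}^{N}K_n(\langle x,y_k\rangle)f(y_k)\,\omega_m(U_k)\right| < \frac{\delta}{2}. \tag{7}$$

The result follows by combining (6) and (7) and letting $c_k=f(y_k)\,\omega_m(U_k)$ for $k=1,\dots,N$. ∎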
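The construction in this proof can be sketched on the circle $S^1$ ($m=1$): take arcs $U_k$ of equal length, let $y_k$ be their midpoints and set $c_k=f(y_k)\,\omega_1(U_k)$. In this illustration (hypothetical helper names), $K_n$ is the von Mises density $e^{nt}/Z_n$ with $Z_n=\int_0^{2\pi}e^{n\cos\theta}\,d\theta$ computed numerically, $f$ is an arbitrary signed continuous function, and the uniform error is measured on a finite grid of angles.

```python
import math


def vmf_kernel_s1(n, steps=4096):
    """Von Mises kernel on S^1: t -> e^{n t} / Z_n with Z_n = int_0^{2 pi} e^{n cos(th)} d th.
    Z_n is computed by the periodic midpoint rule with a factor e^{-n} pulled out,
    so that large n does not overflow."""
    h = 2 * math.pi / steps
    z_scaled = h * sum(math.exp(n * (math.cos((i + 0.5) * h) - 1)) for i in range(steps))
    return lambda t: math.exp(n * (t - 1)) / z_scaled


def lemma4_approximant(f, n, num_arcs):
    """Return th -> sum_k c_k K_n(cos(th - y_k)) for arcs U_k of length 2 pi / N,
    with y_k the arc midpoints and c_k = f(y_k) * omega_1(U_k)."""
    width = 2 * math.pi / num_arcs
    ys = [(k + 0.5) * width for k in range(num_arcs)]
    cs = [f(y) * width for y in ys]
    kernel = vmf_kernel_s1(n)
    return lambda th: sum(c * kernel(math.cos(th - y)) for c, y in zip(cs, ys))


def sup_error(f, approx, grid=360):
    """Maximum of |f - approx| over an equispaced grid of angles (a proxy for the uniform norm)."""
    return max(abs(f(2 * math.pi * i / grid) - approx(2 * math.pi * i / grid)) for i in range(grid))


def f(th):
    # an arbitrary non-zero, sign-changing continuous function on S^1
    return math.cos(th) + 0.5 * math.sin(3 * th)
```

With the number of arcs held fixed and fine enough, increasing $n$ shrinks the error, as (6) predicts; the coefficients $c_k$ may be negative here because $f$ changes sign.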

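###### Proof of Theorem 1.

Since (7) holds for every choice of $y_k\in U_k$, it remains to choose these points carefully, so that the coefficients $c_k=f(y_k)\,\omega_m(U_k)$ in Lemma 4 satisfy $c_k\ge 0$ and $\sum_{k=1}^{N}c_k=1$. Non-negativity is immediate because $f\ge 0$. For the normalization, apply the integral mean value theorem to each of the integrals

$$\int_{U_k}f(y)\,d\omega_m(y), \qquad k=1,\dots,N. \tag{8}$$

Since $f$ is continuous and $U_k$ is connected, there is a point $y_k\in U_k$ with $\int_{U_k}f(y)\,d\omega_m(y)=f(y_k)\,\omega_m(U_k)$. Together with the fact that

$$\sum_{k=1}^{N}\int_{U_k}f(y)\,d\omega_m(y)=1,$$

this gives $\sum_{k=1}^{N}c_k=1$. ∎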
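The mean-value choice of $y_k$ can be made explicit for a simple density on $S^1$. Assuming $f(\theta)=(1+\tfrac12\cos\theta)/(2\pi)$ and $N$ arcs of equal length with $N$ even (so that $\cos$ is monotone on every arc), the condition $f(y_k)\,\omega_1(U_k)=\int_{U_k}f$ reduces to $\cos y_k=(\sin b_k-\sin a_k)/(b_k-a_k)$ for $U_k=[a_k,b_k]$. The sketch below, with hypothetical helper names and arbitrary parameter choices, builds the resulting mixture and measures its uniform error on a grid.

```python
import math


def density(th):
    """A continuous probability density on S^1 (w.r.t. arc length), chosen for illustration."""
    return (1 + 0.5 * math.cos(th)) / (2 * math.pi)


def mean_value_points(num_arcs):
    """For arcs U_k = [a, b] with a = 2 pi k / N and N even, return points y_k in U_k with
    density(y_k) * (b - a) = int_a^b density, via cos(y_k) = (sin b - sin a) / (b - a),
    together with the mixture weights c_k = density(y_k) * omega_1(U_k)."""
    if num_arcs % 2:
        raise ValueError("num_arcs must be even so that cos is monotone on each arc")
    width = 2 * math.pi / num_arcs
    points, weights = [], []
    for k in range(num_arcs):
        a, b = k * width, (k + 1) * width
        v = max(-1.0, min(1.0, (math.sin(b) - math.sin(a)) / (b - a)))
        # arcs in [0, pi] use acos directly; arcs in [pi, 2 pi] use the reflected angle
        y = math.acos(v) if k < num_arcs // 2 else 2 * math.pi - math.acos(v)
        points.append(y)
        weights.append(density(y) * width)
    return points, weights


def vmf_kernel_s1(n, steps=4096):
    """t -> e^{n t} / Z_n with Z_n = int_0^{2 pi} e^{n cos(th)} d th (periodic midpoint rule)."""
    h = 2 * math.pi / steps
    z_scaled = h * sum(math.exp(n * (math.cos((i + 0.5) * h) - 1)) for i in range(steps))
    return lambda t: math.exp(n * (t - 1)) / z_scaled


def sup_error(n, points, weights, grid=360):
    """Max over a grid of |density - sum_k c_k K_n(cos(th - y_k))|."""
    kernel = vmf_kernel_s1(n)
    worst = 0.0
    for i in range(grid):
        th = 2 * math.pi * i / grid
        approx = sum(c * kernel(math.cos(th - y)) for c, y in zip(weights, points))
        worst = max(worst, abs(density(th) - approx))
    return worst
```

Because each weight equals the exact mass of its arc, the weights form a probability vector, so the approximant is a genuine von Mises mixture of the form (1).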

## References

• A. G. Bacharoglou (2010) Approximation of probability distributions by convex mixtures of Gaussian measures. Proc. Amer. Math. Soc. 138 (7), pp. 2619–2628.
• A. Banerjee, I. S. Dhillon, J. Ghosh, and S. Sra (2005) Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res. 6, pp. 1345–1382.
• N. I. Fisher, T. Lewis, and B. J. J. Embleton (1993) Statistical analysis of spherical data. Cambridge University Press, Cambridge. Revised reprint of the 1987 original.
• C. Fraley and A. E. Raftery (2002) Model-based clustering, discriminant analysis, and density estimation. J. Amer. Statist. Assoc. 97 (458), pp. 611–631.
• G. J. McLachlan and D. Peel (2000) Finite mixture models. Probability and Statistics – Applied Probability and Statistics Section, Vol. 299, Wiley, New York.
• V. A. Menegatto (1997) Approximation by spherical convolution. Numer. Funct. Anal. Optim. 18 (9-10), pp. 995–1012.
• H. D. Nguyen and G. McLachlan (2019) On approximations via convolution-defined mixture models. Comm. Statist. Theory Methods 48 (16), pp. 3945–3955.
• X. Qin, P. Cunningham, and M. Salter-Townshend (2016) Online trans-dimensional von Mises-Fisher mixture models for user profiles. J. Mach. Learn. Res. 17 (1), pp. 7021–7071.
• M. Reimer (2012) Multivariate polynomial approximation. Vol. 144, Birkhäuser.
• Y. Xu (2000) Funk-Hecke formula for orthogonal polynomials on spheres and on balls. Bull. London Math. Soc. 32 (4), pp. 447–457.