Finite mixtures of distributions (McLachlan and Peel, 2000) are widely used in various fields for modelling random phenomena. In a finite mixture model, the distribution of random observations is modelled as a mixture of a finite number of component distributions with varying proportions. The finite mixture of normal distributions (Fraley and Raftery, 2002) is one of the most frequently used finite mixture models for continuous data taking values in Euclidean space, because of its flexibility in representing arbitrary distributions. Indeed, it has been shown that, given a sufficient number of mixture components, a finite mixture of normals can approximate any continuous probability density function up to any desired level of accuracy (Bacharoglou, 2010; Nguyen and McLachlan, 2019).
Despite the success and popularity of finite mixtures of normal distributions in a wide range of applications, data frequently possess more structure, and representing them as Euclidean vectors may be inappropriate. An important case is data normalized to have unit norm, which can be naturally represented as points on the unit hypersphere. For example, the direction of flight of a bird or the orientation of an animal can be represented as a point on the circle or the sphere. Consequently, standard methods for analyzing univariate or multivariate Euclidean data cannot be used, and distributions that take into account the directional nature of the data are required.
The von Mises-Fisher distribution (Fisher et al., 1993) is one of the most commonly used distributions for describing directional data on $\mathbb{S}^{d-1}$, and it has properties analogous to those of the multivariate normal distribution on $\mathbb{R}^d$. A unit norm vector $x \in \mathbb{S}^{d-1}$ has a von Mises-Fisher distribution if it has density
$$f(x; \mu, \kappa) = C_d(\kappa)\, \exp(\kappa\, \mu^\top x),$$
where $\kappa \ge 0$ is the concentration parameter and the mean direction $\mu$ satisfies $\|\mu\| = 1$. In particular, as $\kappa$ increases, the distribution becomes increasingly concentrated at $\mu$. The normalizing constant $C_d(\kappa)$ is given by
$$C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)},$$
where $I_{d/2-1}$ is the modified Bessel function of the first kind at order $d/2 - 1$.
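As a quick numerical sanity check (ours, not part of the original exposition), the density and normalizing constant above can be evaluated directly; the sketch below uses SciPy's exponentially scaled Bessel function `ive` for stability at large $\kappa$ and verifies on the circle $\mathbb{S}^1$ (i.e. $d = 2$) that the density integrates to 1.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled Bessel: ive(v, k) = iv(v, k) * exp(-k)

def vmf_logpdf(x, mu, kappa, d):
    """Log-density of the von Mises-Fisher distribution on S^{d-1}.

    Illustrative implementation (ours); the scaled Bessel function ive
    keeps the normalizer finite even for very large kappa.
    """
    v = d / 2.0 - 1.0
    # log C_d(kappa) = (d/2-1)*log(kappa) - (d/2)*log(2*pi) - log(I_{d/2-1}(kappa))
    log_norm = v * np.log(kappa) - (d / 2.0) * np.log(2 * np.pi) - (np.log(ive(v, kappa)) + kappa)
    return log_norm + kappa * float(np.dot(mu, x))

# Sanity check on the circle S^1 (d = 2): the density integrates to 1.
mu = np.array([1.0, 0.0])
kappa = 5.0
theta = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)
dens = np.array([np.exp(vmf_logpdf(p, mu, kappa, d=2)) for p in pts])
total = dens.sum() * (2 * np.pi / len(theta))
print(round(total, 6))  # ≈ 1.0
```

For $d = 2$ the normalizer reduces to $2\pi I_0(\kappa)$, the familiar von Mises constant, which the periodic Riemann sum above recovers to high accuracy.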
A finite mixture of von Mises-Fisher distributions on $\mathbb{S}^{d-1}$ with $M$ components has density
$$f(x) = \sum_{m=1}^{M} \pi_m\, f(x; \mu_m, \kappa_m). \qquad (1)$$
The mixing proportions $\pi_1, \ldots, \pi_M$ are non-negative and sum to 1 (i.e. $\sum_{m=1}^{M} \pi_m = 1$), and $(\mu_m, \kappa_m)$, $m = 1, \ldots, M$, are the parameters of the mixture components.
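A mixture of this form is straightforward to evaluate; the following sketch (ours, with arbitrary parameter values) builds a 3-component mixture on the circle, where the vMF density reduces to the von Mises density with normalizer $2\pi I_0(\kappa)$, and checks that the mixture density integrates to 1.

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of the first kind, order 0

# Illustrative 3-component von Mises-Fisher mixture on the circle S^1.
weights = np.array([0.5, 0.3, 0.2])    # mixing proportions pi_m, sum to 1
means   = np.array([0.0, 2.0, 4.5])    # mean directions, as angles in [0, 2*pi)
kappas  = np.array([4.0, 10.0, 25.0])  # concentration parameters

def mixture_pdf(theta):
    # component densities at each angle, then the weighted sum over components
    comps = np.exp(kappas * np.cos(theta[:, None] - means)) / (2 * np.pi * i0(kappas))
    return comps @ weights

theta = np.linspace(0, 2 * np.pi, 20000, endpoint=False)
mass = mixture_pdf(theta).sum() * (2 * np.pi / len(theta))
print(round(mass, 6))  # the mixture density integrates to ≈ 1
```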
Finite mixtures of von Mises-Fisher distributions have found numerous applications, including clustering of high-dimensional text data and gene expression data (Banerjee et al., 2005) and clustering of online user behavior (Qin et al., 2016). A natural question that arises is whether finite mixtures of von Mises-Fisher distributions can approximate any continuous probability distribution on the hypersphere up to any desired level of accuracy.
In this paper, we provide an affirmative answer to this question. We prove that any continuous probability distribution on $\mathbb{S}^{d-1}$ can be approximated in the uniform norm by a finite mixture of von Mises-Fisher distributions, given enough mixture components with each component sufficiently concentrated at its respective mean direction. Our proof utilizes the theory of approximation by spherical convolution (Menegatto, 1997).
The paper is structured as follows. Section 2 provides the relevant background needed for the proof of the main result. The main result is stated in Section 3 and is proved in Section 4.
This section provides the definitions of kernel functions, spherical convolution and eigenfunction expansions, which are needed for the proof of the main result. We refer the interested reader to Menegatto (1997) for a detailed exposition of the theory.
We denote the space of all continuous functions defined on the hypersphere $\mathbb{S}^{d-1}$ by $C(\mathbb{S}^{d-1})$. Let $\sigma_d$ be the surface measure on $\mathbb{S}^{d-1}$, and define $\omega_d = \sigma_d(\mathbb{S}^{d-1})$. The uniform and the $L^1$ norms on $\mathbb{S}^{d-1}$ are defined as
$$\|f\|_\infty = \sup_{x \in \mathbb{S}^{d-1}} |f(x)| \quad \text{and} \quad \|f\|_1 = \int_{\mathbb{S}^{d-1}} |f(x)|\, d\sigma_d(x),$$
respectively. In particular, the space $L^1(\mathbb{S}^{d-1})$ contains all functions defined on $\mathbb{S}^{d-1}$ that are integrable with respect to $\sigma_d$. When no confusion arises, we let $\mathcal{B}$ be either of the spaces above with the corresponding norm (i.e. $\|\cdot\|_{\mathcal{B}} = \|\cdot\|_\infty$ for $\mathcal{B} = C(\mathbb{S}^{d-1})$ or $\|\cdot\|_{\mathcal{B}} = \|\cdot\|_1$ for $\mathcal{B} = L^1(\mathbb{S}^{d-1})$).
We define the space $L^1_d$, which consists of all measurable functions $K$ on $[-1, 1]$ with norm
$$\|K\|_{1,d} = \int_{-1}^{1} |K(t)|\, (1 - t^2)^{(d-3)/2}\, dt < \infty.$$
Functions in the space $L^1_d$ are called kernels. Let $x^\top y$ be the inner product in $\mathbb{R}^d$; it is straightforward to show that for all $x \in \mathbb{S}^{d-1}$, the following equality holds:
$$\int_{\mathbb{S}^{d-1}} |K(x^\top y)|\, d\sigma_d(y) = \omega_{d-1}\, \|K\|_{1,d}.$$
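This change-of-variables identity is easy to check numerically. The sketch below (ours) works on $\mathbb{S}^2$, i.e. $d = 3$, where the weight $(1-t^2)^{(d-3)/2}$ is identically 1 and $\omega_{d-1} = 2\pi$ is the circumference of $\mathbb{S}^1$, using the exponential kernel $K(t) = e^{\kappa t}$.

```python
import numpy as np

# Check on S^2 (d = 3): integral of K(x.y) over the sphere vs. the
# weighted 1-D integral; both sides should equal 4*pi*sinh(kappa)/kappa
# for the exponential kernel.
rng = np.random.default_rng(0)
kappa = 3.0
K = lambda t: np.exp(kappa * t)

# Left-hand side: Monte Carlo integral of K(x.y) over y uniform on S^2
# (surface area 4*pi).
y = rng.normal(size=(400000, 3))
y /= np.linalg.norm(y, axis=1, keepdims=True)
x = np.array([0.0, 0.0, 1.0])
lhs = 4 * np.pi * K(y @ x).mean()

# Right-hand side: one-dimensional trapezoid quadrature over t in [-1, 1],
# multiplied by omega_{d-1} = 2*pi.
t = np.linspace(-1.0, 1.0, 200001)
vals = K(t)
rhs = 2 * np.pi * np.sum((vals[:-1] + vals[1:]) / 2) * (t[1] - t[0])

print(lhs, rhs)  # both close to 4*pi*sinh(3)/3
```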
The spherical convolution of a kernel $K$ in $L^1_d$ with a function $f$ in $\mathcal{B}$ is defined by
$$(K * f)(x) = \int_{\mathbb{S}^{d-1}} K(x^\top y)\, f(y)\, d\sigma_d(y), \qquad x \in \mathbb{S}^{d-1}.$$
For a fixed kernel $K$, the mapping $f \mapsto K * f$ defined by the spherical convolution for $f \in \mathcal{B}$ has range in $\mathcal{B}$.
A useful property of spherical convolution is the Funk and Hecke formula (Xu, 2000) for the eigenfunction expansion of any kernel $K \in L^1_d$. Let $\mathcal{H}_n^d$ be the space of all degree-$n$ spherical harmonics in $d$ variables and let $N(d, n)$ be its dimension (Reimer, 2012, Chapter 3). Let $P_n^d$ be the Gegenbauer polynomial of degree $n$ normalized by $P_n^d(1) = 1$. The Gegenbauer polynomials are a special case of the Jacobi polynomials and are conveniently defined using generating functions (Reimer, 2012, Chapter 2).
The Funk and Hecke formula states that for a kernel $K \in L^1_d$ the following expansion holds:
$$K(x^\top y) \sim \sum_{n=0}^{\infty} \lambda_n\, \frac{N(d, n)}{\omega_d}\, P_n^d(x^\top y).$$
In particular, the spherical harmonics $Y \in \mathcal{H}_n^d$ for $n = 0, 1, 2, \ldots$ are the eigenfunctions associated with the kernel $K$, i.e. $K * Y = \lambda_n Y$, and the eigenvalues $\lambda_n$ in the series expansion can be expressed in terms of Gegenbauer polynomials:
$$\lambda_n = \omega_{d-1} \int_{-1}^{1} K(t)\, P_n^d(t)\, (1 - t^2)^{(d-3)/2}\, dt.$$
In particular, since $P_0^d \equiv 1$, we have
$$\lambda_0 = \omega_{d-1} \int_{-1}^{1} K(t)\, (1 - t^2)^{(d-3)/2}\, dt.$$
Menegatto (1997) has investigated necessary and sufficient conditions under which a sequence of kernels $(K_j)_{j \ge 1}$ in $L^1_d$ has the property
$$\|K_j * f - f\|_{\mathcal{B}} \to 0 \quad \text{for all } f \in \mathcal{B}$$
as $j \to \infty$. For non-negative kernels $K_j$, Theorem 3.4 of Menegatto (1997) provides sufficient conditions for the convergence of the spherical convolutions $K_j * f$, and is stated below.
Lemma 1 (Menegatto, 1997, Theorem 3.4). Let $(K_j)_{j \ge 1}$ be a sequence of non-negative kernels in $L^1_d$ with associated eigenvalues $(\lambda_n^{(j)})_{n \ge 0}$. Suppose
1. $\lambda_0^{(j)} = 1$ for every $j$;
2. $\lim_{j \to \infty} \lambda_n^{(j)} = 1$ for all $n \ge 1$;
then $\|K_j * f - f\|_{\mathcal{B}} \to 0$ as $j \to \infty$ for all $f \in \mathcal{B}$.
3 Main Result
We state the main result concerning the approximation properties of finite mixtures of von Mises-Fisher distributions of the form (1). Recall that the probability density function of the von Mises-Fisher distribution for a random $d$-dimensional unit vector $x$ is given by
$$f(x; \mu, \kappa) = C_d(\kappa)\, \exp(\kappa\, \mu^\top x),$$
where $\mu \in \mathbb{S}^{d-1}$ is the mean direction and $\kappa \ge 0$ is the concentration parameter.
where is the mean direction and is the concentration parameter. We define a sequence of kernels in by
In particular, for any fixed $\mu \in \mathbb{S}^{d-1}$, the function $x \mapsto K_\kappa(\mu^\top x)$ is the density function of the von Mises-Fisher distribution with mean direction $\mu$ and concentration parameter $\kappa$. For a fixed $\mu$, $K_\kappa(\mu^\top x)$ plays the role of a “bump function” and becomes increasingly concentrated at $\mu$ as $\kappa$ increases.
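For these kernels the eigenvalue ratios $\lambda_n^{(\kappa)} / \lambda_0^{(\kappa)}$ tend to 1 as $\kappa$ grows, which is how the convergence conditions of Section 2 come into play. A quick numerical check (ours) for $d = 3$, where the normalized Gegenbauer polynomials $P_n^d$ with $P_n^d(1) = 1$ coincide with the Legendre polynomials:

```python
import numpy as np
from numpy.polynomial import legendre

# Eigenvalue ratios lambda_n / lambda_0 of the exponential kernels
# K_kappa(t) ∝ exp(kappa * t) for d = 3; the normalizing constants
# cancel in the ratio, which should approach 1 as kappa increases.
t = np.linspace(-1.0, 1.0, 200001)
dt = t[1] - t[0]

def eigen_ratio(n, kappa):
    Pn = legendre.Legendre.basis(n)(t)  # Legendre polynomial P_n, P_n(1) = 1
    num = np.sum(Pn * np.exp(kappa * t)) * dt
    den = np.sum(np.exp(kappa * t)) * dt
    return num / den

for kappa in (1.0, 10.0, 100.0):
    print(kappa, [round(eigen_ratio(n, kappa), 4) for n in (1, 2, 3)])
```

For $n = 1$ the ratio has the closed form $\coth\kappa - 1/\kappa$, so at $\kappa = 100$ it is approximately $0.99$, and all ratios increase toward 1 with $\kappa$.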
We show that any continuous probability density function on $\mathbb{S}^{d-1}$ can be approximated up to any desired level of accuracy under the uniform norm by a mixture of von Mises-Fisher distributions in which each mixture component has the form $K_\kappa(\mu_m^\top x)$.
Theorem 1. Let $f$ be a continuous probability density function on $\mathbb{S}^{d-1}$. Then, given $\varepsilon > 0$, there exist integers $M$ and $\kappa$, mean directions $\mu_1, \ldots, \mu_M$ in $\mathbb{S}^{d-1}$, and weights $\pi_1, \ldots, \pi_M$ in $[0, 1]$ with $\sum_{m=1}^{M} \pi_m = 1$ such that
$$\Big\| f - \sum_{m=1}^{M} \pi_m\, K_\kappa(\mu_m^\top \cdot\,) \Big\|_\infty < \varepsilon.$$
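The following numerical illustration (ours) realizes the theorem on the circle ($d = 2$): the weights are taken as integrals of the target density over the cells of a uniform partition, so they are non-negative and sum to 1, the mean directions are the cell midpoints, and all components share a single concentration. The target density $f$ and the values of $M$ and $\kappa$ are arbitrary illustrative choices.

```python
import numpy as np
from scipy.special import i0

# Approximate a continuous density on S^1 by an M-component von Mises
# (vMF on the circle) mixture with cell-integral weights.
f = lambda th: (1 + np.cos(th)) / (2 * np.pi)  # continuous density on S^1

M, kappa = 2000, 200.0
edges = np.linspace(0, 2 * np.pi, M + 1)
mids = (edges[:-1] + edges[1:]) / 2

# Cell weights pi_m ≈ integral of f over the m-th cell (fine midpoint rule);
# by construction they are non-negative and sum to 1.
fine = 20
h = 2 * np.pi / (M * fine)
grid = np.arange(M * fine) * h + h / 2
pi = (f(grid).reshape(M, fine) * h).sum(axis=1)

def mixture(th):
    comps = np.exp(kappa * np.cos(th[:, None] - mids)) / (2 * np.pi * i0(kappa))
    return comps @ pi

th = np.linspace(0, 2 * np.pi, 4001)
sup_err = np.abs(mixture(th) - f(th)).max()
print(round(pi.sum(), 8), sup_err)  # weights sum to 1; sup-norm error is small
```

Increasing $\kappa$ (with $M$ large enough relative to the kernel width, roughly $M \gg \sqrt{\kappa}$) drives the sup-norm error down, matching the statement of the theorem.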
4 Proof of Theorem 1
In this section we first state and prove a few lemmas needed for the proof of Theorem 1. Recall that $\mathcal{B}$ is the space of integrable functions on $\mathbb{S}^{d-1}$ equipped with either the $L^1$ norm or the uniform norm. We first show that for any function $f \in \mathcal{B}$ the spherical convolution $K_\kappa * f$ converges to $f$ in the $\mathcal{B}$ norm.

Lemma 2. $\|K_\kappa * f - f\|_{\mathcal{B}} \to 0$ as $\kappa \to \infty$ for all $f \in \mathcal{B}$.
Proof. It is sufficient to verify conditions 1 and 2 in Lemma 1. For condition 1, since the kernels $K_\kappa$ are non-negative, for any fixed $\mu \in \mathbb{S}^{d-1}$,
$$\lambda_0^{(\kappa)} = \omega_{d-1} \int_{-1}^{1} K_\kappa(t)\, (1 - t^2)^{(d-3)/2}\, dt = \int_{\mathbb{S}^{d-1}} K_\kappa(\mu^\top x)\, d\sigma_d(x) = 1.$$
The last equality holds because $x \mapsto K_\kappa(\mu^\top x)$ is a probability density function.
For condition 2, we note that for any fixed $n \ge 1$,
$$1 - \lambda_n^{(\kappa)} = \int_{\mathbb{S}^{d-1}} \big(1 - P_n^d(\mu^\top x)\big)\, K_\kappa(\mu^\top x)\, d\sigma_d(x) = \frac{\int_{-1}^{1} \big(1 - P_n^d(t)\big)\, e^{\kappa t}\, (1 - t^2)^{(d-3)/2}\, dt}{\int_{-1}^{1} e^{\kappa t}\, (1 - t^2)^{(d-3)/2}\, dt},$$
where the second equality is a result of applying a change of variable. Since $|P_n^d(t)| \le 1$ if $t \in [-1, 1]$, for any $0 < \varepsilon < 1$ the numerator above is bounded above by
$$2 \int_{-1}^{1 - \varepsilon} e^{\kappa t}\, (1 - t^2)^{(d-3)/2}\, dt + \sup_{1 - \varepsilon \le t \le 1} \big(1 - P_n^d(t)\big) \int_{-1}^{1} e^{\kappa t}\, (1 - t^2)^{(d-3)/2}\, dt.$$
To lower bound the denominator, we define the ball $B_\varepsilon = \{ t \in [-1, 1] : t > 1 - \varepsilon/2 \}$, where $b_\varepsilon = \int_{B_\varepsilon} (1 - t^2)^{(d-3)/2}\, dt > 0$. Consequently,
$$\int_{-1}^{1} e^{\kappa t}\, (1 - t^2)^{(d-3)/2}\, dt \ge \int_{B_\varepsilon} e^{\kappa t}\, (1 - t^2)^{(d-3)/2}\, dt \ge b_\varepsilon\, e^{\kappa (1 - \varepsilon/2)}.$$
Since $\int_{-1}^{1-\varepsilon} e^{\kappa t} (1 - t^2)^{(d-3)/2}\, dt \le c_d\, e^{\kappa (1 - \varepsilon)}$ with $c_d = \int_{-1}^{1} (1 - t^2)^{(d-3)/2}\, dt$, the first term of the bound divided by the denominator is at most $(2 c_d / b_\varepsilon)\, e^{-\kappa \varepsilon / 2}$, which goes to $0$ as $\kappa \to \infty$; the second term contributes $\sup_{1 - \varepsilon \le t \le 1} (1 - P_n^d(t))$, which can be made arbitrarily small by choosing $\varepsilon$ small, since $P_n^d$ is continuous with $P_n^d(1) = 1$. Hence $\lambda_n^{(\kappa)} \to 1$ as $\kappa \to \infty$. ∎
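The convergence established in Lemma 2 is easy to observe numerically on the circle; the sketch below (ours, with an arbitrary target density) computes the convolution $K_\kappa * f$ by direct quadrature and shows the sup-norm error shrinking as $\kappa$ grows.

```python
import numpy as np
from scipy.special import ive  # scaled Bessel I_0(kappa)*exp(-kappa), stable for large kappa

# Sup-norm error ||K_kappa * f - f||_inf on S^1 for increasing kappa.
f = lambda th: (1 + np.cos(th)) / (2 * np.pi)  # illustrative continuous density

phi = np.linspace(0, 2 * np.pi, 8000, endpoint=False)  # quadrature nodes
dphi = 2 * np.pi / len(phi)
x = np.linspace(0, 2 * np.pi, 200, endpoint=False)     # evaluation points

errs = []
for kappa in (10.0, 100.0, 1000.0):
    # von Mises kernel written in scaled form to avoid overflow at large kappa
    Kmat = np.exp(kappa * (np.cos(x[:, None] - phi) - 1)) / (2 * np.pi * ive(0, kappa))
    conv = Kmat @ f(phi) * dphi
    errs.append(np.abs(conv - f(x)).max())
    print(kappa, errs[-1])  # sup error shrinks as kappa increases
```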
The following lemma, concerning uniform approximation on $\mathbb{S}^{d-1}$ by Riemann sums, is useful.
Lemma 3. Let $g : \mathbb{S}^{d-1} \times \mathbb{S}^{d-1} \to \mathbb{R}$ be a continuous function. Then for any $\varepsilon > 0$, there is a partition $\{A_1, \ldots, A_M\}$ of $\mathbb{S}^{d-1}$ such that the integral $\int_{\mathbb{S}^{d-1}} g(x, y)\, d\sigma_d(y)$ can be uniformly approximated on $\mathbb{S}^{d-1}$ by Riemann sums:
$$\Big| \int_{\mathbb{S}^{d-1}} g(x, y)\, d\sigma_d(y) - \sum_{m=1}^{M} g(x, y_m)\, \sigma_d(A_m) \Big| < \varepsilon \quad \text{for all } x \in \mathbb{S}^{d-1},$$
for any choice of points $y_m \in A_m$, where each $A_m$ is connected.
Proof. For each $x \in \mathbb{S}^{d-1}$, there exists a neighborhood $U_x$ of $x$ such that for $x' \in U_x$, we have
$$|g(x', y) - g(x, y)| < \varepsilon \quad \text{for all } y \in \mathbb{S}^{d-1}.$$
Thus, for any $x' \in U_x$, we have
$$\Big| \int_{\mathbb{S}^{d-1}} g(x', y)\, d\sigma_d(y) - \int_{\mathbb{S}^{d-1}} g(x, y)\, d\sigma_d(y) \Big| \le \varepsilon\, \omega_d.$$
There exists a partition $\{A_1, \ldots, A_M\}$ of $\mathbb{S}^{d-1}$ by standard spherical coordinate blocks such that $\int_{\mathbb{S}^{d-1}} g(x, y)\, d\sigma_d(y)$ can be approximated uniformly by Riemann sums:
$$\Big| \int_{\mathbb{S}^{d-1}} g(x, y)\, d\sigma_d(y) - \sum_{m=1}^{M} g(x, y_m)\, \sigma_d(A_m) \Big| < \varepsilon$$
for any $y_m \in A_m$. Now, for $x' \in U_x$, combining the two displays above with $\sum_{m} |g(x', y_m) - g(x, y_m)|\, \sigma_d(A_m) \le \varepsilon\, \omega_d$, we have
$$\Big| \int_{\mathbb{S}^{d-1}} g(x', y)\, d\sigma_d(y) - \sum_{m=1}^{M} g(x', y_m)\, \sigma_d(A_m) \Big| \le \varepsilon\, (2\omega_d + 1).$$
Since $\{U_x : x \in \mathbb{S}^{d-1}\}$ covers $\mathbb{S}^{d-1}$, there exists a finite subcover $U_{x_1}, \ldots, U_{x_k}$. We can then find a common refinement of all the partitions used in the Riemann sums for $x_i$, $i = 1, \ldots, k$. The claimed result follows immediately. ∎
The following result shows that any continuous function on $\mathbb{S}^{d-1}$ can be uniformly approximated by linear combinations of $K_\kappa(\mu_m^\top x)$ for mean directions $\mu_m$ in $\mathbb{S}^{d-1}$.
Lemma 4. Let $f$ be a non-zero continuous function on $\mathbb{S}^{d-1}$. Then, given $\varepsilon > 0$, there exist integers $M$ and $\kappa$, $\mu_1, \ldots, \mu_M$ in $\mathbb{S}^{d-1}$, and $c_1, \ldots, c_M$ in $\mathbb{R}$ such that
$$\Big\| f - \sum_{m=1}^{M} c_m\, K_\kappa(\mu_m^\top \cdot\,) \Big\|_\infty < \varepsilon.$$
Proof. By Lemma 2, there exists an integer $\kappa$ such that
$$\| K_\kappa * f - f \|_\infty < \varepsilon / 2.$$
On the other hand, by Lemma 3 applied to $g(x, y) = K_\kappa(x^\top y)\, f(y)$, there exists a partition $\{A_1, \ldots, A_M\}$ of $\mathbb{S}^{d-1}$ by connected sets such that for any $x \in \mathbb{S}^{d-1}$ and $y_m \in A_m$,
$$\Big| (K_\kappa * f)(x) - \sum_{m=1}^{M} K_\kappa(x^\top y_m)\, f(y_m)\, \sigma_d(A_m) \Big| < \varepsilon / 2.$$
The claim follows from the triangle inequality with $\mu_m = y_m$ and $c_m = f(y_m)\, \sigma_d(A_m)$. ∎
Proof of Theorem 1. It remains to carefully pick the points $y_m \in A_m$ to ensure that $\sum_{m=1}^{M} \pi_m = 1$ in (7). This follows by applying the integral mean value theorem to each of the integrals
$$\int_{A_m} f(y)\, d\sigma_d(y)$$
with $A_m$ connected, which yields points $y_m \in A_m$ with $f(y_m)\, \sigma_d(A_m) = \int_{A_m} f(y)\, d\sigma_d(y)$, and the fact that
$$\sum_{m=1}^{M} \int_{A_m} f(y)\, d\sigma_d(y) = \int_{\mathbb{S}^{d-1}} f(y)\, d\sigma_d(y) = 1,$$
so that the weights $\pi_m = f(y_m)\, \sigma_d(A_m)$ are non-negative and sum to 1. ∎
- A. G. Bacharoglou (2010). Approximation of probability distributions by convex mixtures of Gaussian measures. Proc. Amer. Math. Soc. 138 (7), pp. 2619–2628.
- A. Banerjee, I. S. Dhillon, J. Ghosh, and S. Sra (2005). Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res. 6, pp. 1345–1382.
- N. I. Fisher, T. Lewis, and B. J. J. Embleton (1993). Statistical analysis of spherical data. Cambridge University Press, Cambridge. Revised reprint of the 1987 original.
- C. Fraley and A. E. Raftery (2002). Model-based clustering, discriminant analysis, and density estimation. J. Amer. Statist. Assoc. 97 (458), pp. 611–631.
- G. McLachlan and D. Peel (2000). Finite mixture models. Probability and Statistics – Applied Probability and Statistics Section, Vol. 299, Wiley, New York.
- V. A. Menegatto (1997). Approximation by spherical convolution. Numer. Funct. Anal. Optim. 18 (9–10), pp. 995–1012.
- H. D. Nguyen and G. McLachlan (2019). On approximations via convolution-defined mixture models. Comm. Statist. Theory Methods 48 (16), pp. 3945–3955.
- X. Qin, P. Cunningham, and M. Salter-Townshend (2016). Online trans-dimensional von Mises-Fisher mixture models for user profiles. J. Mach. Learn. Res. 17 (1), pp. 7021–7071.
- M. Reimer (2012). Multivariate polynomial approximation. International Series of Numerical Mathematics, Vol. 144, Birkhäuser.
- Y. Xu (2000). Funk-Hecke formula for orthogonal polynomials on spheres and on balls. Bull. London Math. Soc. 32 (4), pp. 447–457.