Penalization of barycenters for φ-exponential distributions

06/15/2020 ∙ by S. Kum, et al. ∙ University of Birmingham

In this paper we study the penalization of barycenters in the Wasserstein space for φ-exponential distributions. We obtain an explicit characterization of the barycenter in terms of the variances of the measures, generalizing existing results for Gaussian measures. We then develop a gradient projection method for the computation of the barycenter, establishing Lipschitz continuity of the gradient function. We also numerically show the influence of the parameters and the stability of the algorithm under small perturbations of the data.


1. Introduction

1.1. Penalization of barycenters in the Wasserstein space

In this paper we are interested in the penalization of barycenters in the Wasserstein space, which is a minimization problem of the form

(1) min_{μ ∈ P} ∑_{i=1}^N w_i W_2^2(μ_i, μ) + λ F(μ),

where P is a subset of P_2(ℝ^d), the Wasserstein space of probability measures on ℝ^d with finite second moments; μ_1, …, μ_N are given probability measures in P_2(ℝ^d); W_2 is the 2-Wasserstein distance between two probability measures in P_2(ℝ^d) (cf. Section 2); and F is an entropy functional. Finally, λ > 0 is a given regularization/penalization parameter, and w_1, …, w_N are given non-negative numbers (weights) satisfying ∑_{i=1}^N w_i = 1.
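To make problem (1) concrete in the simplest possible setting, the following sketch evaluates the penalized objective over zero-mean one-dimensional Gaussians, using the standard closed forms for the scalar Wasserstein distance and Gaussian entropy; the function names and the brute-force grid search are our own illustration, not the paper's algorithm.

```python
import numpy as np

def w2_sq_gauss_1d(v1, v2):
    """Squared 2-Wasserstein distance between zero-mean 1-D Gaussians N(0, v1), N(0, v2)."""
    return (np.sqrt(v1) - np.sqrt(v2)) ** 2

def neg_boltzmann_entropy_1d(v):
    """Negative Boltzmann entropy, int rho log rho, of N(0, v)."""
    return -0.5 * np.log(2.0 * np.pi * np.e * v)

def penalized_objective(v, variances, weights, lam):
    """Objective of problem (1) restricted to zero-mean 1-D Gaussians."""
    fit = sum(w * w2_sq_gauss_1d(vi, v) for w, vi in zip(weights, variances))
    return fit + lam * neg_boltzmann_entropy_1d(v)

# Brute-force minimization over a grid of candidate variances, with lam = 0.
variances, weights = [1.0, 4.0], [0.5, 0.5]
grid = np.linspace(0.01, 10.0, 100_000)
values = penalized_objective(grid, variances, weights, lam=0.0)  # vectorized over the grid
v_star = float(grid[np.argmin(values)])
print(v_star)
```

With λ = 0 this recovers the classical scalar barycenter variance (∑_i w_i √v_i)^2 = 2.25; taking λ > 0 shifts the minimizer, illustrating the smoothing role of the penalty.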

1.2. Literature review

Problem (1) for λ = 0 has been studied intensively in the literature. It was first studied by Knott and Smith [20] for Gaussian measures. In [1], Agueh and Carlier studied the general case, proving, among other things, the existence and uniqueness of a minimizer provided that one of the μ_i vanishes on small sets (i.e., sets whose Hausdorff dimension is at most d − 1). The minimizer is called the barycenter of the measures μ_i with weights w_i, extending a classical characterization of the Euclidean barycenter. The article [1] has sparked off many research activities, from both theoretical and computational aspects, over the last years. Wasserstein barycenters in different settings, such as over Riemannian manifolds and over discrete data, have been investigated [19, 4]. Connections between Wasserstein barycenters and optimal transport have been explored [27, 18]. Several methods for the computation of the barycenter have been developed [12, 2, 21, 28]. Recently, Wasserstein barycenters have found many applications in statistics, image processing and machine learning [29, 23, 31]. We refer the reader to the mentioned papers and references therein for a more detailed account of the topic.

The case λ > 0 has been studied in the recent paper [8], where the existence, uniqueness and stability of a minimizer, called the penalized barycenter, have been established. The regularization term was proved to produce smooth barycenters, especially when the input probability measures are irregular, which is useful for data analysis [7, 30]. In addition, the penalized barycenter problem also resembles the discretization of Wasserstein gradient flows for dissipative evolution equations [17, 3, 10] and the fractional heat equation [14] at a given time step, where the μ_i represent discretized solutions at the previous steps and λ is proportional to the time-step parameter.

Gaussian measures play an important role in the study of the Wasserstein barycenter problem since in this case a useful characterization of the barycenter exists [1, 6], which gives rise to efficient computational algorithms such as the fixed point approach [2] and the gradient projection method [21]. Our aim in this paper is to find a large class of probability measures for which the penalized barycenter can be explicitly characterized and computed similarly to the case of Gaussian measures. We will study the penalization problem (1) for an important class of probability measures, namely φ-exponential measures, where the entropy functional F is the Tsallis entropy functional. The class of φ-exponential measures significantly enlarges that of Gaussian measures and also contains q-Gaussian measures as special cases, cf. Section 1.3 below. To state our main results, we briefly recall the definition of φ-exponential measures; more details will be given in Section 2.

1.3. φ-exponential distributions

Let φ be an increasing, positive, continuous function on (0, ∞). The φ-logarithmic function is defined by [33]

(2) ln_φ(t) = ∫_1^t ds/φ(s), t > 0,

which is increasing, concave and C^1 on (0, ∞). Let ℓ and L be respectively the infimum and the supremum of ln_φ, that is,

(3) ℓ = lim_{t → 0+} ln_φ(t), L = lim_{t → ∞} ln_φ(t).

The function ln_φ has an inverse function, which is called the φ-exponential function exp_φ and is defined on (ℓ, L). This inverse function can be extended to the whole of ℝ as

(4) exp_φ(t) = 0 for t ≤ ℓ, exp_φ(t) = +∞ for t ≥ L,

which is non-decreasing on ℝ.

Let Sym(d) (Sym^+(d), respectively) be the set of symmetric (symmetric positive definite, respectively) matrices of order d. Let m ∈ ℝ^d be a given vector and V ∈ Sym^+(d) be a given symmetric positive definite matrix. The φ-exponential measure with mean m and covariance matrix V is the probability measure on ℝ^d with Lebesgue density

(5)

where the functions appearing in (5) are continuous functions on Sym^+(d) playing the role of normalization constants. Two important examples of φ-exponential measures are Gaussian measures and q-Gaussian measures, corresponding to φ(t) = t and φ(t) = t^q respectively. The φ-exponential measures play an important role in statistical physics, information geometry and in the analysis of nonlinear diffusion equations [26, 25, 32, 33]. More information about φ-exponential measures will be reviewed in Section 2.

1.4. Main results of the paper

As already mentioned, in this paper we study the penalization problem (1) for Gaussian measures and φ-exponential measures, where the entropy functional F is the (negative) Boltzmann entropy functional and the Tsallis entropy functional, respectively. The main results of the present paper are explicit characterizations of the minimizer of (1) and properties of the objective function, which can be summarized as follows.

Theorem 1.1.

Suppose μ_1, …, μ_N are either Gaussian measures, q-Gaussian measures or φ-exponential measures (in the latter case, under an additional assumption on φ) with mean zero. Then the minimization problem (1) has a unique minimizer whose variance solves the nonlinear matrix equation (20), (30) or (48), respectively. Furthermore, the objective function is strictly convex.

Theorem 1.2.

The gradient function of the objective function is Lipschitz continuous.

Theorem 1.1 summarizes Theorem 3.1 (for Gaussian measures), Theorem 4.1 (for q-Gaussian measures) and Theorem 5.1 (for general φ-exponential measures). Theorem 1.2 summarizes Theorem 6.2 (for Gaussian measures) and Theorem 6.3 (for q-Gaussian measures).

The key to the analysis of the present paper is that the spaces of φ-exponential measures and Gaussian measures are isometric in the sense of Wasserstein geometry [32, 33]; here N(m, V) denotes a Gaussian measure with mean m and covariance matrix V. Therefore, since the Wasserstein distance between Gaussian measures can be computed explicitly, the objective functional in (1) can also be computed explicitly in terms of the variances, and (1) becomes a minimization problem over the space of symmetric positive definite matrices. We then prove the strict convexity of the objective function and the existence of solutions to the optimality equation using matrix analysis tools as in [6]. Theorems 3.1, 4.1 and 5.1 establish the existence and uniqueness of a minimizer and provide an explicit characterization of the minimizer in terms of nonlinear matrix equations for the variance, generalizing the characterization of the Wasserstein barycenter for Gaussian measures in [1, 6] to the penalized Wasserstein barycenter for Gaussian measures and φ-exponential measures. Theorem 6.2 and Theorem 6.3 prove the Lipschitz continuity of the gradient of the objective function, providing an explicit upper bound for the Lipschitz constant and generalizing the results of [21] for the barycenter of Gaussian measures to our setting. We also perform numerical experiments to show the effect of the parameter λ and a stability property of the algorithm under small perturbation of the data, cf. Section 7.
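As a toy version of the gradient projection method analyzed in Section 6, one can run projected gradient descent on the scalar (d = 1, λ = 0) objective, where the closed-form answer is known; the step size, iteration count and projection threshold below are our own choices, not the paper's.

```python
import numpy as np

def grad_objective_1d(v, variances, weights):
    """Gradient of f(v) = sum_i w_i (sqrt(v_i) - sqrt(v))^2, the lam = 0 scalar objective."""
    return 1.0 - sum(w * np.sqrt(vi) for w, vi in zip(weights, variances)) / np.sqrt(v)

def gradient_projection_1d(variances, weights, step=0.5, iters=500, lo=1e-8):
    """Projected gradient descent on v > 0; the projection clips v back to [lo, inf)."""
    v = 1.0
    for _ in range(iters):
        v = max(v - step * grad_objective_1d(v, variances, weights), lo)
    return v

# The stationary point satisfies sqrt(v) = sum_i w_i sqrt(v_i), so v = 2.25 here.
v_star = gradient_projection_1d([1.0, 4.0], [0.5, 0.5])
print(v_star)
```

The Lipschitz continuity of the gradient established in Theorem 1.2 is what justifies running such a scheme with a fixed step size.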

1.5. Organization of the paper

The rest of the paper is organized as follows. In Section 2 we review relevant background on the Wasserstein metric and the Wasserstein geometry of Gaussian and φ-exponential distributions that will be used in subsequent sections. We then study the penalization of barycenters for Gaussian measures in Section 3 and extend these results to q-Gaussian and φ-exponential measures in Section 4 and Section 5. In Section 6 we describe a gradient projection method for the computation of the minimizer and prove that the gradient function is Lipschitz continuous. Finally, in Section 7, we numerically show the effect of the parameters on the minimizer and the stability of the algorithm under small perturbations of the data.

2. Wasserstein metric, Gaussian measures and φ-exponential measures

In this section, we summarize relevant knowledge that will be used in subsequent sections on the Wasserstein metric and the Wasserstein geometry of Gaussian and φ-exponential distributions.

2.1. Wasserstein metric

We recall that P_2(ℝ^d) is the space of probability measures on ℝ^d with finite second moments, that is,

(6) P_2(ℝ^d) = { μ : ∫_{ℝ^d} |x|^2 dμ(x) < ∞ }.

Let μ and ν be two probability measures belonging to P_2(ℝ^d). The 2-Wasserstein distance, W_2(μ, ν), between μ and ν is defined via

(7) W_2^2(μ, ν) = inf_{γ ∈ Γ(μ, ν)} ∫_{ℝ^d × ℝ^d} |x − y|^2 dγ(x, y),

where Γ(μ, ν) denotes the set of transport plans between μ and ν, i.e., the set of all probability measures on ℝ^d × ℝ^d having μ and ν as the first and the second marginals respectively. More precisely, γ(A × ℝ^d) = μ(A) and γ(ℝ^d × B) = ν(B) for all Borel measurable sets A, B ⊂ ℝ^d. It has been proved that, under rather general conditions (e.g., when μ and ν are absolutely continuous with respect to the Lebesgue measure), an optimal transport plan in (7) uniquely exists and is of the form γ = (id, ∇ψ)_# μ for some convex function ψ, where # denotes the push-forward [9, 15].
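In one dimension the optimal plan between two empirical measures with equally many equal-weight atoms is the monotone (sorted) coupling, which gives a quick way to compute W_2 numerically; this helper is our own illustration, not part of the paper.

```python
import numpy as np

def w2_empirical_1d(xs, ys):
    """W_2 between two empirical measures on the line with equally many
    equal-weight atoms: the optimal plan is the monotone (sorted) coupling."""
    xs, ys = np.sort(xs), np.sort(ys)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

# Translating a sample by a constant c transports every atom by c, so the cost is |c|.
x = np.random.default_rng(0).normal(size=1000)
dist = w2_empirical_1d(x, x + 3.0)
print(dist)
```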

The Wasserstein distance is an instance of a Monge-Kantorovich optimal transportation cost functional and plays a key role in many branches of mathematics, such as optimal transportation, partial differential equations and geometric analysis, and has found many applications in other fields such as economics, statistical physics and, recently, machine learning. We refer the reader to the celebrated monograph [34] for a great exposition of the topic.

We now consider two important classes of probability measures, namely Gaussian measures and φ-exponential measures, for which there is an explicit expression for the Wasserstein distance between two members of the same class. Although Gaussian measures are special cases of φ-exponential measures, we consider them separately since many proofs for the former are much simpler than those for the latter.

2.2. Wasserstein distance of Gaussian measures

The Wasserstein distance between two Gaussian measures N(m_1, V_1) and N(m_2, V_2) is well-known [16], see also e.g., [32]:

(8) W_2^2(N(m_1, V_1), N(m_2, V_2)) = |m_1 − m_2|^2 + tr(V_1) + tr(V_2) − 2 tr((V_1^{1/2} V_2 V_1^{1/2})^{1/2}).

Furthermore, (id × T)_# N(m_1, V_1) is the optimal plan between them, where

(9) T(x) = m_2 + V_1^{−1/2} (V_1^{1/2} V_2 V_1^{1/2})^{1/2} V_1^{−1/2} (x − m_1).
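The closed form (8) can be implemented in a few lines, computing the cross term via a symmetric eigendecomposition; the helper names are our own.

```python
import numpy as np

def spd_sqrt(A):
    """Principal square root of a symmetric positive semidefinite matrix."""
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T

def w2_gauss(m1, V1, m2, V2):
    """2-Wasserstein distance between N(m1, V1) and N(m2, V2), as in formula (8)."""
    R = spd_sqrt(V1)
    cross = spd_sqrt(R @ V2 @ R)
    d2 = np.sum((m1 - m2) ** 2) + np.trace(V1) + np.trace(V2) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(d2, 0.0)))

# Commuting example in d = 2: W_2^2 = tr(I) + tr(4I) - 2 tr(2I) = 2, so W_2 = sqrt(2).
m = np.zeros(2)
dist = w2_gauss(m, np.eye(2), m, 4.0 * np.eye(2))
print(dist)
```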

2.3. The entropy of Gaussian measures

The (negative) Boltzmann entropy functional of a probability measure μ = ρ dx is defined by

(10) F(μ) = ∫_{ℝ^d} ρ(x) log ρ(x) dx.

Using the Gaussian integral, the (negative) Boltzmann entropy of a Gaussian measure N(m, V) can be computed explicitly [11, Theorem 9.4.1]:

(11) F(N(m, V)) = −(1/2) log((2πe)^d det V).
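As a quick numerical sanity check of the one-dimensional case of the Gaussian entropy formula, one can compare a Riemann-sum approximation of ∫ ρ log ρ against the closed form; this snippet is our own illustration.

```python
import numpy as np

def gauss_density(x, v):
    """Density of N(0, v) on the line."""
    return np.exp(-x ** 2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)

def neg_entropy_numeric(v, n=200_000, half_width=50.0):
    """Approximate int rho log rho dx by a Riemann sum on [-half_width, half_width]."""
    x = np.linspace(-half_width, half_width, n)
    rho = gauss_density(x, v)
    # Guard against log(0) where the density underflows to zero in the tails.
    integrand = np.where(rho > 0, rho * np.log(np.where(rho > 0, rho, 1.0)), 0.0)
    return float(np.sum(integrand) * (x[1] - x[0]))

v = 2.0
numeric = neg_entropy_numeric(v)
closed_form = -0.5 * np.log(2.0 * np.pi * np.e * v)  # the d = 1 Gaussian entropy
print(numeric, closed_form)
```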

We now consider the second class of probability measures: φ-exponential measures.

2.4. φ-exponential measures and Wasserstein distance

We recall that, for a given increasing, positive and continuous function φ on (0, ∞), the φ-logarithmic function and the φ-exponential function are respectively defined in (2) and (4). Two important classes of φ-exponential functions are:

  1. φ(t) = t: the φ-logarithmic function and the φ-exponential function become the traditional logarithmic and exponential functions: ln_φ = log, exp_φ = exp.

  2. φ(t) = t^q for some q > 0: the φ-logarithmic function and the φ-exponential function become the q-logarithmic and q-exponential functions respectively,

    (12) ln_q(t) = (t^{1−q} − 1)/(1 − q), exp_q(t) = [1 + (1 − q) t]_+^{1/(1−q)},

    where [s]_+ = max{s, 0} and by convention ln_1 = log, exp_1 = exp. The q-logarithmic function satisfies the following property:

    (13) ln_q(xy) = ln_q(x) + x^{1−q} ln_q(y).
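The q-logarithm and q-exponential are straightforward to implement, and one can verify numerically both the inverse relationship and the product rule ln_q(xy) = ln_q(x) + x^{1−q} ln_q(y) used in the entropy computations below; this snippet is our own illustration.

```python
import numpy as np

def log_q(t, q):
    """q-logarithm ln_q(t) = (t**(1-q) - 1) / (1 - q); the q -> 1 limit is log t."""
    if q == 1.0:
        return np.log(t)
    return (t ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(t, q):
    """q-exponential exp_q(t) = max(1 + (1-q) t, 0) ** (1/(1-q)); inverse of log_q."""
    if q == 1.0:
        return np.exp(t)
    return np.maximum(1.0 + (1.0 - q) * t, 0.0) ** (1.0 / (1.0 - q))

q, x, y = 1.5, 3.0, 2.0
roundtrip = exp_q(log_q(y, q), q)                 # inverse property: recovers y
lhs = log_q(x * y, q)                             # product rule:
rhs = log_q(x, q) + x ** (1.0 - q) * log_q(y, q)  # ln_q(xy) = ln_q(x) + x^(1-q) ln_q(y)
print(roundtrip, lhs - rhs)
```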
Definition 2.1.

For any , we define to be the set of all increasing, continuous function on such that where

It is proved in [33, Proposition 3.2] that for any admissible φ there exist continuous normalization functions on Sym^+(d) such that (cf. (5) in the Introduction)

(14)

defines a probability density on ℝ^d with mean m and covariance matrix V, which is called a φ-exponential distribution. Note that, in the above expression, the normalization functions are needed only at the identity matrix I, not on all of Sym^+(d). We define the space of all φ-exponential distribution measures by

(15)

Above, dx denotes the Lebesgue measure on ℝ^d. Two important cases:

  1. φ(t) = t: the space reduces to the class of Gaussian measures with mean m and covariance matrix V.

  2. In the case φ(t) = t^q, the space becomes the class of all q-Gaussian measures, whose densities are given in terms of the q-exponential function exp_q. Note that as q → 1 the q-Gaussian reduces to the Gaussian. Thus Gaussian measures are special cases of q-Gaussian measures.

The φ-exponential measures play an important role in statistical physics, information geometry and in the analysis of nonlinear diffusion equations [26, 25, 32, 33]. We refer to [25, 32, 13] for further details on q-Gaussian measures, φ-exponential measures and their properties.

The following result explains why q-Gaussian measures and φ-exponential measures are special. It will play a key role in the analysis of this paper.

Proposition 2.2.

The following statements hold [32, 33]:

  1. For any q, the space of q-Gaussian measures is convex and isometric to the space of Gaussian measures with respect to the Wasserstein metric.

  2. For any admissible φ, the space of φ-exponential measures is convex and isometric to the space of Gaussian measures with respect to the Wasserstein metric.

  3. Let p_1 and p_2 be two φ-exponential distributions. Then (id × T)_# p_1, where T is defined in (9), is the optimal plan in the definition of W_2(p_1, p_2).

  4. We have

    (16)

2.5. The Tsallis entropy of a q-Gaussian measure

The Tsallis entropy is defined by

(17)

The Tsallis entropy of a q-Gaussian measure can also be computed explicitly using property (13) and computations similar to those in the Gaussian case.

Lemma 2.3.

It holds that [13]

(18)

3. Penalization of barycenters for Gaussian measures

In this section we study the following penalization of barycenters in the space of Gaussian measures

(19)

where F is the (negative) Boltzmann entropy functional defined in (10) and λ > 0 is a regularization parameter.

We assume that the μ_i are zero-mean Gaussian measures, μ_i = N(0, V_i), and seek a Gaussian minimizer N(0, V). We note that we consider Gaussian measures with zero mean just for simplicity; the main results of the paper can be easily extended to the case of non-zero mean. From now on, we equip the space of symmetric matrices with the Frobenius inner product ⟨A, B⟩ = tr(AB). The Frobenius norm is defined by ‖A‖_F = ⟨A, A⟩^{1/2}. For symmetric A and B we write A ⪯ B if B − A is positive semidefinite, and A ≺ B if B − A is positive definite. Note that A ⪯ B if and only if ⟨Ax, x⟩ ≤ ⟨Bx, x⟩ for all x. We denote by [A, B] the Löwner order interval of symmetric matrices X with A ⪯ X ⪯ B.

Theorem 3.1.

Assume that λ > 0. The penalization of barycenters problem (1) has a unique solution whose covariance matrix V solves the following nonlinear matrix equation

(20)

In particular, in the scalar case (d = 1), we obtain

(21)

Before proving this theorem, we show the existence of solutions to equation (20).

Lemma 3.2.

Equation (20) has a positive definite solution.

Proof.

Pick so that for all Set

Then for

and hence

By definition of and ,

for every such matrix. This shows that the map is a continuous self-map on the Löwner order interval. By Brouwer's fixed point theorem, it has a fixed point. ∎

We are now ready to prove Theorem 3.1.

Proof of Theorem 3.1.

According to (8) and (11) we have

Thus we can write (1) as a minimization problem in the space of positive definite matrices

(22)

where

(23)

where

It has been proved in [6] that

  1. is strictly convex,

  2. ,

where A # B denotes the geometric mean of A and B, defined by

(24) A # B = A^{1/2} (A^{−1/2} B A^{−1/2})^{1/2} A^{1/2},

which is symmetric in A and B. According to [22, Proof of Theorem 8, Chapter 10],

this term is strictly convex. Using Jacobi's formula for the derivative of the determinant and the chain rule, we get

(25)

It follows that the objective function is strictly convex. Furthermore, we have

(26)

From this we deduce that

(27)

where the gradient is taken with respect to the Frobenius inner product. Hence the gradient vanishes if and only if

Using the definition (24) of the geometric mean, the above equation can be written as

which is equation (20). By Lemma 3.2 this equation has a positive definite solution. This, together with the strict convexity of the objective function, implies that the problem has a unique minimizer, which is a Gaussian measure N(0, V) where V solves (20). In the one-dimensional case this equation reads

which results in

This completes the proof of the theorem. ∎
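Two ingredients of the analysis above can be exercised numerically: the geometric mean (24), and the known fixed-point iteration of [2] for the unpenalized Gaussian barycenter (the λ = 0 case of the problem). The helper names, and the restriction to λ = 0, are our own; this sketch is not the paper's penalized algorithm.

```python
import numpy as np

def spd_sqrt(A):
    """Principal square root of a symmetric positive semidefinite matrix."""
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T

def geometric_mean(A, B):
    """Matrix geometric mean of definition (24): A # B = A^{1/2}(A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    R = spd_sqrt(A)
    R_inv = np.linalg.inv(R)
    return R @ spd_sqrt(R_inv @ B @ R_inv) @ R

def barycenter_cov(Vs, ws, iters=200):
    """Fixed-point iteration of [2] for the unpenalized (lam = 0) Gaussian barycenter:
    V <- V^{-1/2} (sum_i w_i (V^{1/2} V_i V^{1/2})^{1/2})^2 V^{-1/2}."""
    V = np.eye(Vs[0].shape[0])
    for _ in range(iters):
        R = spd_sqrt(V)
        R_inv = np.linalg.inv(R)
        S = sum(w * spd_sqrt(R @ Vi @ R) for w, Vi in zip(ws, Vs))
        V = R_inv @ (S @ S) @ R_inv
    return V

# The geometric mean is symmetric in its arguments, as noted after (24).
A, B = np.diag([2.0, 8.0]), np.eye(2)
G = geometric_mean(A, B)

# Scalar sanity check: the barycenter variance is (sum_i w_i sqrt(v_i))^2 = 2.25.
V_star = barycenter_cov([np.array([[1.0]]), np.array([[4.0]])], [0.5, 0.5])
print(G)
print(V_star)
```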

4. Penalization of barycenters for q-Gaussian measures

In this section we study the following penalization of barycenters in the space of q-Gaussian measures

(28)

where F is the Tsallis entropy functional defined by

(29)

We assume that the μ_i are zero-mean q-Gaussian measures and seek a q-Gaussian minimizer.

Theorem 4.1.

Assume that λ > 0 and suppose that V_i is positive definite for all i. The penalization of barycenters problem (28) has a unique solution, for all λ in one regime of q and for sufficiently small λ otherwise. The covariance matrix V solves the following nonlinear matrix equation

(30)

where is defined by

(31)

The following proposition shows that equation (30) possesses a positive definite solution.

Proposition 4.2.

Equation (30) has a positive definite solution.

Proof.

Similarly to the proof of Lemma 3.2, we will apply Brouwer's fixed point theorem: we will show that the map associated with (30) has a fixed point which is a positive definite matrix. Due to the appearance of the second term on the left-hand side of (30), the proof of this proposition is significantly more involved than that of Lemma 3.2. Suppose that V_i is positive definite for all i. Then, similarly as in the proof of Lemma 3.2, for suitable constants (to be chosen later), we have

so that

Multiplying these inequalities by the weights w_i and then adding them together, noting that the weights sum to one, we obtain

from which it follows that

(32)

To continue we consider two cases.

Case 1: . It follows from (32) that

(33)

Since , we have .

Case 1.1: . Consider the following equation

We have and . Since is continuous, it follows that there exists such that , that is

Similarly by considering the function , we deduce that there exists such that

Case 1.2: . Using the same argument as in the previous case for

we can show that there exist such that

Therefore in both Cases 1.1 and 1.2, there exist such that

Substituting these quantities into (33) we obtain