Generalized bounds for active subspaces

10/03/2019 · by Mario Teixeira Parente et al., Technische Universität München

The active subspace method, as a dimension reduction technique, can substantially reduce computational costs and is thus attractive for high-dimensional computer simulations. The theory provides upper bounds for the mean square error of a given function of interest and a low-dimensional approximation of it. Derivations are based on probabilistic Poincaré inequalities which strongly depend on an underlying probability distribution that weights sensitivities of the investigated function. It is not this original distribution that is crucial for the final error bounds, but a conditional distribution, conditioned on a so-called active variable, that naturally arises in this context. Existing literature does not take this aspect into account and is thus missing important details when it comes to distributions with, for example, exponential tails; as a consequence, it does not cover such distributions theoretically. Here, we consider scenarios in which traditional estimates are no longer valid due to an arbitrarily large Poincaré constant. Additionally, we propose a framework that allows one to obtain weaker, or generalized, estimates and that enables the practitioner to control the trade-off between the size of the Poincaré type constant and the (weakened) order of the final error bound. In particular, we investigate independently exponentially distributed random variables in 2 and n dimensions and give explicit expressions for the involved constants, also showing the dependence on the dimension of the problem. Finally, we formulate an open problem to the community that aims at extending the class of distributions applicable to the active subspace method, since we regard this as an opportunity to enlarge its usability.


1 Introduction

Many modern computational problems with a large number of input variables or parameters suffer from the "curse of dimensionality", a phenomenon characterized by a rapid growth of computational complexity in the number of unknowns. For practitioners in this situation, the computations or simulations become too expensive or even intractable. The active subspace method (ASM), or, shorter, active subspaces [9, 10], is a set of tools for dimension reduction that mitigates the effects caused by the curse of dimensionality. ASM splits a Euclidean input space into a so-called active and inactive subspace based on average sensitivities of a real-valued function of interest. The sensitivities are found by an eigendecomposition of a matrix involving outer products of the function's gradient with itself. That is, eigenvalues indicate average sensitivities of the function of interest in the direction of the corresponding eigenvector. Eigenvectors and eigenvalues belonging to the active subspace are then considered dominant for the global behavior of the function of interest, whereas the inactive subspace is regarded as negligible. In other words, ASM reduces the dimension of a problem while keeping as much information as necessary. The practical usability of ASM has already been demonstrated for several real case studies in various applied disciplines, see, e. g., [14, 20, 23, 24, 25]. It has also motivated other methodological advances, e. g., in the solution of Bayesian inverse problems [22] by an accelerated Markov chain Monte Carlo algorithm [12], in uncertainty quantification and propagation [8, 26], and in the theory of ridge approximation, see, e. g., [11, 15, 16].

However, ASM is only one dimension reduction technique among others. For example, likelihood-informed dimension reduction for the solution of Bayesian inverse problems [13] is based on a similar idea; this approach, however, analyzes the Hessian matrix of the function of interest instead of the gradient. An extension to vector-valued functions in gradient-based dimension reduction is given in [30]. Dimension reduction for nonlinear Bayesian inverse problems based on the Kullback-Leibler (KL) divergence of approximate posteriors and (subspace) logarithmic Sobolev inequalities, including a comprehensive comparison of several other techniques, was provided by the authors of [31].

A main result in ASM theory is an upper bound on the mean square error between the original function of interest and its low-dimensional approximation on the active subspace. The corresponding proof is based on an inequality of Poincaré type which is probabilistic in nature since ASM involves a probability distribution that weights sensitivities of the function of interest at different locations in the input space. The upper bound consists of the product of a Poincaré type constant and the sum of eigenvalues corresponding to the inactive subspace, called the inactive trace in the following. The constant derived in [10] is claimed to depend only on the original distribution, which is generally incorrect. Also, to the knowledge of the authors, existing theory for dimension reduction techniques based on Poincaré or logarithmic Sobolev inequalities is subject to quite restrictive assumptions on the involved probability distribution. These assumptions comprise either the distribution having compact support or its density being of uniformly log-concave form, i. e., $\rho(x) \propto \exp(-\psi(x))$, where $\psi$ is such that its Hessian matrix satisfies $\nabla^2\psi(x) \succeq \kappa I_n$ for each $x$ and some $\kappa > 0$. By the famous Bakry-Émery criterion, the latter assumption implies a logarithmic Sobolev inequality and a Poincaré inequality with universal Poincaré constant $1/\kappa$, see, e. g., [2, 27]. Note that the case $\kappa = 0$, i. e., $\psi$ being only convex, is not covered. However, Bobkov [5] showed that a Poincaré inequality is still satisfied in this case and gave lower and upper bounds on the corresponding Poincaré constant. Distributions with heavier tails, i. e., with $\kappa = 0$, as, e. g., exponential or Laplace distributions, do not satisfy the assumptions above, but are, however, of practical relevance.

In ASM theory, it is not the original distribution that must satisfy a Poincaré inequality; rather, a conditional distribution on the inactive subspace, which depends on a variable defined on the active subspace, has to do so. Both assumptions on the original distribution from above are in fact passed on to the conditional distribution. However, the case $\kappa = 0$ is cumbersome. We shall give an example for this case regarding a distribution that itself satisfies a Poincaré inequality, but might not be applicable at all, or only with care, due to an arbitrarily large constant in the final bound for the mentioned mean square error. Our arguments are based on the bounds for corresponding Poincaré constants given by Bobkov in [5]. We also describe a way to still get upper bounds in this situation, however with a weaker, reduced order in the inactive trace. This order reduction is controllable in the sense that the practitioner can decide on the actual trade-off between the order of the inactive trace and the size of the corresponding Poincaré type constant. The mentioned general problem and its solution are exemplified on independently exponentially distributed random variables in 2 and $n$ dimensions. Also, it is shown that the final constant depends strongly on the dimension of the problem. However, since this example is rather artificial, we formulate an open problem to the community at the end that aims at extending the class of distributions for which the bounds and the involved constants are explicitly or at least intuitively available, in order to expand the applicability of ASM to more scenarios of practical interest. In particular, the class of multivariate generalized hyperbolic distributions is a rich class that is, in our opinion, worthwhile to investigate.

The outline of the manuscript is as follows. Section 2 gives an introduction to ASM and its formal context. In Section 3, we recall results involving compactly supported and normal distributions. The main results, consisting of a motivation and discussion of the mentioned problems, with the independently exponentially distributed random variables as an extreme example, are presented in Section 4. We provide some comments and formulate an open problem to the community as an outlook in Section 5. Finally, a summary is given in Section 6.

2 Active subspaces

The active subspace method is a set of tools for gradient-based dimension reduction [9, 10]. Its aim is to find directions in the domain of a function along which the function changes dominantly, on average. For illustration, consider a function of the form $f(x) = g(A^\top x)$ with a so-called profile function $g : \mathbb{R}^k \to \mathbb{R}$ and a matrix $A \in \mathbb{R}^{n \times k}$, $k < n$. Functions of this type are called ridge functions [21]. Note that $f$ is constant along the null space of $A^\top$. Indeed, for $x \in \mathbb{R}^n$ and $v \in \mathbb{R}^n$ such that $A^\top v = 0$, it holds that

$$f(x + v) = g\big(A^\top (x + v)\big) = g(A^\top x) = f(x). \qquad (2.1)$$

That is, $f$ is intrinsically at most $k$-dimensional. For arbitrary $f$, the general task is to find a suitable dimension $k < n$, a function $g : \mathbb{R}^k \to \mathbb{R}$, and a matrix $A \in \mathbb{R}^{n \times k}$ such that $f(x) \approx g(A^\top x)$.
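To make the ridge-function property tangible, here is a minimal numerical sketch of Eq. (2.1); the profile function $g$ and the matrix $A$ are hypothetical choices for illustration, not taken from the manuscript:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ridge function f(x) = g(A^T x) with n = 5, k = 2.
n, k = 5, 2
A = rng.standard_normal((n, k))          # has rank k with probability 1
g = lambda u: np.sin(u[0]) + u[1] ** 2   # profile function g: R^k -> R
f = lambda x: g(A.T @ x)

# Build a vector v in the null space of A^T and verify Eq. (2.1).
Q, _ = np.linalg.qr(A, mode="complete")
v = Q[:, k]                              # orthogonal to the columns of A
x = rng.standard_normal(n)
print(np.isclose(f(x + 3.0 * v), f(x)))  # True: f is constant along null(A^T)
```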

For this, the active subspace method, as a gradient-based dimension reduction technique, needs to assume that the function of interest $f : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable with partial derivatives that are square-integrable w.r.t. a probability density function $\rho : \mathbb{R}^n \to [0, \infty)$. We define $\mathcal{X} := \operatorname{supp}(\rho)$ to be the support of $\rho$, i. e., the closure of the set $\{x \in \mathbb{R}^n : \rho(x) > 0\}$. We assume that $\mathcal{X}$ is a continuity set, that is, its boundary $\partial\mathcal{X}$ is assumed to be a Lebesgue null set. The central object of investigation is a covariance-type matrix constructed with outer products of the gradient of $f$, $\nabla f = (\partial f/\partial x_1, \ldots, \partial f/\partial x_n)^\top$, with itself,

$$C := \mathbb{E}\big[\nabla f(X)\, \nabla f(X)^\top\big] = \int \nabla f(x)\, \nabla f(x)^\top \rho(x)\, dx. \qquad (2.2)$$

Since $C$ is real symmetric, there exists an eigendecomposition $C = W \Lambda W^\top$ with an orthogonal matrix $W \in \mathbb{R}^{n \times n}$ and a diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ with descending eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$ on its diagonal. The positive semidefiniteness of $C$ additionally ensures that $\lambda_n \ge 0$.

The behavior of the function $f$ and the eigendecomposition of $C$ have an interesting, exploitable relation, i. e.,

$$\lambda_i = w_i^\top C\, w_i = \mathbb{E}\big[(w_i^\top \nabla f(X))^2\big], \quad i = 1, \ldots, n. \qquad (2.3)$$

If, for example, $\lambda_i = 0$ for some $i$, then we can conclude that $f$ does not change in the direction of the corresponding eigenvector $w_i$. That is, if the eigenvalues $\lambda_{k+1}, \ldots, \lambda_n$ are sufficiently small for a suitable $k < n$, or even zero as in the case of ridge functions, then $f$ can be approximated by a lower-dimensional function. Formally, this corresponds to a split of $W$ and $\Lambda$, i. e.,

$$W = [W_1\ W_2], \qquad \Lambda = \begin{bmatrix} \Lambda_1 & \\ & \Lambda_2 \end{bmatrix}, \qquad (2.4)$$

where $W_1 \in \mathbb{R}^{n \times k}$, $W_2 \in \mathbb{R}^{n \times (n-k)}$ and $\Lambda_1 \in \mathbb{R}^{k \times k}$, $\Lambda_2 \in \mathbb{R}^{(n-k) \times (n-k)}$.

Since

$$x = W W^\top x = W_1 W_1^\top x + W_2 W_2^\top x = W_1 y + W_2 z, \qquad (2.5)$$

the split of $W$ suggests a new coordinate system with the active variable $y := W_1^\top x \in \mathbb{R}^k$ and the inactive variable $z := W_2^\top x \in \mathbb{R}^{n-k}$. The range of $W_1$, $\operatorname{ran}(W_1)$, is called the active subspace of $f$. Note that the new variable $y$ is aligned to directions along which $f$ changes much more, on average, than along the directions the variable $z$ is aligned to.
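For readers who prefer code, the following minimal sketch runs the construction described so far on a hypothetical example: it estimates $C$ from Eq. (2.2) by Monte Carlo, computes the eigendecomposition, and forms the active and inactive variables. The function $f$ and the choice of a standard normal weighting density are illustrative assumptions, not prescribed by the method:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical function of interest and its gradient; f is a ridge
# function of the single variable b^T x, so its active subspace is
# one-dimensional.
n = 4
b = np.array([1.0, 0.5, 0.0, 0.0])
f = lambda x: np.exp(b @ x)
grad_f = lambda x: np.exp(b @ x) * b

# Monte Carlo estimate of C = E[grad f(X) grad f(X)^T], Eq. (2.2),
# with X ~ N(0, I_n) playing the role of the weighting density rho.
N = 20000
X = rng.standard_normal((N, n))
G = np.array([grad_f(x) for x in X])
C = G.T @ G / N

# Eigendecomposition C = W Lambda W^T with descending eigenvalues.
lam, W = np.linalg.eigh(C)
lam, W = lam[::-1], W[:, ::-1]
print(lam)                 # all but the first eigenvalue are (nearly) zero

# Split into active and inactive parts, Eq. (2.4), with k = 1.
k = 1
W1, W2 = W[:, :k], W[:, k:]
y = X @ W1                 # samples of the active variable
z = X @ W2                 # samples of the inactive variable
```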

For the remainder, we define

$$\nabla_y f(x) := W_1^\top \nabla f(x) \in \mathbb{R}^k, \qquad \nabla_z f(x) := W_2^\top \nabla f(x) \in \mathbb{R}^{n-k}. \qquad (2.6)$$

Also, for $y \in \mathbb{R}^k$ and $z \in \mathbb{R}^{n-k}$, let
$$x(y, z) := W_1 y + W_2 z \qquad (2.7)$$
to concisely denote changes of the coordinate system.

The variables $x$, $y$, and $z$ can also be regarded as random variables $X$, $Y$, and $Z$, respectively, that are defined on a common probability space $(\Omega, \mathcal{F}, P)$. The orthogonal variable transformation induces new probability density functions for the random variables $Y$ and $Z$. That is, the joint density of $(Y, Z)$ is
$$\rho_{Y,Z}(y, z) = \rho(x(y, z)) = \rho(W_1 y + W_2 z). \qquad (2.8)$$

Corresponding marginal and conditional densities are defined as usual. Additionally, set
$$\mathcal{Y}^+ := \{ y \in \mathbb{R}^k : \rho_Y(y) > 0 \} \qquad (2.9)$$
to denote the set of all values of the active variable with a strictly positive density value. We frequently use that for a $\rho$-integrable function $h$, it holds that
$$\mathbb{E}[h(X)] = \int_{\mathcal{Y}^+} \int h(x(y, z))\, \rho_{Z|Y}(z\,|\,y)\, dz\, \rho_Y(y)\, dy. \qquad (2.10)$$

Given the eigenvectors in $W$, we still need to define a lower-dimensional function $g$ approximating $f$. For $y \in \mathcal{Y}^+$, a natural way is to define $g$ as the conditional expectation of $f$ given $y$, i. e., as an integral over the inactive subspace weighted with the conditional density $\rho_{Z|Y}$. Recall that this approximation is the best in an $L^2$ sense [18, Corollary 8.17]. Hence, we set
$$g(y) := \mathbb{E}\big[f(X)\,\big|\,Y = y\big] = \int f(x(y, z))\, \rho_{Z|Y}(z\,|\,y)\, dz \qquad (2.11)$$
for $y \in \mathcal{Y}^+$. Additionally, we define
$$f_g(x) := g(W_1^\top x) \qquad (2.12)$$
for $x \in \operatorname{int}(\mathcal{X})$, where $\operatorname{int}(\mathcal{X})$ denotes the interior of $\mathcal{X}$. Note that $W_1^\top x \in \operatorname{int}(\mathcal{Y}^+)$ for $x \in \operatorname{int}(\mathcal{X})$, where $\operatorname{int}(\mathcal{Y}^+)$ denotes the interior of $\mathcal{Y}^+$.
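A possible numerical realization of Eqs. (2.11) and (2.12) is sketched below for the special case of a standard normal weighting density, for which $Z$ given $Y = y$ is again standard normal and independent of $Y$; the function $f$ and the orthogonal matrix $W$ are hypothetical placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)

n, k = 4, 2
f = lambda x: np.sin(x[0]) + 0.1 * x[3] ** 2       # hypothetical f
W = np.linalg.qr(rng.standard_normal((n, n)))[0]   # some orthogonal W
W1, W2 = W[:, :k], W[:, k:]

def g(y, n_mc=4000):
    """Monte Carlo estimate of g(y) = E[f(W1 y + W2 Z) | Y = y];
    for rho = N(0, I_n), the conditional density of Z is N(0, I_{n-k})."""
    Z = rng.standard_normal((n_mc, n - k))
    return np.mean([f(W1 @ y + W2 @ z) for z in Z])

f_g = lambda x: g(W1.T @ x)                        # Eq. (2.12)
x = rng.standard_normal(n)
print(f(x), f_g(x))
```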

One of the main results in ASM theory is a theorem that gives an upper bound on the mean square error of approximating $f$ with $f_g$. The upper bound is the product of a Poincaré constant $C_P$ and the sum of eigenvalues corresponding to the inactive subspace, called the inactive trace. That is, if the inactive trace is small, then the mean square error of approximating $f$ with $f_g$ is also small. Mathematically, for a given probability density function $\rho$, the theorem states that
$$\mathbb{E}\big[(f(X) - f_g(X))^2\big] \le C_P\, (\lambda_{k+1} + \cdots + \lambda_n) \qquad (2.13)$$
for a Poincaré constant $C_P > 0$.

The computation starts with
$$\mathbb{E}\big[(f(X) - f_g(X))^2\big] = \int_{\mathcal{Y}^+} \int \big(f(x(y, z)) - g(y)\big)^2\, \rho_{Z|Y}(z\,|\,y)\, dz\, \rho_Y(y)\, dy \qquad (2.14)$$
$$\le \int_{\mathcal{Y}^+} C_P(y) \int |\nabla_z f(x(y, z))|^2\, \rho_{Z|Y}(z\,|\,y)\, dz\, \rho_Y(y)\, dy, \qquad (2.15)$$
where we used a probabilistic Poincaré inequality w.r.t. $\rho_{Z|Y}(\cdot\,|\,y)$ for a given $y \in \mathcal{Y}^+$. Note that the Poincaré constant $C_P(y)$ of $\rho_{Z|Y}(\cdot\,|\,y)$ depends on $y$. In [10, Theorem 3.1], it was indirectly assumed that this constant does not depend on $y$. So, if $\mathcal{X}$ is compact, i. e., the distribution of $X$ has compact support, then we can continue with

$$\le C_P \int_{\mathcal{Y}^+} \int |\nabla_z f(x(y, z))|^2\, \rho_{Z|Y}(z\,|\,y)\, dz\, \rho_Y(y)\, dy = C_P\, \mathbb{E}\big[|\nabla_z f(X)|^2\big]. \qquad (2.16)$$

The rest of the calculation is as in [10, Lemma 2.2 and Theorem 3.1]. We repeat the steps here for the sake of completeness. So, first, note that $\nabla_z f(x) = W_2^\top \nabla f(x)$. Then, we write

$$\mathbb{E}\big[|\nabla_z f(X)|^2\big] = \operatorname{trace}\big(\mathbb{E}\big[\nabla_z f(X)\, \nabla_z f(X)^\top\big]\big) = \operatorname{trace}(W_2^\top C W_2) = \operatorname{trace}(\Lambda_2) = \lambda_{k+1} + \cdots + \lambda_n. \qquad (2.17)$$
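The identity (2.17) is easy to check numerically; the following sketch (with a hypothetical quadratic $f$ and standard normal weights) compares a Monte Carlo estimate of $\mathbb{E}[|\nabla_z f(X)|^2]$ with the inactive trace of the estimated matrix $C$:

```python
import numpy as np

rng = np.random.default_rng(3)

n, k, N = 4, 2, 50000
M = rng.standard_normal((n, n)); M = (M + M.T) / 2
grad_f = lambda x: M @ x                 # gradient of f(x) = x^T M x / 2

X = rng.standard_normal((N, n))
G = np.array([grad_f(x) for x in X])
C = G.T @ G / N
lam, W = np.linalg.eigh(C); lam, W = lam[::-1], W[:, ::-1]
W2 = W[:, k:]

# E[|grad_z f|^2] = trace(W2^T C W2) = lambda_{k+1} + ... + lambda_n
lhs = np.mean([np.sum((W2.T @ gi) ** 2) for gi in G])
print(lhs, lam[k:].sum())                # the two values agree
```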

The next section gives two examples of types of densities that are well-known to imply a probabilistic Poincaré inequality for $\rho_{Z|Y}(\cdot\,|\,y)$ and to allow a uniform bound on its constant $C_P(y)$. It is emphasized again that it is not $\rho$ that has to satisfy a probabilistic Poincaré inequality, but $\rho_{Z|Y}(\cdot\,|\,y)$ has to do so.

3 Compactly supported and normal distributions

The uniform distribution, as a canonical example of a distribution with compact support $\mathcal{X}$, is well-known to satisfy a probabilistic Poincaré inequality on its own and to imply the same for the conditional densities $\rho_{Z|Y}(\cdot\,|\,y)$, which are also uniform. Note that a probabilistic Poincaré inequality involving a uniform distribution is actually equivalent to a regular Poincaré inequality w.r.t. the Lebesgue measure. The following theorem is a slightly more general result. We add a convexity assumption on $\mathcal{X}$ since it makes Poincaré constants explicit. Recall that the Poincaré constant for a convex domain with diameter $d$ is $(d/\pi)^2$, see, e. g., [4].

Theorem 3.1.

Assume that $\mathcal{X}$ is compact and convex. If $\rho(x) > 0$ for all $x \in \mathcal{X}$, then

$$\mathbb{E}\big[(f(X) - f_g(X))^2\big] \le C_P\, (\lambda_{k+1} + \cdots + \lambda_n) \qquad (3.1)$$

for a constant
$$C_P \le \left(\frac{d}{\pi}\right)^2 \frac{\sup_{x \in \mathcal{X}} \rho(x)}{\inf_{x \in \mathcal{X}} \rho(x)}, \qquad d := \operatorname{diam}(\mathcal{X}). \qquad (3.2)$$
Proof.

Define
$$\mathcal{Z}(y) := \{ z \in \mathbb{R}^{n-k} : x(y, z) \in \mathcal{X} \} \qquad (3.3)$$

and note that it is convex for $y \in \mathcal{Y}^+$. It holds that $\operatorname{diam}(\mathcal{Z}(y)) \le \operatorname{diam}(\mathcal{X}) = d$. Note that

$$\rho_{Z|Y}(z\,|\,y) = \frac{\rho(x(y, z))}{\rho_Y(y)} \qquad (3.4)$$

for $y \in \mathcal{Y}^+$ and $z \in \mathcal{Z}(y)$. This justifies the following lines of computation for $y \in \mathcal{Y}^+$,

$$\int \big(f(x(y,z)) - g(y)\big)^2\, \rho_{Z|Y}(z\,|\,y)\, dz \le \int \big(f(x(y,z)) - c_y\big)^2\, \rho_{Z|Y}(z\,|\,y)\, dz \qquad (3.5)$$
$$\le \frac{\sup_{x\in\mathcal{X}} \rho(x)}{\rho_Y(y)} \int_{\mathcal{Z}(y)} \big(f(x(y,z)) - c_y\big)^2\, dz \qquad (3.6)$$
$$\le \frac{\sup_{x\in\mathcal{X}} \rho(x)}{\rho_Y(y)} \left(\frac{d}{\pi}\right)^2 \int_{\mathcal{Z}(y)} |\nabla_z f(x(y,z))|^2\, dz \qquad (3.7)$$
$$\le \left(\frac{d}{\pi}\right)^2 \frac{\sup_{x\in\mathcal{X}} \rho(x)}{\inf_{x\in\mathcal{X}} \rho(x)} \cdot \frac{1}{\rho_Y(y)} \int_{\mathcal{Z}(y)} |\nabla_z f(x(y,z))|^2\, \rho(x(y,z))\, dz \qquad (3.8)$$
$$= \left(\frac{d}{\pi}\right)^2 \frac{\sup_{x\in\mathcal{X}} \rho(x)}{\inf_{x\in\mathcal{X}} \rho(x)} \int |\nabla_z f(x(y,z))|^2\, \rho_{Z|Y}(z\,|\,y)\, dz, \qquad (3.9)$$
where $c_y$ denotes the mean of $f(x(y, \cdot))$ w.r.t. the uniform distribution on $\mathcal{Z}(y)$ and the Poincaré inequality for the convex domain $\mathcal{Z}(y)$ is used in (3.7).

Then, combining Eq. (2.17) with Eq. (3.9) yields the result in Eq. (3.1). ∎

Also, it is well-known that the Poincaré constant is one for the multivariate standard normal distribution $N(0, I_n)$ [7]. Since its density is rotationally symmetric, the random variables $Y$ and $Z$ are independent and each again follows a standard normal distribution. Hence, it holds that $C_P = 1$ in Eq. (2.13). For general multivariate normal distributions with mean $\mu$ and non-degenerate covariance matrix $\Sigma$, shifting and scaling arguments give that $C_P = \lambda_{\max}(\Sigma)$ in Eq. (2.13).
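For illustration, the following sketch checks the resulting inequality $\operatorname{Var}(h(X)) \le \lambda_{\max}(\Sigma)\, \mathbb{E}[|\nabla h(X)|^2]$ by Monte Carlo for a hypothetical test function $h$ and a randomly generated covariance matrix $\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(4)

n, N = 3, 200000
A = rng.standard_normal((n, n))
Sigma = A @ A.T                          # some non-degenerate covariance
mu = np.ones(n)
lmax = np.linalg.eigvalsh(Sigma).max()

h = lambda x: np.sin(x[:, 0]) + x[:, 1] * x[:, 2]
grad_h = lambda x: np.stack([np.cos(x[:, 0]), x[:, 2], x[:, 1]], axis=1)

X = rng.multivariate_normal(mu, Sigma, size=N)
var_h = h(X).var()
rhs = lmax * np.mean(np.sum(grad_h(X) ** 2, axis=1))
print(var_h <= rhs, var_h, rhs)          # the Poincare inequality holds
```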

4 Main results

This section contains the main contribution of the manuscript, which lies in an investigation of general log-concave probability measures w.r.t. their applicability for ASM. Log-concave distributions have Lebesgue densities of the form $\rho(x) = \exp(-\psi(x))$ for a convex function $\psi : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$. Note that $+\infty$ is included in the codomain of $\psi$. The conditional density for a given $y \in \mathcal{Y}^+$ is then given by
$$\rho_{Z|Y}(z\,|\,y) = \frac{\exp(-\psi_y(z))}{\rho_Y(y)}, \qquad (4.1)$$
where $\psi_y(z) := \psi(x(y, z))$. Note that $\psi_y$ inherits convexity (in $z$) from $\psi$. Bobkov [5] shows that general log-concave densities satisfy a Poincaré inequality and gives lower and upper bounds on the corresponding Poincaré constant.

First, we discuss the special case of $\kappa$-uniformly convex functions $\psi$, which are known to imply a Poincaré inequality with universal Poincaré constant $1/\kappa$, implying their applicability in the context of active subspaces due to the inheritance property shown below. However, the assumption of the density $\rho$ being of uniformly log-concave type is somewhat restrictive since it excludes distributions with heavier tails as, for example, exponential or Laplace distributions. For this reason, we investigate more general log-concave densities and show that problems might arise with this class of probability distributions. In addition, the problems and their proposed solution are exemplified on an extreme case example involving independently exponentially distributed random variables in $2$ and $n$ dimensions.

4.1 $\kappa$-uniformly convex functions

Definition 4.1 ($\kappa$-uniformly convex function).

A function $\psi : \mathbb{R}^n \to \mathbb{R}$ is said to be $\kappa$-uniformly convex if there is a $\kappa > 0$ such that for all $x \in \mathbb{R}^n$ it holds that
$$v^\top \nabla^2 \psi(x)\, v \ge \kappa\, |v|^2 \qquad (4.2)$$
for all $v \in \mathbb{R}^n$, where $\nabla^2 \psi(x)$ denotes the Hessian matrix of $\psi$.

In [27, p. 43–44], it was shown that there is a dimension-free Poincaré constant $1/\kappa$ for $\kappa$-uniformly log-concave $\rho$. Note that this says nothing about the special case $\kappa = 0$. The existence of a dimension-free Poincaré constant for this special case is a consequence of the famous Kannan-Lovász-Simonovits conjecture, see, e. g., [1, 19]. However, since we need a Poincaré inequality for $\rho_{Z|Y}(\cdot\,|\,y)$, $y \in \mathcal{Y}^+$, we have to show that $\kappa$-uniform log-concavity of $\rho$ implies $\kappa$-uniform log-concavity of $\rho_{Z|Y}(\cdot\,|\,y)$. So, let $y \in \mathcal{Y}^+$. Recall that $\psi_y(z) = \psi(x(y, z))$ for a convex function $\psi$. The Hessian matrix $\nabla_z^2 \psi_y(z)$ (w.r.t. $z$) computes to
$$\nabla_z^2 \psi_y(z) = W_2^\top\, \nabla^2 \psi(x(y, z))\, W_2. \qquad (4.3)$$

Choose $v \in \mathbb{R}^{n-k}$ arbitrarily. For every $z \in \mathbb{R}^{n-k}$, it holds that
$$v^\top \nabla_z^2 \psi_y(z)\, v = (W_2 v)^\top\, \nabla^2 \psi(x(y, z))\, (W_2 v) \qquad (4.4)$$
$$\ge \kappa\, |W_2 v|^2 = \kappa\, |v|^2, \qquad (4.5)$$
where the last equality uses that the columns of $W_2$ are orthonormal. That is, $\rho_{Z|Y}(\cdot\,|\,y)$ is $\kappa$-uniformly log-concave for each $y \in \mathcal{Y}^+$. Since $\rho_{Z|Y}(\cdot\,|\,y)$ inherits the dimension-free Poincaré constant $1/\kappa$ from $\rho$, Theorem 3.1 also holds for $\kappa$-uniformly log-concave densities with $C_P = 1/\kappa$.
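The inheritance argument of Eqs. (4.3)–(4.5) can also be verified numerically; the sketch below uses a hypothetical quadratic potential $\psi(x) = x^\top H x / 2$ with $H \succeq \kappa I_n$, for which the Hessian of $\psi_y$ is exactly $W_2^\top H W_2$:

```python
import numpy as np

rng = np.random.default_rng(5)

n, k, kappa = 5, 2, 0.7
B = rng.standard_normal((n, n))
H = B @ B.T + kappa * np.eye(n)          # Hessian of psi, H >= kappa * I

W = np.linalg.qr(rng.standard_normal((n, n)))[0]
W2 = W[:, k:]
H_cond = W2.T @ H @ W2                   # Hessian of psi_y w.r.t. z, Eq. (4.3)

print(np.linalg.eigvalsh(H).min() >= kappa)       # True by construction
print(np.linalg.eigvalsh(H_cond).min() >= kappa)  # lower bound is inherited
```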

For example, $\kappa$-uniformly log-concave densities comprise multivariate normal distributions $N(\mu, \Sigma)$ with mean $\mu$ and covariance matrix $\Sigma$ (here $\kappa = 1/\lambda_{\max}(\Sigma)$). However, distributions that satisfy the assumption only for $\kappa = 0$, as, e. g., Weibull distributions with the exponential distribution as a special case, or Gamma distributions with shape parameter at least one, only belong to the class of general log-concave distributions.

4.2 General convex functions

Since we cannot make use of a universal dimension-free Poincaré constant for general convex functions $\psi$, we look at them more closely in this subsection. Recall that $\rho(x) = \exp(-\psi(x))$, $x \in \mathbb{R}^n$, for a convex function $\psi$. We have to deal with the fact that the essential supremum of the random Poincaré constant $C_P(Y)$ from (2.15) possibly does not exist. A corresponding example is given in Subsection 4.3.1. In the step from (2.15) to (2.16), we have applied Hölder's inequality with Hölder conjugates $(\infty, 1)$. This is not possible for unbounded random variables $C_P(Y)$, and thus we have to use a different, weaker pair of conjugates $(s, t)$, $1/s + 1/t = 1$. If we assume that $\nabla f$ is bounded, implying $|\nabla_z f(x)|^2 \le M$ for some constant $M > 0$, we get

$$\mathbb{E}\big[(f(X) - f_g(X))^2\big] \le \int_{\mathcal{Y}^+} C_P(y) \int |\nabla_z f(x(y,z))|^2\, \rho_{Z|Y}(z\,|\,y)\, dz\, \rho_Y(y)\, dy \qquad (4.6)$$
$$\le \left( \int_{\mathcal{Y}^+} C_P(y)^s\, \rho_Y(y)\, dy \right)^{\!1/s} \left( \int_{\mathcal{Y}^+} \left( \int |\nabla_z f(x(y,z))|^2\, \rho_{Z|Y}(z\,|\,y)\, dz \right)^{\!t} \rho_Y(y)\, dy \right)^{\!1/t} \qquad (4.7)$$
$$\le \mathbb{E}\big[C_P(Y)^s\big]^{1/s}\, M^{(t-1)/t} \left( \int_{\mathcal{Y}^+} \int |\nabla_z f(x(y,z))|^2\, \rho_{Z|Y}(z\,|\,y)\, dz\, \rho_Y(y)\, dy \right)^{\!1/t} \qquad (4.8)$$
$$= \mathbb{E}\big[C_P(Y)^s\big]^{1/s}\, M^{1/s}\, \mathbb{E}\big[|\nabla_z f(X)|^2\big]^{1/t} \qquad (4.9)$$
$$= C\, (\lambda_{k+1} + \cdots + \lambda_n)^{1/t}, \qquad (4.10)$$

where $C := \big(M\, \mathbb{E}[C_P(Y)^s]\big)^{1/s}$. The $\rho$-, $s$-, and $M$-dependence of $C$ is notationally neglected in the following. Now, if possible, choose a suitable $s$ to get $\mathbb{E}[C_P(Y)^s] < \infty$. Note that we lose first order in the eigenvalues from the inactive subspace, but have instead order $1/t < 1$. Of course, the constant $C$ could get arbitrarily large as $t \to 1$, but this depends strongly on $C_P(Y)$ and its moments, see the example given in Subsection 4.3.1.
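To see the trade-off concretely, the following sketch evaluates $\mathbb{E}[C_P(Y)^s]^{1/s}$ for increasing $s$, anticipating the example of Subsection 4.3.1 (rotation by $\pi/4$): there, $\rho_Y(y) = 2y\exp(-\sqrt{2}\,y)$ and the conditional distribution is uniform on $[-y, y]$, for which we insert the Poincaré constant $(2y/\pi)^2$ of a uniform distribution with diameter $2y$. Every moment is finite although the essential supremum is not, and the constant visibly grows with $s$:

```python
import numpy as np
from scipy.integrate import quad

# rho_Y and C_P(y) for the "rotation by pi/4" example of Subsection 4.3.1
rho_Y = lambda y: 2.0 * y * np.exp(-np.sqrt(2.0) * y)
C_P = lambda y: (2.0 * y / np.pi) ** 2   # Poincare constant, uniform on [-y, y]

for s in [1, 2, 4, 8]:
    moment, _ = quad(lambda y: C_P(y) ** s * rho_Y(y), 0.0, np.inf)
    # E[C_P(Y)^s]^(1/s) is finite for every s, but grows as s increases,
    # reflecting that ess sup C_P(Y) = infinity.
    print(s, moment ** (1.0 / s))
```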

It is known from Bobkov [5, Eqs. (1.3), (1.8) and p. 1906] that there exists a dimension-dependent Poincaré constant $C_P(n)$ for a general log-concave density that is bounded from below and above by
$$\sup_{|v| = 1} \operatorname{Var}(v^\top X) \le C_P(n), \qquad (4.11)$$
$$C_P(n) \le 4 c^2\, \mathbb{E}\big[|X - \mathbb{E}[X]|^2\big], \qquad (4.12)$$
where $X$ is distributed according to the log-concave density and $c$ [5, Eqs. (1.8) and (3.4)] is a universal constant. To the authors' knowledge, the constant in Eq. (4.12) is the best available. We provide a scenario in Subsection 4.3.1 ("Rotation by $\pi/4$") in which the lower bound, viewed as a random variable, has no finite essential supremum, implying the same for $C_P(y)$.

However, to make use of the result in Eq. (4.10), we need to investigate the involved constant $C$. Using Bobkov's upper bound (4.12) and Jensen's inequality for weighted sums, it follows that
$$\mathbb{E}\big[C_P(Y)^s\big] \le (4 c^2)^s\, \mathbb{E}\bigg[\Big(\sum_{i=1}^{n-k} \operatorname{Var}(Z_i\,|\,Y)\Big)^{\!s}\bigg] \qquad (4.13)$$
$$\le (4 c^2)^s\, (n-k)^{s-1} \sum_{i=1}^{n-k} \mathbb{E}\big[\operatorname{Var}(Z_i\,|\,Y)^s\big]. \qquad (4.14)$$

Eventually, we get
$$\mathbb{E}\big[(f(X) - f_g(X))^2\big] \le C\, (\lambda_{k+1} + \cdots + \lambda_n)^{1/t} \qquad (4.15)$$
with
$$C = \Big(M\, (4 c^2)^s\, (n-k)^{s-1} \sum_{i=1}^{n-k} \mathbb{E}\big[\operatorname{Var}(Z_i\,|\,Y)^s\big]\Big)^{1/s}. \qquad (4.16)$$

4.3 Independently exponentially distributed random variables as an extreme case

In this subsection, we investigate the quantity $C$ from Subsection 4.2 for the exponential distribution in two and $n$ dimensions. We regard a random vector $X$ whose components are independently exponentially distributed with unit rates $\lambda_i = 1$, $i = 1, \ldots, n$. We will see that investigations with unit rates are sufficient to derive statements also involving other rates. The distribution of $X$ has the density
$$\rho(x) = \exp\Big(-\sum_{i=1}^n x_i\Big), \qquad x \in [0, \infty)^n. \qquad (4.17)$$
That is, $\mathcal{X} = [0, \infty)^n$ in this case and
$$\psi(x) = \sum_{i=1}^n x_i \ \text{ for } x \in [0, \infty)^n, \qquad \psi(x) = +\infty \ \text{ otherwise}. \qquad (4.18)$$

Note that $\psi$ is convex. The orthogonal variable transformation is driven by the calculated active and inactive subspace. We are interested in the quantity $C$ from Eq. (4.16) and therefore need to study the densities $\rho_Y$ and $\rho_{Z|Y}$ gained from $\rho$ under an arbitrary orthogonal transformation. An orthogonal transformation is a composition of reflections and rotations. However, we can limit our investigations to rotations since $C$ does not depend on orientations.

4.3.1 Two dimensions

The joint density of two independently exponentially distributed random variables $X_1$ and $X_2$, both with unit rate, is
$$\rho(x_1, x_2) = \exp(-x_1 - x_2), \qquad x_1, x_2 \ge 0. \qquad (4.19)$$

Let us first regard a rotation of the two-dimensional Cartesian coordinate system by a general angle $\beta$ to a coordinate system for $(y, z)$, and then subsequently look at the special case $\beta = \pi/4$ as an example for an unbounded Poincaré constant $C_P(y)$ of $\rho_{Z|Y}(\cdot\,|\,y)$. Variables are written in thin letters in this subsection since they denote real values and not multidimensional vectors. Note that the bound from Eq. (4.15) in this two-dimensional setting becomes
$$\mathbb{E}\big[(f(X) - f_g(X))^2\big] \le C\, \lambda_2^{1/t} \qquad (4.20)$$
with
$$C = \big(M\, (4 c^2)^s\, \mathbb{E}\big[\operatorname{Var}(Z\,|\,Y)^s\big]\big)^{1/s}. \qquad (4.21)$$

Rotation by a general $\beta$

For a general angle $\beta$, we rotate the original coordinate system formally with a rotation matrix $R(\beta)$, i. e.,
$$\begin{pmatrix} y \\ z \end{pmatrix} = R(\beta) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad R(\beta) := \begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix}. \qquad (4.22)$$
It follows for the joint density that, for $(y, z)$ s.t. $x(y, z) \in [0, \infty)^2$,
$$\rho_{Y,Z}(y, z) = \rho(x(y, z)) \qquad (4.23)$$
$$= \exp\big(-(\cos\beta + \sin\beta)\, y - (\cos\beta - \sin\beta)\, z\big). \qquad (4.24)$$
If we define $a := \cos\beta + \sin\beta$ and $b := \cos\beta - \sin\beta$, we have
$$\rho_{Y,Z}(y, z) = \exp(-a y - b z). \qquad (4.25)$$

Fig. 1 illustrates the situation for a positive (Fig. 1(a)) and a negative (Fig. 1(b)) angle $\beta$.

Figure 1: Rotations of the coordinate system with a positive (a) and a negative (b) angle. The orange lines depict contour lines in the support of $\rho_{Y,Z}$. The red lines show the values of $z$ for a given $y$. Their solid parts mark regions within the support of $\rho_{Y,Z}$, whereas the dashed parts identify values with density zero.

The interval of investigation for $\beta$ can be reduced by reasons of periodicity and symmetry. First, note that the map $\beta \mapsto \operatorname{Var}(Z\,|\,Y = y)$, with the $\beta$-dependence hidden in the variables $a$ and $b$, is $\pi$-periodic in $\beta$, since an additional rotation by $\pi$ corresponds to changing the signs of $y$ and $z$, which is not important for the integrals in $z$. Hence, it suffices to consider $\beta \in (-\pi/2, \pi/2]$. Secondly, from Fig. 1 it can be deduced that $\operatorname{Var}(Z\,|\,Y = y)$, as a map of $\beta$, is symmetric around $\pi/4$ in $(0, \pi/2]$ and symmetric around $-\pi/4$ in $(-\pi/2, 0]$. This fact is also shown in Fig. 2. That is, it is enough to investigate angles $\beta \in [-\pi/4, \pi/4]$.
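For a concrete angle, the conditional variance can be evaluated by numerical quadrature; the sketch below uses the support boundaries $z_l(y) = -y\tan\beta$ and $z_u(y) = y\cot\beta$ derived in the following paragraphs (here for the sample angle $\beta = \pi/8$) and indicates that $\operatorname{Var}(Z\,|\,Y = y)$ saturates in $y$ for $\beta < \pi/4$:

```python
import numpy as np
from scipy.integrate import quad

beta = np.pi / 8                         # sample angle in (0, pi/4)
b = np.cos(beta) - np.sin(beta)

def cond_var(y):
    """Variance of the density proportional to exp(-b z) on [z_l, z_u]."""
    z_l, z_u = -y * np.tan(beta), y / np.tan(beta)
    w = lambda z: np.exp(-b * z)
    m0 = quad(w, z_l, z_u)[0]
    m1 = quad(lambda z: z * w(z), z_l, z_u)[0]
    m2 = quad(lambda z: z * z * w(z), z_l, z_u)[0]
    return m2 / m0 - (m1 / m0) ** 2

for y in [0.5, 1.0, 2.0, 5.0, 20.0]:
    print(y, cond_var(y))                # approaches 1 / b**2 from below
```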

Figure 2: Illustration of the symmetries in $\beta$ of the map $\beta \mapsto \operatorname{Var}(Z\,|\,Y = y)$ for several $y$.

For the computation of the integrals in $z$, it is necessary, for a given $y$, to determine the boundaries $z_l(y)$ and $z_u(y)$ of the intervals for $z$ that lie in the support of the joint density $\rho_{Y,Z}$ (see the thick solid lines in Fig. 1). The integrals in $z$ are computed using the computer algebra system Wolfram Mathematica [28]. The computation requires treating the cases $\beta < 0$ and $\beta \ge 0$ differently (see Fig. 1).

For negative $\beta$ and arbitrary $y$, we have that
$$z_l(y) = \max\big(y \tan|\beta|,\ -y \cot|\beta|\big) \qquad (4.26)$$
and $z_u(y) = \infty$, i. e.,
$$\rho_{Z|Y}(z\,|\,y) \propto \exp(-b z) \quad \text{for } z \ge z_l(y). \qquad (4.27)$$

We compute that
$$\operatorname{Var}(Z\,|\,Y = y) = \frac{1}{b^2}. \qquad (4.28)$$
That is, it is constant in $y$, which explains the left part of the graph in Fig. 2 and shows that $\operatorname{Var}(Z\,|\,Y = y)$ does not depend on $y$ for negative $\beta$.

For non-negative $\beta$ and a given $y \ge 0$, the boundaries are computed to $z_l(y) = -y \tan\beta$ and $z_u(y) = y \cot\beta$, i. e.,
$$\rho_{Z|Y}(z\,|\,y) \propto \exp(-b z) \quad \text{for } z \in [-y \tan\beta,\ y \cot\beta]. \qquad (4.29)$$

We compute that
$$\operatorname{Var}(Z\,|\,Y = y) = \frac{1}{b^2} - \frac{L(y)^2\, e^{b L(y)}}{\big(e^{b L(y)} - 1\big)^2} \qquad (4.30)$$
for $y > 0$ and the interval length $L(y) := z_u(y) - z_l(y) = (\tan\beta + \cot\beta)\, y$, with $a$ and $b$ as above. $\operatorname{Var}(Z\,|\,Y = y)$ can actually be bounded for $\beta \in [0, \pi/4)$. Indeed, since $b > 0$, it holds that the second term in Eq. (4.30) is positive, implying that the variance is dominated by that of the untruncated exponential distribution. It follows that
$$\operatorname{Var}(Z\,|\,Y = y) \le \frac{1}{b^2} \qquad (4.31)\text{–}(4.33)$$
for all $y > 0$.

This bound is itself unbounded in $\beta$ since $b \to 0$ and hence $1/b^2 \to \infty$ as $\beta \to \pi/4$. Fig. 3(a) illustrates the boundedness of $\operatorname{Var}(Z\,|\,Y = y)$ for $\beta < \pi/4$ and additionally shows that it approaches the unbounded function $y \mapsto y^2/3$ as $\beta \to \pi/4$. For completion, Fig. 3(b) stresses the peculiarity of this limit case, which is thus discussed separately in the subsequent paragraph.

Figure 3: (a) The log-log plot of the map $y \mapsto \operatorname{Var}(Z\,|\,Y = y)$ shows that it is bounded for angles $\beta < \pi/4$, but approaches the unbounded function $y \mapsto y^2/3$, which corresponds to $\beta = \pi/4$, as $\beta \to \pi/4$.
(b) The plot shows the map $y \mapsto \operatorname{Var}(Z\,|\,Y = y)$ for several angles $\beta$. Also, it illustrates the fact that $\beta = \pi/4$ is a special case for which $\operatorname{Var}(Z\,|\,Y = y)$ can get arbitrarily large.

Rotation by $\pi/4$

A rotation of $\beta = \pi/4$, i. e., by $45°$, is a special case since $b$ from Eq. (4.25) becomes zero. The joint density, for $y \ge 0$ and $|z| \le y$, is then
$$\rho_{Y,Z}(y, z) = \exp(-\sqrt{2}\, y). \qquad (4.34)$$

A graphical illustration of this case is given in Fig. 4. Consequently, the marginal density of $Y$ is

$$\rho_Y(y) = 2 y \exp(-\sqrt{2}\, y), \qquad y \ge 0, \qquad (4.35)$$

and the conditional density computes to $\rho_{Z|Y}(z\,|\,y) = 1/(2y)$ for $|z| \le y$, i. e., the uniform density on $[-y, y]$. Its variance $\operatorname{Var}(Z\,|\,Y = y) = y^2/3$ is unbounded in $y$, so the lower bound in Eq. (4.11), viewed as a random variable, indeed has no finite essential supremum.
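As a consistency check for this limit case, the following minimal sketch verifies numerically that the joint density integrates to one over the support $\{(y, z) : y \ge 0,\ |z| \le y\}$ and tabulates the Poincaré constants $(2y/\pi)^2$ of the uniform conditional distributions, which grow without bound in $y$:

```python
import numpy as np
from scipy.integrate import dblquad

# Joint density exp(-sqrt(2) y) on the support y >= 0, |z| <= y
joint = lambda z, y: np.exp(-np.sqrt(2.0) * y)

# dblquad integrates the inner variable z over [-y, y] for each outer y
mass, _ = dblquad(joint, 0.0, np.inf, lambda y: -y, lambda y: y)
print(mass)                              # ~ 1.0, so the density is proper

for y in [1.0, 10.0, 100.0]:
    print(y, (2.0 * y / np.pi) ** 2)     # C_P(y) is unbounded in y
```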