 # Functional Inverse Regression in an Enlarged Dimension Reduction Space

We consider an enlarged dimension reduction space in functional inverse regression. Our operator and functional analysis based approach facilitates a compact and rigorous formulation of the functional inverse regression problem. It also enables us to expand the possible space where the dimension reduction functions belong. Our formulation provides a unified framework so that the classical notions, such as covariance standardization, Mahalanobis distance, SIR and linear discriminant analysis, can be naturally and smoothly carried out in our enlarged space. This enlarged dimension reduction space also links to the linear discriminant space of Gaussian measures on a separable Hilbert space.


## 1 Introduction

Traditionally, sufficient dimension reduction problems refer to the estimation of the space spanned by the columns of $\beta$, where $\beta$ satisfies $Y \perp X \mid \beta^{\mathrm T}X$. Here, $X$ is a $p$-dimensional covariate vector, which we assume to satisfy $E(X)=0$ for simplicity, $\beta$ is a $p\times d$ matrix and $Y$ is a univariate response variable. An equivalent form is $Y=f(\beta^{\mathrm T}X,\epsilon)$, where $\epsilon$ is a mean zero random variable independent of $X$. By far the best known estimation procedure for this problem is sliced inverse regression (SIR, Li (1991)), where finding the leading eigenvectors of the generalized eigenvalue problem $\Sigma_e\beta=\lambda\Sigma\beta$ is all one needs to do to obtain the column space of $\beta$. Here $\Sigma=\operatorname{cov}(X)$ and $\Sigma_e=\operatorname{cov}\{E(X\mid Y)\}$. SIR is constructed under a linearity condition, which requires $E(X\mid\beta^{\mathrm T}X)$ to be a linear function of $\beta^{\mathrm T}X$, and has since been developed into a whole class of inverse regression based methods for dimension reduction. To understand the inverse regression based methods from a different angle, we can normalize the covariates by viewing $\Sigma^{-1/2}X$ as new covariates and $\Sigma^{1/2}\beta$ as the new dimension reduction matrix. Considering the dimension reduction problem in terms of $\Sigma^{-1/2}X$ instead of $X$ enables much simplification and permits a clearer exhibition of the critical operations (Li (1991); Ma and Zhu (2012)).
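The effect of the standardization can be written out in a few lines. The following is a sketch in notation assumed here for illustration: $\Sigma=\operatorname{cov}(X)$ is taken to be invertible, $\Sigma_e=\operatorname{cov}\{E(X\mid Y)\}$, $Z=\Sigma^{-1/2}X$ and $\eta=\Sigma^{1/2}\beta$. Under these assumptions the generalized eigenvalue problem turns into an ordinary one, and the reduced covariates are unchanged.

```latex
\begin{align*}
\Sigma_e \beta &= \lambda \Sigma \beta
  && \text{(generalized eigenproblem in the original scale)} \\
\Sigma_e \Sigma^{-1/2} \eta &= \lambda \Sigma^{1/2} \eta
  && \text{(substitute } \beta = \Sigma^{-1/2}\eta \text{)} \\
\Sigma^{-1/2} \Sigma_e \Sigma^{-1/2} \eta &= \lambda \eta
  && \text{(left-multiply by } \Sigma^{-1/2} \text{)} \\
\eta^{\mathrm T} Z &= \beta^{\mathrm T} \Sigma^{1/2} \Sigma^{-1/2} X = \beta^{\mathrm T} X
  && \text{(the reduced covariates are the same)}
\end{align*}
```

The functional setting discussed next asks what replaces $\Sigma^{-1/2}$ and $\Sigma^{1/2}$ when $\Sigma$ becomes a compact operator, whose inverse square root is no longer bounded.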

Dimension reduction problems have been extended from the traditional regression domain to the functional data analysis domain. See Jiang et al. (2014) and references therein. The model considered in the functional dimension reduction framework is

$$Y=f\left(\langle\beta_1,X\rangle_{L_2},\ldots,\langle\beta_d,X\rangle_{L_2},\epsilon\right),$$

where $Y$ is still a univariate response variable, $X$ is now a covariate function, $\beta_1,\ldots,\beta_d$ are parameter functions in $L_2(I)$, and $\langle\cdot,\cdot\rangle_{L_2}$ denotes the inner product of two functions in the $L_2$ space. Since the matrix vector product $\beta^{\mathrm T}X$ in the traditional case can be expressed as $(\beta_1^{\mathrm T}X,\ldots,\beta_d^{\mathrm T}X)^{\mathrm T}$, which can also be viewed as a vector of inner products between the vectors $\beta_j$ and the covariate $X$, one might think that the extension to the functional data framework is straightforward. However, there are many subtleties when finite dimensional quantities are extended to infinite dimensional ones, such as $X$ and $\beta$ to $X(t)$ and $\beta(t)$. Some properties we take for granted in finite dimension may not hold automatically; for example, a vector norm or the inner product between two vectors may fail to be finite. Suppose we want to perform the same standardization as in the finite dimensional case, forming $\Gamma^{-1/2}X$ as the new covariate function and $\Gamma^{1/2}\beta_j$ as the new dimension reduction functions, where $\Gamma$ is the functional counterpart (an operator) of the variance-covariance matrix. Then not only do we need to consider the extensions from vectors to functions and from matrices to operators, but we also need to define proper normed spaces and the corresponding requirements on these functions. A careful and rigorous consideration of these issues enables less restrictive models and more flexible estimation. In fact, one of the main messages of this article is that the requirement in Jiang et al. (2014) that the parameter functions $\beta_j$ be in $L_2(I)$ is too strong and can be relaxed to include more interesting examples.

During the process of our investigation, we also realize that it is crucial to formulate the functional dimension reduction problem properly in order to facilitate the subsequent application of the existing mathematical tools from functional analysis involving Reproducing Kernel Hilbert Space (RKHS) and operator theories. To better prepare for such a task, we summarize some preliminary results in Section 2 and provide an outline of either a proof or an understanding for each result. In Section 3, we give a few motivating examples, wherein the dimension reduction functions fall out of the space required in Jiang et al. (2014) and hence cannot be solved under their model. We then present an extension of the functional dimension reduction model in Section 4, together with some main results. Our extension works on an enlarged space, so that the classical notion of SIR in standardized scale can be carried out.

## 2 Preliminary

### 2.1 Covariance operators and integral operators

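Without loss of generality, we restrict our attention to functions defined on $I=[0,1]$. Let the Hilbert space $L_2(I)$ be the space of square-integrable functions defined on $I$, equipped with the inner product

$$\langle u,v\rangle_{L_2}=\int_0^1 u(t)v(t)\,dt,\qquad u,v\in L_2(I).$$

Let $\Gamma$ be a continuous bivariate function on $I\times I$. Then $\Gamma$ induces a linear integral operator, still written as $\Gamma$, whose operation on a function $u$ is defined as

$$(\Gamma u)(s)=\int_0^1\Gamma(s,t)u(t)\,dt=\langle\Gamma(s,\cdot),u(\cdot)\rangle_{L_2}\quad\text{for } u\in L_2(I). \tag{1}$$

When $\langle u,\Gamma v\rangle_{L_2}=\langle\Gamma u,v\rangle_{L_2}$ for all $u,v\in L_2(I)$, $\Gamma$ is said to be a symmetric linear integral operator. Note that

$$\langle u,\Gamma v\rangle_{L_2}=\int_0^1\!\!\int_0^1 u(s)\Gamma(s,t)v(t)\,dt\,ds,\qquad \langle\Gamma u,v\rangle_{L_2}=\int_0^1\!\!\int_0^1 u(s)\Gamma(t,s)v(t)\,dt\,ds.$$

Hence, as long as $\Gamma$ is symmetric as a function of $(s,t)$ defined on $I\times I$, its induced operator is also a symmetric operator. When $\langle\Gamma u,u\rangle_{L_2}\ge 0$ for all $u\in L_2(I)$, $\Gamma$ is said to be positive semi-definite (or non-negative definite). When the equality holds if and only if $u=0$ almost everywhere, $\Gamma$ is said to be positive definite (or strictly positive definite). A positive (semi-) definite linear integral operator is also known as a covariance operator. Let $B$ denote the unit ball in $L_2(I)$, i.e., $B=\{u\in L_2(I):\|u\|_{L_2}\le 1\}$. An operator $T$ defined on $L_2(I)$ that maps $L_2(I)$ to $L_2(I)$ is said to be compact if the image of the unit ball, $T(B)$, is a compact set in $L_2(I)$.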

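Let $\{X(t),\,t\in I\}$ be a random process with finite second moments and $Y$ be a univariate random variable. We now consider three specific bivariate functions and their induced operators,

$$\Gamma(s,t)\equiv\operatorname{cov}\{X(s),X(t)\},\qquad \Gamma_w(s,t)\equiv E\left[\operatorname{cov}\{X(s),X(t)\mid Y\}\right],$$

and

$$\Gamma_e(s,t)\equiv\operatorname{cov}\left[E\{X(s)\mid Y\},E\{X(t)\mid Y\}\right].$$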

It is easy to verify that $\Gamma$, $\Gamma_w$ and $\Gamma_e$ are all symmetric bivariate functions and $\Gamma=\Gamma_w+\Gamma_e$. We further assume $\Gamma$, $\Gamma_w$, $\Gamma_e$ to be continuous. The continuity of the functions $\Gamma$, $\Gamma_w$ and $\Gamma_e$ on $I\times I$ implies that they are square integrable, and hence guarantees that $\Gamma$, $\Gamma_w$ and $\Gamma_e$ are compact operators on $L_2(I)$ (Lax (2002), Chapter 22, Theorem 4). The definitions of $\Gamma$, $\Gamma_w$ and $\Gamma_e$ also ensure that they are positive semi-definite. Mercer's Theorem (Lax (2002), Chapter 30, Theorem 11) then implies that they have discrete spectra. Taking $\Gamma$ for instance, it can be expanded in a uniformly convergent series of eigenvalues and eigenfunctions

$$\Gamma(s,t)=\sum_{i=1}^{q}\xi_i\phi_i(s)\phi_i(t),\qquad q\le\infty, \tag{2}$$

which we sometimes write in short as

$$\Gamma=\sum_{i=1}^{q}\xi_i\,\phi_i\otimes\phi_i^{\mathrm T}.$$

Here $\xi_1\ge\xi_2\ge\cdots>0$ are decreasing positive values. If $\Gamma$ is strictly positive definite, then $q=\infty$ and $\{\phi_i\}_{i=1}^{\infty}$ form a complete orthonormal basis for $L_2(I)$. This result relies critically on the strict positive definiteness of $\Gamma$. Without the assumption that $\Gamma$ is strictly positive definite, we can still decompose $\Gamma$ as in (2), and the corresponding $\phi_i$, $i=1,\ldots,q$, always form a complete orthonormal basis for $\overline{\mathcal R(\Gamma)}$, the closure of the range of $\Gamma$. However, $\overline{\mathcal R(\Gamma)}\subsetneq L_2(I)$ when $\Gamma$ is not strictly positive definite. We further outline the following results, which are relevant to the functional inverse regression study.
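As a concrete illustration of the expansion (2), consider a kernel that is not used in this article but whose spectrum is known in closed form: the Brownian motion covariance $\Gamma(s,t)=\min(s,t)$ on $[0,1]$, with $\xi_i=\{(i-1/2)\pi\}^{-2}$ and $\phi_i(t)=\sqrt2\sin\{(i-1/2)\pi t\}$. The sketch below, with function names chosen here for illustration, compares a truncated Mercer sum with the kernel itself.

```python
import math


def bm_eigenvalue(i):
    """Eigenvalue xi_i of the kernel min(s, t) on [0, 1] (illustrative kernel)."""
    return 1.0 / ((i - 0.5) * math.pi) ** 2


def bm_eigenfunction(i, t):
    """Orthonormal eigenfunction phi_i(t) = sqrt(2) * sin((i - 1/2) * pi * t)."""
    return math.sqrt(2.0) * math.sin((i - 0.5) * math.pi * t)


def mercer_partial_sum(s, t, n_terms):
    """Truncation of sum_i xi_i * phi_i(s) * phi_i(t) after n_terms terms."""
    return sum(
        bm_eigenvalue(i) * bm_eigenfunction(i, s) * bm_eigenfunction(i, t)
        for i in range(1, n_terms + 1)
    )


if __name__ == "__main__":
    s, t = 0.3, 0.7
    for n_terms in (10, 100, 2000):
        # The partial sums should approach min(s, t) as n_terms grows.
        print(n_terms, mercer_partial_sum(s, t, n_terms), min(s, t))
```

Because this kernel is strictly positive definite, the expansion has $q=\infty$ terms, and the truncation error is controlled by the tail of $\sum_i\xi_i$.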

###### Proposition 1.

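A continuous, symmetric, positive (semi-) definite integral operator $\Gamma$ is a trace-class operator, i.e.,

$$\sum_{i=1}^{q}\xi_i<\infty.$$

###### Proof.

Because $\Gamma$ is a continuous function on $I\times I$, setting $s=t$ shows that $\Gamma(t,t)$ is a continuous function of $t$ in $I$, and thus $\Gamma(t,t)$ is integrable. Hence, by (2) and the orthonormality of the $\phi_i$, $\sum_{i=1}^{q}\xi_i=\int_0^1\Gamma(t,t)\,dt<\infty$. ∎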
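The trace identity in the proof can be checked numerically on a kernel with a known spectrum. The sketch below again assumes the illustrative Brownian motion kernel $\Gamma(s,t)=\min(s,t)$, which is not an example from this article; its eigenvalues are $\xi_i=\{(i-1/2)\pi\}^{-2}$, and the function names are hypothetical.

```python
import math


def kernel(s, t):
    """Illustrative covariance kernel: Brownian motion on [0, 1]."""
    return min(s, t)


def bm_eigenvalue(i):
    """Eigenvalue xi_i of the kernel min(s, t) on [0, 1]."""
    return 1.0 / ((i - 0.5) * math.pi) ** 2


def eigenvalue_sum(n_terms):
    """Partial sum of the eigenvalues, approximating the trace of the operator."""
    return sum(bm_eigenvalue(i) for i in range(1, n_terms + 1))


def diagonal_integral(n_grid):
    """Midpoint-rule approximation of the integral of Gamma(t, t) over [0, 1]."""
    h = 1.0 / n_grid
    return h * sum(kernel((k + 0.5) * h, (k + 0.5) * h) for k in range(n_grid))


if __name__ == "__main__":
    # Both numbers should be close to each other, as Proposition 1 states.
    print(eigenvalue_sum(100000), diagonal_integral(1000))
```

Both quantities approximate the same number, the integral of the kernel along its diagonal, which is what makes the operator trace-class.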

###### Proposition 2.

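For any positive (semi-) definite operator $\Gamma$, there exists a mean zero random process $X$ satisfying $E\|X\|_{L_2}^2<\infty$ such that $\operatorname{cov}\{X(s),X(t)\}=\Gamma(s,t)$ and

$$X(s)=\sum_{i=1}^{q}A_i\phi_i(s),$$

where the $A_i$ are independent random variables with mean zero and variances $\xi_i$.

###### Proof.

For $i=1,\ldots,q$, let $A_i=\xi_i^{1/2}Z_i$, where the $Z_i$ are independent standard normal random variables. Obviously the resulting $X$ is a mean zero process that satisfies $\operatorname{cov}\{X(s),X(t)\}=\sum_i\xi_i\phi_i(s)\phi_i(t)=\Gamma(s,t)$. In addition, $E\|X\|_{L_2}^2=\sum_{i=1}^{q}\xi_i<\infty$ by Proposition 1. ∎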

Note that, in our construction of the Gaussian process in the above proof, the sample path $X(\cdot,\omega)$ may not be in $L_2(I)$ for a given realization $\omega$. However, $E\|X\|_{L_2}^2<\infty$ ensures that the probability of this kind of $\omega$ is 0. That is, $X\in L_2(I)$ almost surely. In the following, we may simply write $X\in L_2(I)$ to mean that $X\in L_2(I)$ almost surely.

### 2.2 RKHS relevant for functional inverse regression

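Let $H_\Gamma$ be the RKHS generated by $\Gamma$. Specifically,

$$H_\Gamma\equiv\operatorname{closure}\left\{\sum_{i=1}^{q}\Gamma(s,t_i)\alpha_i:\ q\in\mathbb N,\ \alpha_i\in\mathbb R,\ t_i\in[0,1]\right\},$$

where the closure is taken with respect to the norm induced by the following inner product

$$\langle\Gamma(s,\cdot),\Gamma(t,\cdot)\rangle_{H_\Gamma}=\Gamma(s,t).$$

Note that $H_\Gamma$ is a proper subset of $L_2(I)$. For $f\in H_\Gamma$, $f$ has the expansion

$$f(t)=\sum_i f_i\phi_i(t),\qquad\text{where } f_i=\langle f,\phi_i\rangle_{L_2}.$$

In addition to its $L_2$-norm defined as $\|f\|_{L_2}^2=\sum_i f_i^2$, the $H_\Gamma$-norm is given by

$$\|f\|_{H_\Gamma}^2=\sum_i\frac{f_i^2}{\xi_i}.$$

For $u,v\in H_\Gamma$, the $H_\Gamma$-inner product is given by

$$\langle u,v\rangle_{H_\Gamma}=\sum_i\frac{u_iv_i}{\xi_i}, \tag{3}$$

where $u_i=\langle u,\phi_i\rangle_{L_2}$ and $v_i=\langle v,\phi_i\rangle_{L_2}$.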
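It may help to compare (3) with its finite dimensional counterpart. The following sketch assumes, for illustration only, a positive definite $p\times p$ covariance matrix $\Sigma$ with eigenvalues $\xi_i$ and orthonormal eigenvectors $\phi_i$; in that case the analogue of the $H_\Gamma$-inner product is the familiar Mahalanobis inner product.

```latex
\begin{align*}
\Sigma &= \sum_{i=1}^{p} \xi_i \phi_i \phi_i^{\mathrm T}, \qquad
\Sigma^{-1} = \sum_{i=1}^{p} \xi_i^{-1} \phi_i \phi_i^{\mathrm T}, \qquad
u_i = \phi_i^{\mathrm T} u, \quad v_i = \phi_i^{\mathrm T} v, \\
u^{\mathrm T} \Sigma^{-1} v
  &= \sum_{i=1}^{p} \frac{(\phi_i^{\mathrm T} u)(\phi_i^{\mathrm T} v)}{\xi_i}
   = \sum_{i=1}^{p} \frac{u_i v_i}{\xi_i}.
\end{align*}
```

In finite dimension the sum is always finite. In the functional case $\xi_i\to0$, so the sum is finite only when the coefficients $u_i$ and $v_i$ decay fast enough, which is exactly the membership condition for $H_\Gamma$.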

## 3 Motivating Examples

Throughout our development of a rigorous framework for functional inverse regression, we work with the space

$$\mathcal R(\Gamma^{-1/2})\equiv\left\{f:\ f=\sum_{i=1}^{\infty}f_i\phi_i,\ f_i\in\mathbb R\ \text{such that}\ \sum_i\xi_if_i^2<\infty\right\}\supsetneq L_2(I),$$

which is the range space of the operator $\Gamma^{-1/2}$ and is larger than $L_2(I)$. Below we give a few examples in which the dimension reduction functions fall outside $L_2(I)$ and reside in $\mathcal R(\Gamma^{-1/2})$. These examples motivate us to consider an enlarged space for functional dimension reduction. Interestingly, this enlarged space is the space considered by Grenander (1950) and Rao and Varadarajan (1963) in the study of linear discriminant analysis of Gaussian measures on a separable Hilbert space.
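The three spaces $H_\Gamma\subset L_2(I)\subset\mathcal R(\Gamma^{-1/2})$ differ only in how the squared coefficients $f_i^2$ are weighted: by $1/\xi_i$, by $1$, and by $\xi_i$, respectively. The sketch below uses a hypothetical eigenvalue sequence $\xi_i=i^{-2}$ and hypothetical coefficients $f_i=i^{-1/2}$, chosen here only to show a function that lies in the enlarged space but not in $L_2(I)$.

```python
def partial_sum(coeff_sq, weight, n_terms):
    """Partial sum of weight(i) * f_i^2 over i = 1, ..., n_terms."""
    return sum(weight(i) * coeff_sq(i) for i in range(1, n_terms + 1))


def xi(i):
    """Hypothetical eigenvalues xi_i = i^(-2), assumed for illustration."""
    return 1.0 / i ** 2


def f_sq(i):
    """Squared coefficients f_i^2 for the hypothetical choice f_i = i^(-1/2)."""
    return 1.0 / i


def truncated_norms(n_terms):
    """Truncated squared norms of f in H_Gamma, L2(I) and R(Gamma^{-1/2})."""
    rkhs = partial_sum(f_sq, lambda i: 1.0 / xi(i), n_terms)  # sum f_i^2 / xi_i
    l2 = partial_sum(f_sq, lambda i: 1.0, n_terms)            # sum f_i^2
    enlarged = partial_sum(f_sq, xi, n_terms)                 # sum xi_i * f_i^2
    return rkhs, l2, enlarged


if __name__ == "__main__":
    for n_terms in (10, 1000, 100000):
        print(n_terms, truncated_norms(n_terms))
```

With these choices the first two partial sums keep growing with the truncation level, while the third stabilizes, so this $f$ belongs to $\mathcal R(\Gamma^{-1/2})$ only.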

###### Example 1 (Binary response).

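Let $Y$ be a binary random variable having probabilities $P(Y=1)=P(Y=-1)=1/2$, and let $\{\psi_i\}_{i=1}^{\infty}$ be a complete orthonormal basis for $L_2(I)$. Given $Y=y$, consider

$$X_y(t)=\alpha y\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(t)+\sum_{i=1}^{\infty}\frac{1}{i}Z_i\psi_i(t),\qquad t\in I,$$

where $\delta\in(-1/2,1/2]$ is a constant and $\alpha$ is some scalar that controls the separation of the two groups. Here the $Z_i$ are independent standard normal random variables that are independent of $Y$. Let $\Gamma_e$ be the between-group covariance and $\Gamma_w$ be the within-group covariance. Then $\Gamma=\Gamma_e+\Gamma_w$. We can easily calculate the within-group covariance function as

$$\Gamma_w(s,t)\equiv E\left[\operatorname{cov}\{X(s),X(t)\mid Y\}\right]=\sum_{i=1}^{\infty}\frac{1}{i^2}\psi_i(s)\psi_i(t),$$

and the between-group covariance function as

$$\Gamma_e(s,t)=\operatorname{cov}\left[\alpha Y\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(s),\ \alpha Y\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(t)\right]=\alpha^2\left[\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(s)\right]\left[\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(t)\right].$$

###### Proposition 3.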

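The following two optimization problems

$$\operatorname*{argmax}_{\beta}\frac{\langle\Gamma_e\beta,\beta\rangle_{L_2}}{\langle\Gamma\beta,\beta\rangle_{L_2}}\equiv\operatorname*{argmax}_{\beta}\frac{\langle\Gamma_e\beta,\beta\rangle_{L_2}}{\langle\Gamma_w\beta,\beta\rangle_{L_2}}$$

have the same solution given by

$$\beta(t)=c\sum_{i=1}^{\infty}\frac{1}{i^{\delta}}\psi_i(t), \tag{4}$$

for any constant $c\neq0$.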

###### Proof.

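From $\Gamma=\Gamma_e+\Gamma_w$, we have

$$\frac{\langle\Gamma\beta,\beta\rangle_{L_2}}{\langle\Gamma_e\beta,\beta\rangle_{L_2}}=\frac{\langle\Gamma_e\beta,\beta\rangle_{L_2}+\langle\Gamma_w\beta,\beta\rangle_{L_2}}{\langle\Gamma_e\beta,\beta\rangle_{L_2}}=1+\frac{\langle\Gamma_w\beta,\beta\rangle_{L_2}}{\langle\Gamma_e\beta,\beta\rangle_{L_2}}.$$

Therefore,

$$\operatorname*{argmax}_{\beta}\frac{\langle\Gamma_e\beta,\beta\rangle_{L_2}}{\langle\Gamma\beta,\beta\rangle_{L_2}}\equiv\operatorname*{argmax}_{\beta}\frac{\langle\Gamma_e\beta,\beta\rangle_{L_2}}{\langle\Gamma_w\beta,\beta\rangle_{L_2}}.$$

Let $\beta=\sum_{i=1}^{\infty}b_i\psi_i$. Then

$$\begin{aligned}
\Gamma_e\beta&=\alpha^2\left(\sum_{i=1}^{\infty}\frac{b_i}{i^{2+\delta}}\right)\left(\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i\right), &
\langle\Gamma_e\beta,\beta\rangle_{L_2}&=\alpha^2\left(\sum_{i=1}^{\infty}\frac{b_i}{i^{2+\delta}}\right)^2,\\
\Gamma_w\beta&=\sum_{i=1}^{\infty}\frac{b_i}{i^2}\psi_i, &
\langle\Gamma_w\beta,\beta\rangle_{L_2}&=\sum_{i=1}^{\infty}\frac{b_i^2}{i^2}.
\end{aligned}$$

Therefore, the optimization problem reduces to maximizing

$$\frac{\alpha^2\left(\sum_{i=1}^{\infty}b_i/i^{2+\delta}\right)^2}{\sum_{i=1}^{\infty}b_i^2/i^2}.$$

Writing $b_i/i^{2+\delta}=(b_i/i)\cdot(1/i^{1+\delta})$, the Cauchy-Schwarz inequality gives

$$\left(\sum_{i=1}^{\infty}\frac{b_i}{i^{2+\delta}}\right)^2\le\left(\sum_{i=1}^{\infty}\frac{b_i^2}{i^2}\right)\left(\sum_{i=1}^{\infty}\frac{1}{i^{2+2\delta}}\right).$$

The equality holds when $b_i/i\propto1/i^{1+\delta}$, i.e., $b_i\propto i^{-\delta}$, which means

$$\beta(t)\propto\sum_{i=1}^{\infty}\frac{1}{i^{\delta}}\psi_i(t)$$

is the maximum eigenfunction. ∎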
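The Cauchy-Schwarz argument can be checked numerically on a truncated version of the problem. The sketch below keeps only the first `n_terms` basis coefficients; the truncation level, the values of `alpha` and `delta`, and the function names are all assumptions made here for illustration.

```python
def rayleigh_quotient(b, alpha, delta):
    """Truncated ratio <Gamma_e beta, beta> / <Gamma_w beta, beta>, beta = sum b_i psi_i."""
    n_terms = len(b)
    numerator = alpha ** 2 * sum(
        b[i - 1] / i ** (2 + delta) for i in range(1, n_terms + 1)
    ) ** 2
    denominator = sum(b[i - 1] ** 2 / i ** 2 for i in range(1, n_terms + 1))
    return numerator / denominator


def optimal_coefficients(n_terms, delta, c=1.0):
    """Coefficients b_i = c * i^(-delta) from equation (4), truncated."""
    return [c * i ** (-delta) for i in range(1, n_terms + 1)]


def cauchy_schwarz_bound(n_terms, alpha, delta):
    """Upper bound alpha^2 * sum_i i^(-(2 + 2 delta)) on the truncated ratio."""
    return alpha ** 2 * sum(i ** (-(2 + 2 * delta)) for i in range(1, n_terms + 1))


if __name__ == "__main__":
    n_terms, alpha, delta = 200, 1.5, 0.25
    b_opt = optimal_coefficients(n_terms, delta)
    # A deterministic perturbation that is not proportional to b_opt.
    b_other = [bi * (1.0 + 0.1 * (-1) ** k) for k, bi in enumerate(b_opt)]
    print(rayleigh_quotient(b_opt, alpha, delta))
    print(rayleigh_quotient(b_other, alpha, delta))
    print(cauchy_schwarz_bound(n_terms, alpha, delta))
```

The ratio at `b_opt` attains the bound, and any coefficient vector that is not proportional to `b_opt` gives a strictly smaller value. Note that the bound stays finite as the truncation level grows, even though $\sum_i b_i^2=\sum_i i^{-2\delta}$ diverges for $\delta\le1/2$.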

The dimension reduction function $\beta$ is obtained from solving the eigenvalue problem $\Gamma_e\beta=\lambda\Gamma\beta$. The corresponding optimal linear classification rule is via

$$\operatorname{sign}\left(\langle\beta,X\rangle_{L_2}\right). \tag{5}$$

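This result can be linked to prior studies of linear discriminant analysis of two Gaussian measures on a separable Hilbert space by Grenander (1950) and Rao and Varadarajan (1963). Let

$$m_y(t)\equiv E\{X(t)\mid Y=y\}=\alpha y\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(t)=\Gamma_w^{1/2}\left(\alpha y\sum_{i=1}^{\infty}\frac{1}{i^{1+\delta}}\psi_i\right)(t)\in\mathcal R(\Gamma_w^{1/2}).$$

Note that $\beta$ given in (4) is not in $L_2(I)$, but in $\mathcal R(\Gamma_w^{-1/2})$, since, taking $c=1$,

$$\|\Gamma_w^{1/2}\beta\|_{L_2}^2=\left\|\sum_i\frac{1}{i^{\delta+1}}\psi_i(t)\right\|_{L_2}^2=\sum_i\frac{1}{i^{2+2\delta}}<\infty.$$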

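We also have $\Gamma_w^{-1/2}m_y\in L_2(I)$, i.e., $m_y$ is in $\mathcal R(\Gamma_w^{1/2})$. Furthermore, from Proposition 3 and its proof, we have $\beta=c_1\Gamma_w^{-1}m_y$, where $c_1=c/(\alpha y)$, and $\Gamma_e\beta=c_2m_y$, where $c_2=(c\alpha/y)\sum_{i=1}^{\infty}i^{-(2+2\delta)}$, so that $c_1c_2=c^2\sum_{i=1}^{\infty}i^{-(2+2\delta)}$ because $y^2=1$. Therefore

$$\begin{aligned}
\langle\Gamma^{1/2}\beta,\Gamma^{1/2}\beta\rangle_{L_2}=\langle\Gamma\beta,\beta\rangle_{L_2}
&=\langle(\Gamma_w+\Gamma_e)\beta,\beta\rangle_{L_2}=\langle\Gamma_w\beta,\beta\rangle_{L_2}+\langle\Gamma_e\beta,\beta\rangle_{L_2}\\
&=\|\Gamma_w^{1/2}\beta\|_{L_2}^2+c_1c_2\langle m_y,\Gamma_w^{-1}m_y\rangle_{L_2}\\
&=\|\Gamma_w^{1/2}\beta\|_{L_2}^2+c^2\left(\sum_{i=1}^{\infty}\frac{1}{i^{2+2\delta}}\right)\|\Gamma_w^{-1/2}m_y\|_{L_2}^2<\infty.
\end{aligned}$$

Hence, $\beta\in\mathcal R(\Gamma^{-1/2})$.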

This is an example in which $X\in L_2(I)$ and $m_y\in L_2(I)$, but $\beta\notin L_2(I)$, and yet the classification rule is well-defined. This indicates that, to solve a linear discriminant analysis problem in $L_2(I)$, we cannot restrict $\beta$ to $L_2(I)$. We are obliged to enlarge the domain of $\beta$ to $\mathcal R(\Gamma^{-1/2})$. On the other hand, requiring $\beta\in\mathcal R(\Gamma^{-1/2})$ is indeed sufficient for the purpose of the linear discriminant analysis given in (5) for classifying the observations into two groups.

###### Example 2 (Categorical response).

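The feature revealed in Example 1 is not unique to a binary response variable $Y$. When the response variable is categorical, a similar phenomenon can be observed. For example, consider the case where the response variable $Y$ is categorical with finitely many possible values. We normalize the values so that $Y$ has mean zero and variance 1. Let

$$X_y(t)=\alpha y\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(t)+\sum_{i=1}^{\infty}\frac{1}{i}Z_i\psi_i(t),\qquad t\in I. \tag{6}$$

We can easily verify that the within-group covariance function is

$$\Gamma_w(s,t)\equiv E\left[\operatorname{cov}\{X(s),X(t)\mid Y\}\right]=\sum_{i=1}^{\infty}\frac{1}{i^2}\psi_i(s)\psi_i(t),$$

and the between-group covariance function is

$$\Gamma_e(s,t)\equiv\operatorname{cov}\left\{\alpha Y\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(s),\ \alpha Y\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(t)\right\}=\alpha^2\left\{\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(s)\right\}\left\{\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(t)\right\}.$$

Let $\Gamma=\Gamma_w+\Gamma_e$. Note that the forms of $\Gamma_w$ and $\Gamma_e$ here are exactly the same as those in Example 1. Thus, when we perform the functional sliced inverse regression by solving for the first eigenfunction,

$$\beta_1=\operatorname*{argmax}_{\beta}\frac{\langle\Gamma_e\beta,\beta\rangle_{L_2}}{\langle\Gamma_w\beta,\beta\rangle_{L_2}},$$

we have exactly the same analysis as in Example 1, and it leads to the same conclusion. That is, we are obliged to enlarge the domain of $\beta$ to $\mathcal R(\Gamma^{-1/2})$. On the other hand, requiring $\beta\in\mathcal R(\Gamma^{-1/2})$ is also sufficient for our purpose of classifying the observations into groups.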

###### Example 3 (Continuous response).

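Finally we provide an example with a continuous response variable $Y$. Let $Y$ have mean zero and variance 1, and let

$$X_y(t)=\alpha y\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(t)+\sum_{i=1}^{\infty}\frac{1}{i}Z_i\psi_i(t),\qquad t\in I. \tag{7}$$

We can easily verify that the within-group covariance function is

$$\Gamma_w(s,t)\equiv E\left[\operatorname{cov}\{X(s),X(t)\mid Y\}\right]=\sum_{i=1}^{\infty}\frac{1}{i^2}\psi_i(s)\psi_i(t).$$

The between-group covariance function is

$$\Gamma_e(s,t)\equiv\operatorname{cov}\left\{\alpha Y\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(s),\ \alpha Y\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(t)\right\}=\alpha^2\left\{\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(s)\right\}\left\{\sum_{i=1}^{\infty}\frac{1}{i^{2+\delta}}\psi_i(t)\right\}.$$

Let $\Gamma=\Gamma_w+\Gamma_e$. Now the same analysis as in Examples 1 and 2 leads to the conclusion that, regardless of how many slices one decides to use, $\beta$ is in $\mathcal R(\Gamma^{-1/2})$ but not in $L_2(I)$.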

## 4 Enlarged dimension reduction space and main results

In this section, we present our main results. First, we establish in Theorem 1 an interesting link between covariance operators on $L_2(I)$ and on $H_\Gamma$. Next, we extend the functional dimension reduction to a relaxed model with the enlarged space given in (10). The reproducing kernel Hilbert space $H_\Gamma$, induced from the covariance operator $\Gamma$, defines a proper range space for the sliced mean $E(X\mid Y)$ (see Proposition 6 and Theorem 2(a) below). It also plays a role parallel to that of the span of the covariance matrix in finite dimension (see Proposition 6). Note that $H_\Gamma$ is equipped with an inner product $\langle\cdot,\cdot\rangle_{H_\Gamma}$. Interestingly, this inner product corresponds to a standardization (see equation (3) above and equation (11) below) similar to the Mahalanobis distance and the standardization by the covariance matrix in the finite dimensional vector case. We also study the linear design condition under the relaxed model in Proposition 7.

### 4.1 Bounded operators on $L_2(I)$ and on $H_\Gamma$

###### Theorem 1.

Assume and are continuous, and respectively strict positive definite and positive semi-definite. Then, is a well-defined bounded linear operator on if and only if is a well-defined bounded linear operator on .

###### Proof.

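Let $h=\sum_i c_i\phi_i$. Then,

$$\|\Gamma^{-1/2}h\|_{L_2}^2=\left\|\Gamma^{-1/2}\sum_i c_i\phi_i(\cdot)\right\|_{L_2}^2=\sum_i c_i^2/\xi_i=\|h\|_{H_\Gamma}^2. \tag{8}$$

That is,

$$\Gamma^{-1/2}h\in L_2(I)\ \Leftrightarrow\ h\in H_\Gamma. \tag{9}$$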

For any , there exists . Then

 ∥Γ−1/2ΓeΓ−1/2g∥L2=∥Γ−1/2Γeh∥L2=∥Γeh∥HΓ.

Together with (8), we have

 ∥Γ−1/2ΓeΓ−1/2g∥L2∥g∥2L2=∥Γeh∥HΓ∥h∥2HΓ,

which yields the statement of the theorem. ∎

###### Remark 1.

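From (8), $\Gamma^{-1/2}$ is bounded when it is defined as a linear operator from $H_\Gamma$ to $L_2(I)$. Here boundedness refers to its induced operator norm, $\sup_{h\in H_\Gamma,\,h\neq0}\|\Gamma^{-1/2}h\|_{L_2}/\|h\|_{H_\Gamma}$. However, when it operates on $f\in L_2(I)$, $\Gamma^{-1/2}f$ may not belong to $L_2(I)$. For example, $\Gamma^{-1/2}\phi_i=\xi_i^{-1/2}\phi_i$, hence $\|\Gamma^{-1/2}\phi_i\|_{L_2}=\xi_i^{-1/2}\to\infty$, as $i\to\infty$. When combined with the additional covariance operator $\Gamma_e$, Theorem 1 ensures that the resulting operator is a bounded linear operator on $L_2(I)$, i.e., $\Gamma^{-1/2}\Gamma_e\Gamma^{-1/2}:L_2(I)\to L_2(I)$ is a well-defined bounded operator. Note that $L_2(I)$ is a much larger space than $H_\Gamma$. Thus, the new operator composed of the three operators can be well-defined on a larger domain than the original operator $\Gamma^{-1/2}$ can.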

### 4.2 Relaxed model and extended estimation

We are now in a position to revisit the functional dimension reduction problem studied in Jiang et al. (2014), describe the problem more rigorously and extend it. Let $\{X(t),\,t\in I\}$ be a stochastic process satisfying $E\|X\|_{L_2}^2<\infty$. Denote its covariance function and spectrum by

$$\Gamma(s,t)\equiv\operatorname{cov}\{X(s),X(t)\}=\sum_{i=1}^{\infty}\xi_i\phi_i(s)\phi_i(t).$$

Then, $X$ can be expressed by an expansion as

$$X(s)=\sum_{i=1}^{\infty}A_i\phi_i(s),$$

where the $A_i$ are independent random variables with mean zero and variances $\xi_i$. Below we give a proposition which ensures that we can exchange the order of expectation and integration.

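###### Proposition 4.

$$E\langle X,\phi_i\rangle_{L_2}=\langle E(X),\phi_i\rangle_{L_2}.$$

###### Proof.

From the Cauchy-Schwarz inequality, we have

$$E\int|X(s)\phi_i(s)|\,ds\le E\left[\left(\int X^2(s)\,ds\right)^{1/2}\left(\int\phi_i^2(s)\,ds\right)^{1/2}\right]=E\left(\int X^2(s)\,ds\right)^{1/2}.$$

From Jensen's inequality,

$$E\left(\int X^2(s)\,ds\right)^{1/2}\le\left(E\int X^2(s)\,ds\right)^{1/2}<\infty.$$

Thus, with $E\int|X(s)\phi_i(s)|\,ds<\infty$, we can apply Fubini's Theorem and get

$$E\int X(s)\phi_i(s)\,ds=\int E[X(s)]\phi_i(s)\,ds.$$

∎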

Our proposed model is

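$$Y=f\left(\langle\beta_1,X\rangle_{L_2},\ldots,\langle\beta_d,X\rangle_{L_2},\epsilon\right),\qquad\text{where } \beta_1(\cdot),\ldots,\beta_d(\cdot)\in\mathcal R(\Gamma^{-1/2}). \tag{10}$$

Note that a critical difference between our formulation and that in Jiang et al. (2014) is that we only require each $\beta_j$ to be in $\mathcal R(\Gamma^{-1/2})$, which is larger than $L_2(I)$. This extension allows more flexibility in the dimension reduction functions.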

###### Proposition 5.

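For $\beta\in\mathcal R(\Gamma^{-1/2})$, $\langle\beta,X\rangle_{L_2}$ is well-defined almost surely.

###### Proof.

Let $\beta=\Gamma^{-1/2}\delta$ with $\delta\in L_2(I)$, and let $\delta_i=\langle\delta,\phi_i\rangle_{L_2}$. Since the $A_i=\langle X,\phi_i\rangle_{L_2}$ have mean zero and are uncorrelated, the cross terms vanish and we have

$$E\left(\langle\beta,X\rangle_{L_2}\right)^2=E\left(\sum_i\langle\Gamma^{-1/2}\delta,\phi_i\rangle_{L_2}\cdot\langle X,\phi_i\rangle_{L_2}\right)^2=\sum_i\left(\xi_i^{-1/2}\delta_i\right)^2EA_i^2=\sum_i\xi_i^{-1}\delta_i^2\xi_i=\sum_i\delta_i^2=\|\delta\|_{L_2}^2<\infty,$$

which implies that $|\langle\beta,X\rangle_{L_2}|<\infty$ a.s. ∎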
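The computation in the proof can be illustrated by simulation. The sketch below makes several assumptions that are not part of the article: the eigenvalues are taken to be $\xi_i=i^{-2}$, the scores $A_i$ are Gaussian, $\delta_i=1/i$, and everything is truncated at `n_terms` components. With these choices $\beta=\Gamma^{-1/2}\delta$ has all coefficients equal to one, so it is not square integrable in the limit, yet the second moment of $\langle\beta,X\rangle_{L_2}$ stays finite.

```python
import math
import random


def xi(i):
    """Hypothetical eigenvalues xi_i = i^(-2), assumed for illustration."""
    return 1.0 / i ** 2


def delta_coef(i):
    """Hypothetical square-summable coefficients delta_i = 1 / i."""
    return 1.0 / i


def beta_coef(i):
    """Coefficients of beta = Gamma^{-1/2} delta, namely xi_i^(-1/2) * delta_i."""
    return delta_coef(i) / math.sqrt(xi(i))


def exact_second_moment(n_terms):
    """Truncated value of sum_i beta_i^2 * xi_i, which equals sum_i delta_i^2."""
    return sum(beta_coef(i) ** 2 * xi(i) for i in range(1, n_terms + 1))


def simulated_second_moment(n_terms, n_draws, seed=0):
    """Monte Carlo estimate of E <beta, X>^2 with Gaussian scores A_i ~ N(0, xi_i)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        inner = sum(
            beta_coef(i) * rng.gauss(0.0, math.sqrt(xi(i)))
            for i in range(1, n_terms + 1)
        )
        total += inner * inner
    return total / n_draws


if __name__ == "__main__":
    n_terms = 30
    print(exact_second_moment(n_terms))
    print(simulated_second_moment(n_terms, n_draws=4000, seed=1))
```

The simulated value should be close to the truncated $\|\delta\|_{L_2}^2$ up to Monte Carlo error. The Gaussian scores are a convenience of this sketch; the argument in the proof only uses the first two moments of the $A_i$.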

###### Remark 2.

Proposition 5 reveals an interesting result regarding the space to which $X$ belongs. The finite second moment condition is commonly used in statistical analysis. In the finite dimensional case, a random vector with finite second moment can have arbitrary variation in each of its components, and hence the random vector can take values in the entire space. However, this is not the case in the infinite dimensional functional space. To ensure finite integrated variance, a random function cannot have arbitrary variation along each dimension. In fact, the variations along all dimensions, except a finite set of dimensions, have to decay sufficiently fast to guarantee finite total variance. Moreover, the set of dimensions in which almost all the variation accumulates is fixed for a single random function. As a consequence, the random function cannot take values everywhere in $L_2(I)$. This is why the resulting space of the random function is in fact a much smaller subspace of $L_2(I)$. A feature of this subspace is that it ensures a finite inner product with elements in $\mathcal R(\Gamma^{-1/2})$, where $\Gamma$ is the covariance function of $X$. We define this space as

$$\mathcal R(\Gamma^{1/2})^{+}\equiv\left\{f:\ \langle f,\beta\rangle_{L_2}<\infty\ \text{a.s.}\ \forall\beta\in\mathcal R(\Gamma^{-1/2})\right\}.$$

Obviously $\mathcal R(\Gamma^{1/2})\subseteq\mathcal R(\Gamma^{1/2})^{+}$. We will encounter this space again when we present an equivalent linearity condition in Section 4.3. Note that although a single random function belongs to a much smaller space $\mathcal R(\Gamma^{1/2})^{+}$, the (uncountable) union of all such spaces over all random functions is the entire $L_2(I)$.

###### Remark 3.

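For any $f\in H_\Gamma$, Proposition 5 ensures that the quantity $\langle\Gamma^{-1}f,X\rangle_{L_2}$ is well-defined a.s. It is easy to verify the identity

$$\langle\Gamma^{-1/2}f,\Gamma^{-1/2}X\rangle_{L_2}=\langle\Gamma^{-1}f,X\rangle_{L_2}=\langle f,X\rangle_{H_\Gamma}. \tag{11}$$

In the classical SIR, the main problem can be viewed as solving the eigenvalue problem of $\Sigma_e=\operatorname{cov}\{E(X\mid Y)\}$ in the space scaled by $\Sigma=\operatorname{cov}(X)$. Now in Functional Sliced Inverse Regression (FSIR), (11) indicates that $\Gamma_e$ can again be viewed as the scaled operator from $H_\Gamma$ to $H_\Gamma$.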
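The identity (11) can be checked coordinate by coordinate. The following sketch writes $f=\sum_i f_i\phi_i\in H_\Gamma$ and $X=\sum_i A_i\phi_i$, and treats the sums formally, in the almost sure sense guaranteed by Proposition 5; the last line uses only that the $A_i$ have mean zero, are uncorrelated and have variances $\xi_i$.

```latex
\begin{align*}
\langle \Gamma^{-1/2} f, \Gamma^{-1/2} X \rangle_{L_2}
  &= \sum_i \left(\xi_i^{-1/2} f_i\right)\left(\xi_i^{-1/2} A_i\right)
   = \sum_i \frac{f_i A_i}{\xi_i}, \\
\langle \Gamma^{-1} f, X \rangle_{L_2}
  &= \sum_i \left(\xi_i^{-1} f_i\right) A_i
   = \sum_i \frac{f_i A_i}{\xi_i}, \\
E\left(\sum_i \frac{f_i A_i}{\xi_i}\right)^2
  &= \sum_i \frac{f_i^2}{\xi_i^2}\, \xi_i
   = \sum_i \frac{f_i^2}{\xi_i}
   = \| f \|_{H_\Gamma}^2 < \infty .
\end{align*}
```

The common value $\sum_i f_iA_i/\xi_i$ has the same form as the inner product in (3), which is why it is written as $\langle f,X\rangle_{H_\Gamma}$ even though $X$ itself need not belong to $H_\Gamma$.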

In fact, the relaxed model leads to more flexible requirements on the operators needed in the estimation procedure, which in turn leads to less stringent conditions on quantities such as the mean of the covariates conditional on the response. For example, in the FSIR approach, we would search for $\beta$ from the functional eigenvalue problem

$$\Gamma_e\beta=\lambda\Gamma\beta, \tag{12}$$

where $\Gamma(s,t)=\operatorname{cov}\{X(s),X(t)\}$ as before, and

$$\Gamma_e(s,t)\equiv\operatorname{cov}\left[E\{X(s)\mid Y\},E\{X(t)\mid Y\}\right]=\operatorname{cov}\{m_Y(s),m_Y(t)\}.$$

Letting , rewriting (12) as

 Γ−1/2ΓeΓ