Traditionally, sufficient dimension reduction problems refer to the estimation of the space spanned by the columns of , where satisfies . Here, is a -dimensional covariate vector, which we assume to satisfy for simplicity, is a matrix, and is a univariate response variable. An equivalent form is , where is a mean zero random variable independent of . By far the best-known estimation procedure for this problem is sliced inverse regression (SIR; Li, 1991), where finding the leading eigenvectors of the generalized eigenvalue problem is all one needs to do to obtain the column space of . Here and . SIR is constructed under a linearity condition, which requires , and has since been developed into a whole class of inverse regression based methods for dimension reduction. To understand the inverse regression based methods from a different angle, we can normalize the covariates by viewing as new covariates and as the new dimension reduction matrix. Considering the dimension reduction problem in terms of instead of enables much simplification and permits a clearer exhibition of the critical operations (Li, 1991; Ma and Zhu, 2012).
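The finite dimensional SIR procedure described above can be illustrated with a minimal numerical sketch. Everything specific in it is an assumption made for illustration and is not taken from the cited works: the function name `sir_directions`, the equal-count slicing of the ordered responses, and the simulated single-index model with a cubic link. The generalized eigenvalue problem is solved on the standardized scale, and the eigenvectors are then mapped back to the original scale.

```python
import numpy as np

def sir_directions(x, y, n_slices=10, n_dirs=1):
    """Sketch of sliced inverse regression (hypothetical helper, not a library API)."""
    n, p = x.shape
    xc = x - x.mean(axis=0)                 # center the covariates
    sigma = xc.T @ xc / n                   # sample covariance matrix
    # Slice the observations by the ordered response (equal-count slices).
    order = np.argsort(y)
    gamma = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        slice_mean = xc[idx].mean(axis=0)
        gamma += (len(idx) / n) * np.outer(slice_mean, slice_mean)
    # Standardize: form sigma^{-1/2} gamma sigma^{-1/2} and take its leading eigenvectors.
    evals_s, evecs_s = np.linalg.eigh(sigma)
    sigma_inv_half = evecs_s @ np.diag(evals_s ** -0.5) @ evecs_s.T
    evals, evecs = np.linalg.eigh(sigma_inv_half @ gamma @ sigma_inv_half)
    top = evecs[:, ::-1][:, :n_dirs]        # eigh returns ascending eigenvalues
    return sigma_inv_half @ top             # back to the original covariate scale

# Simulated single-index model (an assumption for illustration only).
rng = np.random.default_rng(0)
n, p = 2000, 5
x = rng.standard_normal((n, p))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = (x @ beta) ** 3 + 0.1 * rng.standard_normal(n)

b_hat = sir_directions(x, y)[:, 0]
b_hat /= np.linalg.norm(b_hat)
alignment = abs(b_hat @ beta)               # close to 1 when the direction is recovered
```

Working on the standardized scale turns the generalized eigenvalue problem into an ordinary symmetric one, which is the simplification the normalization above is meant to provide.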
Dimension reduction problems have been extended from the traditional regression domain to the functional data analysis domain. See Jiang et al. (2014) and references therein. The model considered in the functional dimension reduction framework is
where is still a univariate response variable, is now a covariate function, are parameter functions in , and denotes the inner product of two functions in the space. Since the matrix-vector product in the traditional case can be expressed as , which can also be viewed as a vector of inner products between the vector and the covariate , one might think that the extension to the functional data framework is straightforward. However, there are many subtleties when finite dimensional quantities are extended to infinite dimensional ones, such as and to and . Some properties we take for granted in finite dimensions may not hold automatically; e.g., a vector norm or the inner product between vectors may not be finite. If we want to perform a standardization similar to that in the finite dimensional case by forming as the new covariate function and as the new dimension reduction functions, for the functional counterpart of the variance-covariance matrix (operator), not only do we need to consider the extensions from vectors to functions and from matrices to operators, but we also need to define proper normed spaces and the corresponding requirements on these functions. A careful and rigorous consideration of these issues will enable less restrictive models and more flexible estimation. In fact, one of the main messages of this article is that the requirement in Jiang et al. (2014) that the parameter functions ’s be in is too strong and can be relaxed to include more interesting examples.
During our investigation, we also realized that it is crucial to formulate the functional dimension reduction problem properly in order to facilitate the subsequent application of existing mathematical tools from functional analysis, involving Reproducing Kernel Hilbert Space (RKHS) and operator theories. To prepare for this task, we summarize some preliminary results in Section 2 and provide an outline of either a proof or an interpretation for each result. In Section 3, we give a few motivating examples, wherein the dimension reduction functions fall outside the space required in Jiang et al. (2014) and hence cannot be handled under their model. We then present an extension of the functional dimension reduction model in Section 4, together with some main results. Our extension works on an enlarged space, so that the classical notion of SIR on the standardized scale can be carried out.
2.1 Covariance operators and integral operators
Without loss of generality, we restrict our attention to functions defined on . Let the Hilbert space be the space of functions defined on and equipped with inner product given by
Let be a continuous bivariate function on . Then induces a linear integral operator, still written as , where its operation on a function is defined as
When for all , is said to be a symmetric linear integral operator. Note that
Hence, as long as is symmetric as a function of defined on , its induced operator is also a symmetric operator. When for all , is said to be positive semi-definite (or non-negative definite). When the equality holds if and only if a.s., is said to be positive definite (or strictly positive definite). A positive (semi-) definite linear integral operator is also known as a covariance operator. Let denote the unit ball in , i.e., . An operator defined on that maps to is said to be compact if the image of the unit ball, , is a compact set in .
Let , be a random process with finite second moments and be a univariate random variable. We now consider three specific bivariate functions and their induced operators,
It is easy to verify that , and are all symmetric bivariate functions and . We further assume , , to be continuous. The continuity of the functions , and on implies that they are square integrable, and hence guarantees that , and are compact operators on (Lax, 2002, Chapter 22, Theorem 4). The definitions of , and also ensure that they are positive semi-definite. Mercer’s Theorem (Lax, 2002, Chapter 30, Theorem 11) then implies that they have discrete spectra. Taking
for instance, it can be expanded in a uniformly convergent series of eigenvalues and eigenfunctions
which we sometimes write in short as
Here are decreasing positive values. If is strictly positive definite, then and form a complete orthonormal basis for . The above result relies critically on the strict positive definiteness of . Without the assumption that is strictly positive definite, we can still decompose as in (2), and the corresponding , , always form a complete orthonormal basis for , the range of . However, , when is not strictly positive definite. We further outline the following results, which are relevant to the study of functional inverse regression.
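As a numerical illustration of such an eigen-expansion, the sketch below discretizes a continuous covariance kernel on a grid and compares the leading eigenvalues with known closed-form values. The kernel min(s, t) of standard Brownian motion on [0, 1], the midpoint grid, and all variable names are assumptions chosen for illustration; they do not appear in the text above. For this kernel the eigenvalues are 1/((j - 1/2)^2 pi^2) and the eigenfunctions are sqrt(2) sin((j - 1/2) pi t).

```python
import numpy as np

# Assumed example kernel: K(s, t) = min(s, t) on [0, 1] (Brownian motion covariance).
m = 200
t = (np.arange(m) + 0.5) / m                 # midpoint grid on [0, 1]
kernel = np.minimum.outer(t, t)              # kernel evaluated on the grid

# Eigen-decomposition of the discretized integral operator (matrix / m).
evals, evecs = np.linalg.eigh(kernel / m)
evals = evals[::-1]                          # reorder to decreasing eigenvalues
evecs = evecs[:, ::-1] * np.sqrt(m)          # rescale so the grid average of phi_j^2 is 1

# Closed-form eigenvalues of the Brownian motion kernel for comparison.
exact = 1.0 / ((np.arange(1, 6) - 0.5) ** 2 * np.pi ** 2)
eig_err = np.abs(evals[:5] - exact).max()

# Truncated expansion: sum of lambda_j * phi_j(s) * phi_j(t) over the first J terms.
J = 50
approx = (evecs[:, :J] * evals[:J]) @ evecs[:, :J].T
max_err = np.abs(approx - kernel).max()      # worst-case error over the grid
```

The maximum absolute error of the truncated sum over the whole grid is the discrete analogue of the uniform convergence stated in Mercer's Theorem.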
A continuous, symmetric, positive (semi-) definite integral operator is a trace-class operator, i.e.,
Because is a continuous function on , for , is a continuous function of in , thus is integrable. Hence, . ∎
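The trace-class property can be checked numerically in a case where the eigenvalues are known in closed form. The sketch below again assumes the Brownian motion kernel min(s, t) on [0, 1], which is an illustrative choice and not part of the text above. It compares the partial sum of the eigenvalues with the integral of the kernel along its diagonal.

```python
import math

def bm_eigenvalue(j):
    """j-th eigenvalue of the kernel min(s, t) on [0, 1] (assumed example)."""
    return 1.0 / ((j - 0.5) ** 2 * math.pi ** 2)

# Partial sum of the eigenvalues, which approximates the trace of the operator.
partial_trace = sum(bm_eigenvalue(j) for j in range(1, 100001))

# Midpoint-rule approximation of the integral of K(t, t) = t over [0, 1].
m = 1000
diag_integral = sum((i + 0.5) / m for i in range(m)) / m
```

Both quantities approach 1/2, so the eigenvalue sum is finite and agrees with the integral of the continuous function K(t, t).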
For any positive (semi-) definite operator , there exists a mean zero random process satisfying such that and
where ’s are independent random variables with mean zero and variances ’s.
For , let , where ’s are independent standard normal random variables. Obviously the resulting is a mean zero process that satisfies . In addition, . ∎
Note that, in our construction of the Gaussian process in the above proof, the sample path may not be in for a given realization . However,
ensures that the probability of such is 0. That is, almost surely. In the following, we may simply use to denote that almost surely.
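The construction used in the proof can be simulated with a truncated expansion. The sketch below is an illustration under stated assumptions: the spectrum is that of the Brownian motion kernel min(s, t) on [0, 1], the series is truncated at J = 50 terms, and the sample size and grid are arbitrary. It checks that the empirical covariance of the simulated paths is close to the target kernel and that the average squared norm of the paths is close to the sum of the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, J = 20000, 50                       # number of paths and truncation level (assumed)
t = np.linspace(0.0, 1.0, 21)                # evaluation grid on [0, 1]

# Assumed spectrum: eigenvalues and eigenfunctions of K(s, t) = min(s, t).
j = np.arange(1, J + 1)
lam = 1.0 / ((j - 0.5) ** 2 * np.pi ** 2)
phi = np.sqrt(2.0) * np.sin(np.outer(j - 0.5, t) * np.pi)   # shape (J, len(t))

# X(t) = sum_j sqrt(lam_j) * Z_j * phi_j(t), with independent standard normal Z_j.
z = rng.standard_normal((n_paths, J))
paths = (z * np.sqrt(lam)) @ phi             # shape (n_paths, len(t))

# The empirical covariance should be close to the kernel min(s, t).
emp_cov = paths.T @ paths / n_paths
max_dev = np.abs(emp_cov - np.minimum.outer(t, t)).max()

# By orthonormality, the squared norm of a truncated path is sum_j lam_j * Z_j^2.
sq_norms = (z ** 2 * lam).sum(axis=1)
mean_sq_norm = sq_norms.mean()               # close to the eigenvalue sum, about 1/2
```

Because the eigenvalues are summable, the expected squared norm is finite, which is consistent with the sample paths being square integrable with probability one.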
2.2 RKHS relevant for functional inverse regression
Let be the RKHS generated by . Specifically,
where the closure is taken with respect to the norm induced by the following inner product
Note that is a proper subset of . For , has the expansion
In addition to its -norm defined as , the -norm is given by
For , the -inner product is given by
where , .
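To see concretely why the RKHS is a proper subset of the ambient function space, one can compare the two norms on functions expressed in the eigenbasis. The sketch below assumes the standard characterization of the RKHS norm of a kernel with eigenvalues lambda_j, under which a function with coefficients a_j has squared RKHS norm equal to the sum of a_j^2 / lambda_j. The eigenvalues of the Brownian motion kernel and the coefficient sequences are hypothetical choices made for illustration.

```python
import math

def lam(j):
    """Assumed eigenvalues: those of the kernel min(s, t) on [0, 1]."""
    return 1.0 / ((j - 0.5) ** 2 * math.pi ** 2)

def squared_norms(coef, n_terms):
    """Partial sums of the squared L2 norm and the squared RKHS norm."""
    l2 = sum(coef(j) ** 2 for j in range(1, n_terms + 1))
    hk = sum(coef(j) ** 2 / lam(j) for j in range(1, n_terms + 1))
    return l2, hk

# Coefficients a_j = 1/j: square summable, yet the RKHS partial sums keep growing.
l2_small, hk_small = squared_norms(lambda j: 1.0 / j, 1000)
l2_large, hk_large = squared_norms(lambda j: 1.0 / j, 10000)

# Coefficients a_j = lambda_j: both partial sums converge.
l2_in, hk_in = squared_norms(lam, 10000)
```

The first function has a finite squared norm in the ambient space (the partial sums approach pi^2 / 6), while its RKHS partial sums grow roughly linearly in the number of terms, so it lies outside the RKHS. The second function belongs to both spaces.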
3 Motivating Examples
Throughout our development of a rigorous framework for functional inverse regression, we set up a space,
which is the range space of the operator and is larger than . Below we give a few examples, wherein the dimension reduction functions fall outside and reside in . These examples motivate us to consider an enlarged space for functional dimension reduction. Interestingly, this enlarged space is the space considered by Grenander (1950) and Rao and Varadarajan (1963) in the study of linear discriminant analysis of Gaussian measures on a separable Hilbert space.
Example 1 (Binary response).
Let be a binary random variable having probabilities , and let be a complete orthonormal basis for . Given , consider
where and is some scalar that controls the separation of two groups. Here ’s are independent standard normal random variables that are independent of . Let be the between-group covariance and be the within-group covariance. Then . We can easily calculate the within-group covariance function as
and the between-group covariance function as
The following two optimization problems
have the same solution given by
for any constant .
From , we have
Let . Then
Therefore, the optimization problem reduces to maximizing
By the Cauchy-Schwarz inequality,
The equality holds when , which means
is the eigenfunction associated with the largest eigenvalue. ∎
The dimension reduction function is obtained from solving the eigenvalue problem . The corresponding optimal linear classification rule is via
This result can be linked to prior studies of linear discriminant analysis of two Gaussian measures on a separable Hilbert space by Grenander (1950) and Rao and Varadarajan (1963). Let
Note that given in (4) is not in , but in , since
We also have , i.e., is in . Furthermore, from Proposition 3 and its proof, we have , where , , where . Therefore
This is an example that , , but , and the classification rule is well-defined. This indicates that, to solve for a linear discriminant analysis problem in , we cannot restrict to . We are obliged to enlarge the domain of to . On the other hand, requiring is indeed sufficient for the purpose of linear discriminant analysis given in (5) for classifying the observations into two groups.
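The phenomenon can be made concrete with a hypothetical choice of sequences, which is not the specific construction of Example 1. Suppose the within-group covariance has eigenvalues lambda_j = j^(-2) and the group mean difference has coefficients mu_j = j^(-2) in the corresponding eigenbasis. The discriminant direction then has coefficients b_j = mu_j / lambda_j = 1. The sketch below tracks three partial sums: the squared norm of the direction, the squared RKHS norm of the mean difference, and the variance of the projection of the covariate function onto the direction.

```python
def partial_sums(n_terms):
    """Partial sums for the hypothetical sequences lambda_j = mu_j = j**-2."""
    dir_sq_norm = 0.0      # sum of b_j^2: squared norm of the direction
    mean_rkhs_norm = 0.0   # sum of mu_j^2 / lambda_j: squared RKHS norm of the mean difference
    proj_variance = 0.0    # sum of lambda_j * b_j^2: variance of the projected covariate
    for j in range(1, n_terms + 1):
        lam_j = j ** -2.0
        mu_j = j ** -2.0
        b_j = mu_j / lam_j
        dir_sq_norm += b_j ** 2
        mean_rkhs_norm += mu_j ** 2 / lam_j
        proj_variance += lam_j * b_j ** 2
    return dir_sq_norm, mean_rkhs_norm, proj_variance

short = partial_sums(1000)
long = partial_sums(100000)
```

Under these assumed sequences the first partial sum grows without bound, so the direction is not square integrable, while the other two converge. The projection therefore has finite variance and the linear rule remains well-defined, which is the behaviour the enlarged space is meant to accommodate.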
Example 2 (Categorical response).
The feature revealed in Example 1 is not unique to a binary response variable . When the response variable is categorical, a similar phenomenon can be observed. For example, consider the case where the response variable is categorical with possible values . We normalize the values so that has mean zero and variance 1. Let
We can easily verify that the within-group covariance function is
and the between-group covariance function is
Let . Note that the forms of and here are exactly the same as those in Example 1. Thus, when we perform the functional sliced inverse regression by solving for the first eigenfunction,
we have exactly the same analysis as that in Example 1. It then leads to the same conclusion. That is, we are obliged to enlarge the domain of to . On the other hand, requiring is also sufficient for our purpose of classifying the observations into groups.
Example 3 (Continuous response).
Finally we provide an example with continuous response variable . Let have mean zero and variance 1, and let
We can easily verify that the within-group covariance function is
The between-group covariance function is
Let . Now the same analysis as that in Examples 1 and 2 leads to the conclusion that, regardless of how many slices one decides to use, is in .
4 Enlarged dimension reduction space and main results
In this section, we present our main results. First, we establish in Theorem 1 an interesting link between covariance operators on and on . Next, we extend the functional dimension reduction to a relaxed model with the enlarged space given in (10). The reproducing kernel Hilbert space , induced from the covariance operator , defines a proper range space for the sliced mean (see Proposition 6 and Theorem 2(a) below). It also plays a role parallel to that of the span of in finite dimension (see Proposition 6). Note that is equipped with an inner product . Interestingly, this inner product corresponds to a standardization (see equation (3) above and equation (11) below) similar to the Mahalanobis distance and the standardization by the covariance matrix in the finite dimensional vector case. We also study the linear design condition under the relaxed model in Proposition 7.
4.1 Bounded operators on and on
Assume and are continuous, and respectively strictly positive definite and positive semi-definite. Then, is a well-defined bounded linear operator on if and only if is a well-defined bounded linear operator on .
Let . Then,
For any , there exists . Then
Together with (8), we have
which yields the statement of the theorem. ∎
From (8), is bounded when it is defined as a linear operator from to . Here boundedness refers to its induced operator norm, . However, when it operates on , may not belong to . For example, , hence , as . When combined with the additional covariance operator , Theorem 1 ensures that the resulting operator is a bounded linear operator on , i.e., is a well-defined bounded operator. Note that is a much larger space than . Thus, the new operator composed of the three operators is well-defined on a larger domain than the original operator .
4.2 Relaxed model and extended estimation
We are now in a position to revisit the functional dimension reduction problem studied in Jiang et al. (2014), describe the problem more rigorously and extend it. Let , , be a stochastic process satisfying . Denote its covariance function and spectrum by
Then, can be expressed by an expansion as
where ’s are independent random variables with mean zero and variances ’s. Below we give a Proposition, which ensures that we can exchange the order of double integrals.
By the Cauchy-Schwarz inequality, we have
By Jensen’s inequality,
Thus, with , we can apply Fubini’s Theorem and get
Our proposed model is
Note that a critical difference between our formulation here and that in Jiang et al. (2014) is that we only require to be in , which is larger than . This extension allows more flexibility in the dimension reduction functions.
For , is well-defined almost surely.
Let and . We have
which implies that a.s. ∎
Proposition 5 reveals an interesting result regarding the space to which belongs. The finite second moment condition is commonly used in statistical analysis. In the finite dimensional case, a random vector with finite second moment can have arbitrary variation in each of its components, hence the random vector can take values in the entire space. However, this is not the case in the infinite dimensional functional space. To ensure finite integrated variance, a random function cannot have arbitrary variation along each dimension. In fact, the variations along all dimensions, except a finite set of dimensions, have to decay sufficiently fast to guarantee finite total variance. Moreover, the set of dimensions in which almost all of the variation accumulates is fixed for a single random function. As a consequence, the random function cannot take values everywhere in . This is why the resulting space of the random function is in fact a much smaller subspace of . A feature of this subspace is that it ensures a finite inner product with elements in , where is the covariance function of . We define this space as
Obviously . We will encounter this space again when we present an equivalent linearity condition later in Section 4.3. Note that although a single random function belongs to a much smaller space , the (uncountable) union of all such spaces of all random functions is the entire .
For any , Proposition 5 ensures that the quantity is well-defined a.s. It is easy to verify the identity
In the classical SIR, the main problem can be viewed as solving the eigenvalue problem of in the space scaled by . In Functional Sliced Inverse Regression (FSIR), (11) indicates that can again be viewed as the scaled operator from to .
In fact, the relaxed model leads to more flexible requirements on subsequent operators needed in the estimation procedure, which in turn leads to less stringent conditions on quantities such as mean covariates conditional on the response, etc. For example, in the FSIR approach, we would search for from the functional eigenvalue problem
where as before, and
Letting , rewriting (12) as