1 Introduction
Many multivariate statistical methods are based on empirical covariance operators. This is the case for multiple regression, principal components analysis, factor analysis, linear discriminant analysis, linear canonical analysis, multiple-set linear canonical analysis, and so on. However, empirical covariance operators are known to be extremely sensitive to outliers, an undesirable property that makes the preceding methods themselves sensitive to outliers. To overcome this problem, robust alternatives to these methods have been proposed in the literature, mainly by replacing the empirical covariance operators with robust estimators. In this vein, robust versions of multivariate statistical methods have been introduced, in particular for multiple regression ([21]), principal components analysis ([8],[10],[16],[22]), factor analysis ([19]), linear discriminant analysis ([6],[9],[14]) and linear canonical analysis ([5],[24]). Multiple-set linear canonical analysis (MSLCA) is an important multivariate statistical method that analyzes the relationship between more than two random vectors, thus generalizing linear canonical analysis. It was introduced many years ago (e.g., [12]) and has since been studied from different points of view (e.g., [15],[23],[25]). A formulation of MSLCA within the context of Euclidean random variables was recently given in [18], which made it possible to obtain an asymptotic theory for this analysis when it is estimated by using empirical covariance operators. To the best of our knowledge, this is the only estimation of MSLCA that has been tackled in the literature, although it is known to be nonrobust since it is sensitive to outliers. There is therefore a real interest in introducing a robust estimation of MSLCA, as was done for the other multivariate statistical methods. This can be done by using robust estimators of the covariance operators of the involved random vectors instead of the empirical covariance operators. Among such robust estimators, the minimum covariance determinant (MCD) estimator has been extensively studied ([1],[2],[3],[7]) and is known to have good robustness properties. Moreover, its asymptotic properties have been obtained ([1],[2],[3]), mainly under elliptical distributions.
In this paper, we propose a robust version of MSLCA based on the MCD estimator of the covariance operator. We start by recalling, in Section 2, the notion of MSLCA for Euclidean random variables, and we study its robustness properties by deriving the influence functions of the functionals that lead to its estimator from the empirical covariance operators. It is proved that the influence function of the operator that determines MSLCA is not bounded. In Section 3, we introduce a robust estimation of MSLCA (denoted by RMSLCA) by using the MCD estimator of the covariance operator on which this analysis is defined. Then we derive the influence function of the operator that determines RMSLCA, which is proved to be bounded, as well as those of the canonical coefficients and the canonical directions. Section 4 is devoted to asymptotic properties of RMSLCA. We obtain limiting distributions that are then used in Section 5, where a robust test for mutual non-correlation is introduced. The robustness properties of this test are studied through the derivation of the second-order influence function of the test statistic under the null hypothesis. The proofs of all theorems and propositions are postponed to Section 6.
2 Influence in multiple-set canonical analysis
In this section, we recall the notion of multiple-set linear canonical analysis (MSLCA) of Euclidean random variables as introduced by Nkiet [18], as well as its estimation based on empirical covariance operators. Then, the robustness properties of this analysis are studied through the derivation of the influence functions of the functionals related to it.
2.1 Multiple-set linear canonical analysis
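Letting $(\Omega,\mathcal{A},P)$ be a probability space and $K$ be an integer such that $K\geq 2$, we consider random variables $X_1,\dots,X_K$ defined on this probability space and with values in Euclidean vector spaces $\mathcal{X}_1,\dots,\mathcal{X}_K$, respectively. We then consider the space $\mathcal{X}=\mathcal{X}_1\times\mathcal{X}_2\times\cdots\times\mathcal{X}_K$, which is also a Euclidean vector space, equipped with the inner product defined by
$$\langle \alpha,\beta\rangle_{\mathcal{X}}=\sum_{k=1}^{K}\langle \alpha_k,\beta_k\rangle_{k},\qquad \alpha=(\alpha_1,\dots,\alpha_K),\ \beta=(\beta_1,\dots,\beta_K)\in\mathcal{X},$$
where $\langle\cdot,\cdot\rangle_{k}$ is the inner product of $\mathcal{X}_k$, $k=1,\dots,K$. From now on, we assume that the following assumption holds: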
(): for , we have and , where denotes the norm induced by .
Then, we consider the random vector with values in , and we can give the following definition of multiple-set linear canonical analysis (see [18]):
Definition 2.1
The multiple-set linear canonical analysis (MSLCA) of is the search for a sequence of vectors of , where , satisfying:
(1)
where and for :
A solution of the above maximization problem is obtained from the spectral analysis of an operator that will now be specified. For , let us consider the covariance operators
where denotes the tensor product such that is the linear map : , and denotes the adjoint of . Letting be the canonical projection, whose adjoint operator is the map
we consider the operators defined as
(2)
The covariance operator is a self-adjoint and positive operator; we assume throughout this paper that it is invertible. Then, it is easy to check that is also a self-adjoint, positive and invertible operator, and we consider
The spectral analysis of this last operator gives a solution of the maximization problem specified in Definition 2.1. Indeed, if is an orthonormal basis of such that is an eigenvector of associated with the -th largest eigenvalue , then we obtain a solution of (1) by taking , and we have . Finally, the MSLCA of is the family obtained as indicated above. The ’s are termed the canonical coefficients and the ’s are termed the canonical directions.
Note that can be expressed as a function of the covariance operator of . Indeed, denoting by the space of linear maps from to itself, and considering the linear maps and from to itself defined as
(3)
it is easy to check, by using properties of tensor products (see [11]), that
(4)
and, therefore, from (2), (3) and (4), it follows
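As a numerical illustration of Section 2.1, the Python sketch below computes MSLCA in coordinates. It is only a sketch under stated assumptions: $X$ is represented by its covariance matrix `Gamma` in an orthonormal basis, each canonical projection $\tau_k$ is represented by a block-selection matrix, and the operator whose spectral analysis gives MSLCA is assumed to be $D^{-1/2}\Gamma D^{-1/2}$, where $D=\sum_k\tau_k^*\tau_k\Gamma\tau_k^*\tau_k$ is the block-diagonal part of $\Gamma$, with canonical directions $\alpha_j=D^{-1/2}\beta_j$. The names `projection`, `inv_sqrt`, `mslca`, `Gamma` and `D` are hypothetical.

```python
import numpy as np

def projection(dims, k):
    """Matrix of the canonical projection tau_k : X -> X_k (selects block k)."""
    start = sum(dims[:k])
    P = np.zeros((dims[k], sum(dims)))
    P[:, start:start + dims[k]] = np.eye(dims[k])
    return P

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

def mslca(Gamma, dims):
    """Eigenvalues (nonincreasing) and canonical directions from the covariance Gamma of X."""
    taus = [projection(dims, k) for k in range(len(dims))]
    # Assumed block-diagonal part D = sum_k tau_k^* tau_k Gamma tau_k^* tau_k
    D = sum(t.T @ t @ Gamma @ t.T @ t for t in taus)
    D_is = inv_sqrt(D)
    lam, B = np.linalg.eigh(D_is @ Gamma @ D_is)
    order = np.argsort(lam)[::-1]
    lam, B = lam[order], B[:, order]
    return lam, D_is @ B, D   # columns of D_is @ B are the directions alpha_j

rng = np.random.default_rng(0)
dims = [2, 1, 2]                      # dimensions of X_1, X_2, X_3
A = rng.normal(size=(5, 5))
Gamma = A @ A.T + 5 * np.eye(5)       # an arbitrary positive definite covariance of X
lam, alphas, D = mslca(Gamma, dims)
```

Under this assumed formulation, the directions satisfy $\langle\alpha_j,D\alpha_l\rangle=\delta_{jl}$ (the kind of normalization and orthogonality constraints one expects in Definition 2.1) and $\langle\alpha_j,\Gamma\alpha_j\rangle=\lambda_j$. Since every diagonal block of $D^{-1/2}\Gamma D^{-1/2}$ is an identity, the eigenvalues always sum to $\dim\mathcal{X}$.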
2.2 Estimation based on empirical covariance operator
Now, we recall the classical way of estimating MSLCA by using empirical covariance operators (see, e.g., [18]). For , let be an i.i.d. sample of . We then consider the sample means and empirical covariance operators defined for as
and . These make it possible to define random operators, with values in , as
(5)
and to estimate by
(6)
Considering the eigenvalues of , and an orthonormal basis of such that is an eigenvector of associated with , we can estimate by , by and by .
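The plug-in estimator can be sketched in the same coordinates by replacing the covariance matrix with the empirical covariance of centred data. The sketch assumes the $1/n$ normalization, assumes that the $k$-th group of columns of the data holds the sample of $X_k$, and again assumes the operator has the form $D_n^{-1/2}\Gamma_n D_n^{-1/2}$ with $D_n$ the block-diagonal part of $\Gamma_n$. The function `empirical_mslca` and the simulated factor model are hypothetical.

```python
import numpy as np

def empirical_mslca(blocks):
    """Plug-in MSLCA from a list of n x p_k data matrices, one per X_k."""
    dims = [b.shape[1] for b in blocks]
    Z = np.hstack(blocks)
    Zc = Z - Z.mean(axis=0)             # centre with the sample means
    G = Zc.T @ Zc / Z.shape[0]          # empirical covariance of X (1/n convention)
    D = np.zeros_like(G)                # its block-diagonal part
    start = 0
    for p in dims:
        s = slice(start, start + p)
        D[s, s] = G[s, s]
        start += p
    w, V = np.linalg.eigh(D)
    D_is = V @ np.diag(w ** -0.5) @ V.T
    lam, B = np.linalg.eigh(D_is @ G @ D_is)
    order = np.argsort(lam)[::-1]
    return lam[order], D_is @ B[:, order]

rng = np.random.default_rng(1)
n = 500
f = rng.normal(size=n)                  # latent factor shared by the three blocks
X1 = np.column_stack([f + 0.5 * rng.normal(size=n), rng.normal(size=n)])
X2 = (f + 0.5 * rng.normal(size=n))[:, None]
X3 = np.column_stack([rng.normal(size=n), f + 0.5 * rng.normal(size=n)])
lam_n, alphas_n = empirical_mslca([X1, X2, X3])
```

In this simulated model the factor-driven components of the three blocks have pairwise correlation $0.8$, so the population eigenvalues are $2.6, 1, 1, 0.2, 0.2$, and the largest estimated eigenvalue should be close to $2.6$.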
2.3 Influence functions
To study the effect of a small amount of contamination at a given point on MSLCA, it is important, as is usual in the robustness literature (see [13]), to use influence functions. More precisely, we have to derive expressions of the influence functions related to the functionals that give , and (for ) at the distribution of . Recall that the influence function of a functional $S$ at a distribution $P$ is defined, for any point $x$, as
$$\mathrm{IF}(x;S,P)=\lim_{\varepsilon\to 0^{+}}\frac{S\big((1-\varepsilon)P+\varepsilon\,\delta_x\big)-S(P)}{\varepsilon},$$
where $\delta_x$ is the Dirac measure putting all its mass at $x$.
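To make this definition concrete, the sketch below approximates an influence function by a finite difference with a small $\varepsilon$. It assumes that $P$ is an empirical measure, so that $(1-\varepsilon)P+\varepsilon\delta_x$ is simply a reweighted discrete measure, and takes for the functional the covariance $C(P)=\mathbb{E}_P[(X-m_P)\otimes(X-m_P)]$. For this functional the exact influence function is $(x-m_P)\otimes(x-m_P)-C(P)$, which is unbounded in $x$. The names `weighted_cov` and `influence_fd` are hypothetical.

```python
import numpy as np

def weighted_cov(points, weights):
    """Covariance functional C(P) for P = sum_i weights[i] * delta_{points[i]}."""
    m = weights @ points
    Zc = points - m
    return (Zc * weights[:, None]).T @ Zc

def influence_fd(functional, points, x, eps=1e-6):
    """Finite-difference approximation of IF(x; functional, P_n), P_n the empirical measure."""
    n = len(points)
    base = functional(points, np.full(n, 1.0 / n))
    # Contaminated measure (1 - eps) P_n + eps * delta_x, encoded as a weighted sample
    pts = np.vstack([points, x])
    w = np.append(np.full(n, (1.0 - eps) / n), eps)
    return (functional(pts, w) - base) / eps

rng = np.random.default_rng(2)
sample = rng.normal(size=(200, 3))
x = np.array([3.0, -1.0, 2.0])
IF_num = influence_fd(weighted_cov, sample, x)
m = sample.mean(axis=0)
IF_exact = np.outer(x - m, x - m) - weighted_cov(sample, np.full(200, 1.0 / 200))
```

The finite-difference error here is exactly $-\varepsilon\,(x-m_P)\otimes(x-m_P)$, so it shrinks linearly with `eps`.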
First, we have to specify the functionals related to , and (for ) and their empirical counterparts. Let us consider the functional given by
where is the functional defined as
Applying this functional to the distribution of gives and, therefore, . For , denoting by (resp. ; resp. ) the functional such that is the -th largest eigenvalue of (resp. the associated eigenvector; resp. ), we have , and .
Furthermore, denoting by the empirical measure corresponding to the sample , we have
These functionals will be used to derive the influence functions related to MSLCA. We make the following assumption:
: For all , we have , where denotes the identity operator of .
Then, we have the following theorem that gives the influence function of .
Theorem 1
We suppose that the assumptions and hold. Then, for any vector we have:
(8)
As determines MSLCA, it is important to ask whether its influence function is bounded. If it is, we say that MSLCA is robust, because this means that a contamination at the point has a limited effect on . The following proposition shows that is not bounded. We denote by the operator norm defined as .
Proposition 1
We suppose that the assumptions and hold. Then, there exists such that:
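The unboundedness stated in Proposition 1 can be observed numerically. The sketch below takes the empirical measure of a simulated sample in place of $P$, assumes that the MSLCA operator is $D^{-1/2}\Gamma D^{-1/2}$ with $D$ the block-diagonal part of the covariance $\Gamma$, and evaluates the spectral norm of a finite-difference approximation of its influence function at $x=tu$ for increasing $t$; the norm grows roughly like $t^2$. The names `mslca_operator` and `if_norm` are hypothetical.

```python
import numpy as np

def weighted_cov(points, weights):
    """Covariance of the discrete measure sum_i weights[i] * delta_{points[i]}."""
    m = weights @ points
    Zc = points - m
    return (Zc * weights[:, None]).T @ Zc

def mslca_operator(C, dims):
    """D^{-1/2} C D^{-1/2}, where D is the block-diagonal part of C (assumed form)."""
    D = np.zeros_like(C)
    start = 0
    for p in dims:
        s = slice(start, start + p)
        D[s, s] = C[s, s]
        start += p
    w, V = np.linalg.eigh(D)
    D_is = V @ np.diag(w ** -0.5) @ V.T
    return D_is @ C @ D_is

def if_norm(points, x, dims, eps=1e-7):
    """Spectral norm of a finite-difference approximation of IF(x; T, P_n)."""
    n = len(points)
    base = mslca_operator(weighted_cov(points, np.full(n, 1.0 / n)), dims)
    pts = np.vstack([points, x])
    w = np.append(np.full(n, (1.0 - eps) / n), eps)
    pert = mslca_operator(weighted_cov(pts, w), dims)
    return np.linalg.norm((pert - base) / eps, ord=2)

rng = np.random.default_rng(3)
L = rng.normal(size=(4, 4))
sample = rng.normal(size=(300, 4)) @ L.T      # correlated data; X_1, X_2 both 2-dimensional
u = np.ones(4) / 2.0                          # unit direction spanning both blocks
norms = [if_norm(sample, t * u, [2, 2]) for t in (1.0, 10.0, 100.0)]
```

Because $D^{-1/2}\Gamma D^{-1/2}$ is unchanged when $\Gamma$ is multiplied by a positive scalar, the $-C(P)$ part of the covariance influence function contributes nothing here; the growth comes entirely from the $(x-m_P)\otimes(x-m_P)$ term.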
In the following theorem, we give the influence functions related to the canonical coefficients and the canonical directions.
Theorem 2
We suppose that the assumptions and hold. Then, for any and any , we have:
(ii) We suppose, in addition, that . Then:
where denotes the identity operator of .
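Results such as Theorem 2 typically rest on first-order perturbation theory for self-adjoint operators: if $\lambda_j$ is a simple eigenvalue of $T$ with unit eigenvector $\beta_j$ and $T$ is perturbed to $T+\varepsilon H$, then $\lambda_j$ moves at rate $\langle\beta_j,H\beta_j\rangle$ and $\beta_j$ at rate $\sum_{k\neq j}(\lambda_j-\lambda_k)^{-1}\langle\beta_k,H\beta_j\rangle\beta_k$. Taking $H$ to be an influence function of the operator then gives influence functions of eigenvalues and eigenvectors by the chain rule. The sketch below only checks these two generic identities by finite differences on a random symmetric matrix; it does not reproduce the exact expressions of Theorem 2. The names `sorted_eigh`, `T` and `H` are hypothetical.

```python
import numpy as np

def sorted_eigh(S):
    """Eigen-decomposition of a symmetric matrix with eigenvalues in nonincreasing order."""
    lam, B = np.linalg.eigh(S)
    order = np.argsort(lam)[::-1]
    return lam[order], B[:, order]

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
T = Q @ np.diag([5.0, 3.0, 2.0, 1.0, 0.5]) @ Q.T   # self-adjoint, simple eigenvalues
Hm = rng.normal(size=(5, 5))
H = (Hm + Hm.T) / 2          # self-adjoint direction, standing in for an influence function

lam, B = sorted_eigh(T)
j, eps = 0, 1e-6
# First-order formulas for the simple eigenvalue lam[j] and its unit eigenvector B[:, j]
dlam = B[:, j] @ H @ B[:, j]
dbeta = sum((B[:, k] @ H @ B[:, j]) / (lam[j] - lam[k]) * B[:, k]
            for k in range(5) if k != j)

# Finite-difference counterparts
lam_eps, B_eps = sorted_eigh(T + eps * H)
fd_lam = (lam_eps[j] - lam[j]) / eps
b_eps = B_eps[:, j] * np.sign(B_eps[:, j] @ B[:, j])   # remove the sign ambiguity
fd_beta = (b_eps - B[:, j]) / eps
```

The eigenvector formula requires $\lambda_j$ to be simple, which is why an extra condition of this kind appears in part (ii) of such results.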
Remark 1
Romanazzi [20] derived influence functions for the squared canonical coefficients and the canonical directions obtained from linear canonical analysis (LCA) of two random vectors. LCA is in fact a particular case of MSLCA obtained when (see [18]). With Theorem 2, we recover the results of [20] when we take . We show this below only for the canonical coefficients. For , by applying Theorem 2 with , we obtain
(10)
The linear canonical analysis (LCA) of and is obtained from the spectral analysis of (since and ). If we denote by , , the related squared canonical coefficients and canonical vectors, it is known (see Remark 2.2 in [18]) that
(11)
Then, putting and , we deduce from (10), (11) and the equality that:
(12)
which is the result obtained in [20].
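The link between MSLCA with two blocks and LCA used in Remark 1 can be checked numerically. When there are only two blocks, the operator (assumed here to be $D^{-1/2}\Gamma D^{-1/2}$, with $D$ the block-diagonal part of the covariance $\Gamma$) has identity diagonal blocks and off-diagonal block $R=V_1^{-1/2}V_{12}V_2^{-1/2}$, so its eigenvalues are $1\pm\rho_j$, where the $\rho_j$ are the canonical correlations, together with $1$ repeated $p_1-p_2$ times. Hence a squared canonical coefficient of LCA equals $(\lambda_j-1)^2$ for the top eigenvalues. The names `V1`, `V2`, `V12`, `rho` are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

rng = np.random.default_rng(5)
p1, p2 = 3, 2
A = rng.normal(size=(p1 + p2, p1 + p2))
Gamma = A @ A.T + np.eye(p1 + p2)            # covariance of X = (X_1, X_2)
V1, V2 = Gamma[:p1, :p1], Gamma[p1:, p1:]
V12 = Gamma[:p1, p1:]

# LCA: canonical correlations = singular values of V1^{-1/2} V12 V2^{-1/2}
rho = np.linalg.svd(inv_sqrt(V1) @ V12 @ inv_sqrt(V2), compute_uv=False)

# MSLCA with two blocks: eigenvalues of D^{-1/2} Gamma D^{-1/2}
D = np.zeros_like(Gamma)
D[:p1, :p1], D[p1:, p1:] = V1, V2
Ds = inv_sqrt(D)
lam = np.sort(np.linalg.eigvalsh(Ds @ Gamma @ Ds))[::-1]

# Expected spectrum: 1 + rho_j, 1 - rho_j, and 1 repeated p1 - p2 times
expected = np.sort(np.concatenate([1 + rho, 1 - rho, np.ones(p1 - p2)]))[::-1]
```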
3 Robust multiple-set linear canonical analysis (RMSLCA)
We have seen that MSLCA based on the empirical covariance operator is not robust, since is not bounded. There is therefore an interest in proposing a robust version of MSLCA. In this section, we introduce such a version by replacing, in (7), the empirical covariance operator with a robust estimator of . More precisely, we use the minimum covariance determinant (MCD) estimator of . We consider the following assumption:
() : the distribution of is an elliptically contoured distribution with density
where is a function having a strictly negative derivative .
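The standard $d$-variate normal distribution satisfies this assumption with $g(t)=(2\pi)^{-d/2}e^{-t/2}$, whose derivative is strictly negative. For such densities, the mass of the set $\{\|Z\|\le r\}$ for the standardized variable reduces to $\frac{2\pi^{d/2}}{\Gamma(d/2)}\int_0^r g(s^2)s^{d-1}\,ds$, which is how a radius with prescribed mass $\gamma$, and the gamma function, typically enter the MCD functional of Section 3.2. The sketch below assumes that this is the equation used there, solves it by bisection in the normal case, and computes $P(\chi^2_{d+2}\le r_\gamma^2)/\gamma$, the known factor by which the raw MCD covariance functional shrinks $\Sigma$ at a normal distribution $N(\mu,\Sigma)$. The function names are hypothetical.

```python
import math
import numpy as np

def g_gauss(t, d):
    """Density generator of N(0, I_d): f(x) = g(||x||^2), with g' < 0."""
    return (2 * math.pi) ** (-d / 2) * np.exp(-t / 2)

def radial_mass(r, d, g, m=20001):
    """P(||Z|| <= r) = 2 pi^{d/2} / Gamma(d/2) * int_0^r g(s^2) s^{d-1} ds (trapezoid rule)."""
    s = np.linspace(0.0, r, m)
    vals = g(s ** 2, d) * s ** (d - 1)
    integral = np.sum((vals[1:] + vals[:-1]) * np.diff(s)) / 2
    return 2 * math.pi ** (d / 2) / math.gamma(d / 2) * integral

def radius_for(gamma, d, g, lo=0.0, hi=20.0, tol=1e-10):
    """Bisection for r_gamma with radial_mass(r_gamma) = gamma (mass is increasing in r)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if radial_mass(mid, d, g) < gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

gamma, d = 0.75, 2
r = radius_for(gamma, d, g_gauss)
# Normal case: the raw MCD covariance functional equals shrink * Sigma
shrink = radial_mass(r, d + 2, g_gauss) / gamma
```

For $d=2$ everything has a closed form ($r_\gamma^2=-2\log(1-\gamma)$), which makes the sketch easy to validate; for other generators $g$ only the function passed to `radius_for` changes.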
We first define the estimator of MSLCA based on the MCD estimator of , and then we derive the related influence functions.
3.1 Estimation of MSLCA based on MCD estimator
Letting be a fixed real such that , we consider a subsample of size , where , and we define the empirical mean and covariance operator based on this subsample by:
and
We denote by the subsample of which minimizes the determinant of over all subsamples of size . Then, the MCD estimators of the mean and the covariance operator of are and , respectively. It is well known that these estimators are robust and have high breakdown points (see, e.g., [21]). From them, we can introduce an estimator of MSLCA which is expected to be robust as well. Indeed, putting
we consider the random operators with values in defined as
where and , and we estimate by
(13)
Considering the eigenvalues of , and an orthonormal basis of such that is an eigenvector of associated with , we estimate by , by and by . This gives a robust MSLCA that we denote by RMSLCA.
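For small samples, the raw MCD estimator defined above can be computed exactly by enumerating all subsamples of size $h$ and keeping the one with the smallest covariance determinant; practical implementations use faster approximate algorithms such as FAST-MCD instead. The sketch below assumes $h=\lceil\gamma n\rceil$ and the $1/h$ normalization (neither changes which subsample is selected), uses no consistency factor, and plugs the result into the same assumed MSLCA construction as before. With two one-dimensional blocks the top eigenvalue is $1+|\text{correlation}|$, so a single gross outlier that destroys the empirical correlation is visible directly. The names `raw_mcd` and `mslca_from_cov` are hypothetical.

```python
import itertools
import math
import numpy as np

def raw_mcd(Z, gamma=0.75):
    """Exact raw MCD by exhaustive search over subsamples of size h = ceil(gamma * n)."""
    n = Z.shape[0]
    h = math.ceil(gamma * n)
    best_det, best_idx = np.inf, None
    for idx in itertools.combinations(range(n), h):
        det = np.linalg.det(np.cov(Z[list(idx)], rowvar=False, bias=True))
        if det < best_det:
            best_det, best_idx = det, idx
    sub = Z[list(best_idx)]
    return sub.mean(axis=0), np.cov(sub, rowvar=False, bias=True), best_idx

def mslca_from_cov(C, dims):
    """Eigenvalues and directions of D^{-1/2} C D^{-1/2}, D the block-diagonal part of C."""
    D = np.zeros_like(C)
    start = 0
    for p in dims:
        s = slice(start, start + p)
        D[s, s] = C[s, s]
        start += p
    w, V = np.linalg.eigh(D)
    D_is = V @ np.diag(w ** -0.5) @ V.T
    lam, B = np.linalg.eigh(D_is @ C @ D_is)
    order = np.argsort(lam)[::-1]
    return lam[order], D_is @ B[:, order]

rng = np.random.default_rng(6)
n = 12
f = rng.normal(size=n)
Z = np.column_stack([f + 0.3 * rng.normal(size=n), f + 0.3 * rng.normal(size=n)])
Z[0] = [100.0, 0.0]                        # one gross outlier
mu_mcd, C_mcd, support = raw_mcd(Z)
lam_mcd, _ = mslca_from_cov(C_mcd, [1, 1])
lam_emp, _ = mslca_from_cov(np.cov(Z, rowvar=False, bias=True), [1, 1])
```

The exhaustive search costs $\binom{n}{h}$ determinant evaluations, which is why it is only a didactic sketch.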
3.2 Influence functions
In order to derive the influence functions related to the above estimator of MSLCA, we have to specify the functional that corresponds to it. To do so, we first recall the functional associated with the above MCD estimator of the covariance operator. Let
where is determined by the equation
being the usual gamma function. The functional related to the aforementioned MCD estimator of is defined in [2] (see also [1], [7]) by
where
It is known that where
Therefore, the functional related to is defined as
where and are defined in (3). Now, we can give the influence functions related to RMSLCA of . First, putting
and , we have:
Theorem 3
From this theorem, we obtain the following proposition, which proves that RMSLCA is robust since the preceding influence function is bounded. We denote by the usual operator norm defined by .
Proposition 2
We suppose that the assumptions to hold. Then,
In the following theorem, we give the influence functions related to the canonical coefficients and the canonical directions obtained from RMSLCA. For , denoting by (resp. ; resp. ) the functional such that is the -th largest eigenvalue of (resp. the associated eigenvector; resp. ), we put , and . Considering
(14)
we have:
Theorem 4
We suppose that the assumptions to hold. Then, for any and any , we have:
(ii) We suppose, in addition, that . Then:
where is given in (2).
4 Asymptotics for RMSLCA
In this section, we deal with asymptotic properties of RMSLCA. We first establish the asymptotic normality of , and then we derive the asymptotic distribution of the canonical coefficients.
Theorem 5
Under the assumptions to , converges in distribution, as , to a random variable having a normal distribution in
where is the function defined by
(15)
and , , and are given in (14).
This theorem makes it possible to obtain asymptotic distributions for the canonical coefficients. Let (with ) be the decreasing sequence of distinct eigenvalues of , and the multiplicity of . Putting with , we clearly have for any . We denote by the orthogonal projector from
onto the eigenspace associated with
, and by the continuous map which associates to each self-adjoint operator the vector of its eigenvalues in nonincreasing order. For , we consider the -dimensional vectors