1 Introduction
A common feature of high-dimensional data is that the data dimension is high while the sample size is relatively low. This is the so-called “HDLSS” or “large $p$, small $n$” data situation where $p/n\to\infty$; here $p$ is the data dimension and $n$ is the sample size. Suppose we have two independent $p$-variate populations, $\pi_i$, $i=1,2$, having an unknown mean vector $\mu_i$ and unknown covariance matrix $\Sigma_i$ for each $i$. We do not assume $\Sigma_1=\Sigma_2$. The eigen-decomposition of $\Sigma_i$ is given by $\Sigma_i=H_i\Lambda_iH_i^T$, where $\Lambda_i=\mathrm{diag}(\lambda_{i1},\ldots,\lambda_{ip})$ is a diagonal matrix of eigenvalues, $\lambda_{i1}\ge\cdots\ge\lambda_{ip}\ge 0$, and $H_i=[h_{i1},\ldots,h_{ip}]$ is an orthogonal matrix of the corresponding eigenvectors. We have independent and identically distributed (i.i.d.) observations, $x_{i1},\ldots,x_{in_i}$, from each $\pi_i$. We assume . We estimate $\mu_i$ and $\Sigma_i$ by $\bar{x}_{in_i}=\sum_{j=1}^{n_i}x_{ij}/n_i$ and $S_{in_i}=\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_{in_i})(x_{ij}-\bar{x}_{in_i})^T/(n_i-1)$. Let $x_0$ be an observation vector of an individual belonging to one of the two populations. We assume $x_0$ and the $x_{ij}$s are independent. When the $\pi_i$s are Gaussian, a typical classification rule is that one classifies an individual into $\pi_1$ if
$$(x_0-\bar{x}_{1n_1})^TS_{1n_1}^{-1}(x_0-\bar{x}_{1n_1})-\log\{\det(S_{2n_2}S_{1n_1}^{-1})\}<(x_0-\bar{x}_{2n_2})^TS_{2n_2}^{-1}(x_0-\bar{x}_{2n_2})$$
and into $\pi_2$ otherwise. However, the inverse matrix of $S_{in_i}$ does not exist in the HDLSS context ($p>n_i$). Also, we emphasize that the Gaussian assumption is strict in real high-dimensional data analyses. Bickel and Levina (2004) considered a naive Bayes classifier for high-dimensional data. Fan and Fan (2008) considered classification after feature selection. Cai and Liu (2011), Shao et al. (2011) and Li and Shao (2015) gave sparse linear or quadratic classification rules for high-dimensional data. The above references all assumed the following eigenvalues condition: There is a constant $c_0>0$ (not depending on $p$) such that
$$c_0^{-1}<\lambda_{ip}\le\lambda_{i1}<c_0\quad\text{for } i=1,2. \tag{1}$$
Dudoit et al. (2002) considered using the inverse matrix defined by only the diagonal elements of $S_{in_i}$. Aoshima and Yata (2011, 2015a) considered substituting $\{\mathrm{tr}(S_{in_i})/p\}I_p$ for $S_{in_i}$ by using the difference of a geometric representation of HDLSS data from each $\pi_i$. Here, $I_p$ denotes the identity matrix of dimension $p$. On the other hand, Hall et al. (2005, 2008) and Marron et al. (2007) considered distance weighted classifiers. Ahn and Marron (2010) considered an HDLSS classifier based on the maximal data piling. Hall et al. (2005), Chan and Hall (2009), Aoshima and Yata (2014) and Watanabe et al. (2015) considered distance-based classifiers. Aoshima and Yata (2014) gave the misclassification rate adjusted classifier for multiclass, high-dimensional data whose misclassification rates are no more than specified thresholds under the following eigenvalues condition:
$$\frac{\lambda_{i1}^2}{\mathrm{tr}(\Sigma_i^2)}\to 0\quad\text{as } p\to\infty \text{ for } i=1,2. \tag{2}$$
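We emphasize that (2) is much milder than (1) because (2) includes the case that the largest eigenvalue $\lambda_{i1}\to\infty$ as $p\to\infty$. See Remark 1 for the details. Aoshima and Yata (2014) considered the distance-based classifier as follows: Let
$$W(x_0)=\Bigl(x_0-\frac{\bar{x}_{1n_1}+\bar{x}_{2n_2}}{2}\Bigr)^T(\bar{x}_{2n_2}-\bar{x}_{1n_1})-\frac{\mathrm{tr}(S_{1n_1})}{2n_1}+\frac{\mathrm{tr}(S_{2n_2})}{2n_2}, \tag{3}$$
where $\bar{x}_{in_i}$ and $S_{in_i}$ are the sample mean vector and the sample covariance matrix of the $i$th population. Then, one classifies $x_0$ into $\pi_1$ if $W(x_0)<0$ and into $\pi_2$ otherwise. Here, $-\mathrm{tr}(S_{1n_1})/(2n_1)+\mathrm{tr}(S_{2n_2})/(2n_2)$ is a bias-correction term. Note that the classifier (3) is equivalent to the scale adjusted distance-based classifier given by Chan and Hall (2009). Aoshima and Yata (2015b) called the classification rule (3) the “distance-based discriminant analysis (DBDA)”.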
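The rule above can be sketched in a few lines. The sketch assumes that the statistic in (3) has the scale-adjusted form $W(x_0)=\{x_0-(\bar{x}_1+\bar{x}_2)/2\}^T(\bar{x}_2-\bar{x}_1)-\mathrm{tr}(S_1)/(2n_1)+\mathrm{tr}(S_2)/(2n_2)$ with the unbiased sample covariance (divisor $n_i-1$); the function names and the toy data are hypothetical and only serve as an illustration.

```python
from statistics import fmean


def mean_vector(samples):
    """Coordinate-wise sample mean of a list of equal-length vectors."""
    return [fmean(col) for col in zip(*samples)]


def trace_sample_cov(samples):
    """tr(S) for the unbiased sample covariance S (divisor n - 1)."""
    n = len(samples)
    xbar = mean_vector(samples)
    total = sum((x_k - m_k) ** 2 for x in samples for x_k, m_k in zip(x, xbar))
    return total / (n - 1)


def dbda_statistic(x0, sample1, sample2):
    """Assumed form of W(x0): centred inner product plus bias correction."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = mean_vector(sample1), mean_vector(sample2)
    inner = sum((x - (a + b) / 2) * (b - a) for x, a, b in zip(x0, m1, m2))
    correction = -trace_sample_cov(sample1) / (2 * n1) + trace_sample_cov(sample2) / (2 * n2)
    return inner + correction


def classify(x0, sample1, sample2):
    """Return 1 if W(x0) < 0 and 2 otherwise."""
    return 1 if dbda_statistic(x0, sample1, sample2) < 0 else 2


# Hypothetical toy data: two observations per population in dimension 2.
sample1 = [[0.0, 0.0], [2.0, 2.0]]
sample2 = [[9.0, 9.0], [11.0, 11.0]]
print(classify([1.0, 1.0], sample1, sample2))
print(classify([10.0, 10.0], sample1, sample2))
```

Under the assumed form, $2W(x_0)$ equals $\{\|x_0-\bar{x}_1\|^2-\mathrm{tr}(S_1)/n_1\}-\{\|x_0-\bar{x}_2\|^2-\mathrm{tr}(S_2)/n_2\}$, so the rule compares bias-corrected squared distances to the two sample means and never needs an inverse covariance matrix.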
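Recently, Aoshima and Yata (2018) considered the “strongly spiked eigenvalue (SSE) model” as follows:
$$\liminf_{p\to\infty}\Bigl\{\frac{\lambda_{i1}^2}{\mathrm{tr}(\Sigma_i^2)}\Bigr\}>0\quad\text{for } i=1 \text{ or } 2. \tag{4}$$
On the other hand, Aoshima and Yata (2018) called (2) the “non-strongly spiked eigenvalue (NSSE) model”. Note that (4) holds under the condition:
$$\liminf_{p\to\infty}\Bigl\{\frac{\lambda_{i1}}{\mathrm{tr}(\Sigma_i)}\Bigr\}>0\quad\text{for } i=1 \text{ or } 2 \tag{5}$$
from the fact that $\lambda_{i1}^2/\mathrm{tr}(\Sigma_i^2)\ge\{\lambda_{i1}/\mathrm{tr}(\Sigma_i)\}^2$. Here, $\lambda_{i1}/\mathrm{tr}(\Sigma_i)$ is the first contribution ratio, with $\lambda_{i1}$ the largest eigenvalue of the covariance matrix $\Sigma_i$. We call (5) the “super strongly spiked eigenvalue (SSSE) model”.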
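The implication from (5) to (4) can be checked in one line. The derivation below assumes that (4) is stated through the first quadratic contribution ratio $\lambda_{i1}^2/\mathrm{tr}(\Sigma_i^2)$ and (5) through the first contribution ratio $\lambda_{i1}/\mathrm{tr}(\Sigma_i)$, where $\lambda_{i1}\ge\cdots\ge\lambda_{ip}\ge 0$ are the eigenvalues of $\Sigma_i$ (this notation is an assumption of the sketch).

```latex
\mathrm{tr}(\Sigma_i^2)
  = \sum_{j=1}^{p} \lambda_{ij}^2
  \le \Bigl(\sum_{j=1}^{p} \lambda_{ij}\Bigr)^{2}
  = \{\mathrm{tr}(\Sigma_i)\}^{2}
\quad\Longrightarrow\quad
\frac{\lambda_{i1}^2}{\mathrm{tr}(\Sigma_i^2)}
  \ge \Bigl\{\frac{\lambda_{i1}}{\mathrm{tr}(\Sigma_i)}\Bigr\}^{2}.
```

The inequality in the middle holds because all cross terms $\lambda_{ij}\lambda_{ij'}$ are nonnegative. Hence a first contribution ratio that stays bounded away from zero forces the first quadratic contribution ratio to stay bounded away from zero as well.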
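Remark 1.
Let us consider a spiked model such as
$$\lambda_{ij}=a_{ij}p^{\alpha_{ij}}\ (j=1,\ldots,t_i)\quad\text{and}\quad\lambda_{ij}=c_{ij}\ (j=t_i+1,\ldots,p) \tag{6}$$
with positive and fixed constants, $a_{ij}$s, $c_{ij}$s and $\alpha_{ij}$s, and a positive and fixed integer $t_i$. Note that the NSSE condition (2) holds when $\alpha_{i1}<1/2$ for $i=1,2$. On the other hand, the SSE condition (4) holds when $\alpha_{i1}\ge 1/2$, and further the SSSE condition (5) holds when $\alpha_{i1}\ge 1$. See Yata and Aoshima (2012) for the details of the spiked model.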
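A small numerical check makes the three regimes of Remark 1 concrete. The sketch assumes the spiked model takes the form $\lambda_j=a_jp^{\alpha_j}$ for the leading $t$ eigenvalues and $\lambda_j=c$ for the rest; the function names, the choice $a_1=c=1$ and the single spike are hypothetical simplifications.

```python
def spiked_eigenvalues(p, a, alpha, c=1.0):
    """Eigenvalues a[j] * p**alpha[j] for the t = len(a) spikes and c for the rest."""
    spikes = [a_j * p ** al_j for a_j, al_j in zip(a, alpha)]
    return spikes + [c] * (p - len(spikes))


def first_contribution_ratio(eigs):
    """Largest eigenvalue divided by the trace."""
    return max(eigs) / sum(eigs)


def first_quadratic_contribution_ratio(eigs):
    """Squared largest eigenvalue divided by the sum of squared eigenvalues."""
    return max(eigs) ** 2 / sum(l * l for l in eigs)


# One spike with exponent alpha; track both ratios as the dimension p grows.
for alpha in (0.25, 0.5, 1.0):
    for p in (10**2, 10**3, 10**4):
        eigs = spiked_eigenvalues(p, [1.0], [alpha])
        print(alpha, p,
              round(first_contribution_ratio(eigs), 4),
              round(first_quadratic_contribution_ratio(eigs), 4))
```

With these choices the quadratic ratio decays for $\alpha=0.25$, stays near $1/2$ for $\alpha=0.5$, and approaches 1 for $\alpha=1$, while the first contribution ratio stays away from zero only for $\alpha=1$.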
We observed
for six well-known microarray data sets by using the noise-reduction methodology and the cross-data-matrix methodology. For those methods, see Yata and Aoshima (2010, 2012). Note that is the contribution ratio and is a quadratic contribution ratio of the th eigenvalue. We estimated by and by , where is defined by (15), and and are defined in Section 4.3. We note that and are consistent estimators of and when . See (17) and (22) for the details. The six microarray data sets are as follows:
(Di) Non-pathologic tissues data with 1413 genes, consisting of $\pi_1$: placenta or blood (104 samples) and $\pi_2$: other solid tissue (113 samples) given by Christensen et al. (2009);
(Dii) Colon cancer data with 2000 genes, consisting of $\pi_1$: colon tumor (40 samples) and $\pi_2$: normal colon (22 samples) given by Alon et al. (1999);
(Diii) Breast cancer data with 2905 genes, consisting of good (111 samples) and poor (57 samples) given by Gravier et al. (2010);
(Div) Lymphoma data with 7129 genes, consisting of DLBCL (58 samples) and follicular lymphoma (19 samples) given by Shipp et al. (2002);
(Dv) Myeloma data with 12625 genes, consisting of patients without bone lesions (36 samples) and patients with bone lesions (137 samples) given by Tian et al. (2003);
(Dvi) Breast cancer data with 47293 genes, consisting of luminal group (84 samples) and non-luminal group (44 samples) given by Naderi et al. (2007).
The data sets (Dii), (Div) and (Dv) are given in Jeffery et al. (2006), (Di) and (Diii) are given in Ramey (2016), and (Dvi) is given in Glaab et al. (2012). We summarized the results for , and in Table 1, where is an estimate of , given in Section 4.3. We will discuss and in Sections 3 and 4.3. We also visualized the first ten contribution ratios given by in Fig. 1 and the first ten quadratic contribution ratios given by in Fig. 2. See (17) and (22) for the details.
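Table 1

| | (Di) | (Dii) | (Diii) | (Div) | (Dv) | (Dvi) |
|---|---|---|---|---|---|---|
| Number of genes | 1413 | 2000 | 2905 | 7129 | 12625 | 47293 |
| Sample sizes (population 1, population 2) | (104,113) | (40,22) | (111,57) | (58,19) | (36,137) | (84,44) |
| Estimated first contribution ratio, population 1 | 0.636 | 0.153 | 0.108 | 0.22 | 0.038 | 0.091 |
| Estimated first contribution ratio, population 2 | 0.233 | 0.157 | 0.083 | 0.386 | 0.035 | 0.085 |
| Estimated first quadratic contribution ratio, population 1 | 0.995 | 0.569 | 0.304 | 0.71 | 0.283 | 0.502 |
| Estimated first quadratic contribution ratio, population 2 | 0.582 | 0.523 | 0.363 | 0.963 | 0.269 | 0.403 |
| Estimated integer (see Sections 3 and 4.3), population 1 | 2 | 3 | 2 | 2 | 1 | 2 |
| Estimated integer (see Sections 3 and 4.3), population 2 | 4 | 2 | 2 | 2 | 2 | 3 |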
We observed from Fig. 1 that the first several eigenvalues are much larger than the rest for the microarray data sets (except (Dv)). In particular, the first eigenvalues for (Di) and (Div) are extremely large. These data appear to be consistent with the SSSE asymptotic domain given in (5). On the other hand, the first several eigenvalues for (Dv) are relatively small. However, from Table 1 and Fig. 2, the estimated quadratic contribution ratios for (Dv) are not sufficiently small. Also, the estimated quadratic contribution ratios for (Dii), (Diii) and (Dvi) are relatively large in Table 1 and Fig. 2. Hence, the six microarray data sets appear to be consistent with the SSE asymptotic domain given in (4). See Section 4.3. In this paper, we consider classifiers under the SSE model. We do not assume the normality of the population distributions. We propose an effective distance-based classifier for such high-dimensional data sets.
The organization of this paper is as follows. In Section 2, we introduce asymptotic properties of the distance-based classifier for high-dimensional data. We discuss the distance-based classifier in the SSE model. In Section 3, we consider a distance-based classifier using eigenstructures for the SSE model. In Section 4, we discuss estimation of the eigenvalues and eigenvectors for the SSE model. We create a new distance-based classifier by estimating the eigenstructures. In Section 5, we give simulation studies and discuss the performance of the new classifier. Finally, we demonstrate the new classifier by using microarray data sets.
2 Distance-based classifier for high-dimensional data
In this section, we introduce asymptotic properties of the distance-based classifier for high-dimensional data. For any positive-semidefinite matrix $M$, we write the square root of $M$ as $M^{1/2}$. Let
where is considered as a sphered data vector having the zero mean vector and identity covariance matrix. Similar to Bai and Saranadasa (1996) and Chen and Qin (2010), we assume the following assumption for , , as necessary:
 (Ai)

for all , , and for all .
When the s are Gaussian, (Ai) naturally holds. Let
where denotes the Euclidean norm. Note that when for . Also, note that the divergence condition “, and ” is equivalent to “”. Let
and for . Note that when for .
Let denote the error rate of misclassifying an individual from into the other class for . Then, for DBDA, the classification rule (3), Aoshima and Yata (2014) gave the following result.
Theorem 1 (Aoshima and Yata, 2014).
Assume the following conditions:
 (AYi)

as for ;
 (AYii)

as .
Then, for DBDA, we have that as
(7) 
Remark 2.
For DBDA, under (AYi) and (AYii), one may write (7) as
Next, we consider the asymptotic normality of the classifier. Hereafter, for a function, , “ as ” implies and . Let “” denote the convergence in distribution,
denote a random variable distributed as the standard normal distribution and
denote the cumulative distribution function of the standard normal distribution.
Aoshima and Yata (2014) gave the following result.
Theorem 2 (Aoshima and Yata, 2014).
Assume the following conditions:
 (AYiii)

as , for , and as .
Assume also the NSSE condition (2). Under a certain assumption milder than (Ai), it holds that as
Furthermore, for DBDA, it holds that as
(8) 
Remark 3.
By using the asymptotic normality, Aoshima and Yata (2014) proposed the misclassification rate adjusted classifier (MRAC) in high-dimensional settings.
In this paper, we consider the distance-based classifier from a different point of view. We consider the classifier under the SSE model. We emphasize that high-dimensional data often have the SSE model. See Table 1, Figs. 1 and 2. If the SSE condition (4) is met, one cannot claim the asymptotic normality in Theorem 2. In addition, if the SSE condition (4) is met, (AYii) in Theorem 1 is equivalent to
(9) 
Thus (AYii) in the SSE model is stricter than that in the NSSE model. For example, for the NSSE model as the spiked model in (6) with , , (AYii) is equivalent to . On the other hand, for the SSE model as (6) with (and for ), (AYii) is equivalent to . That means or should be quite large for the SSE model compared to the NSSE model. Thus if the SSE condition (4) is met, DBDA has the classification consistency (7) only under conditions that are strict compared to those under the NSSE condition (2). In order to overcome the difficulties, we propose a new distance-based classifier by estimating eigenstructures for the SSE model.
3 Distance-based classifier using eigenstructures
Let
In this section, similar to Aoshima and Yata (2018), we assume the following model for :
 (Mi)

There exists a fixed integer such that are distinct in the sense that when , and and satisfy
Note that (Mi) implies the SSE condition (4), that is (Mi) is one of the SSE models. For example, (Mi) holds in the spiked model in (6) with
for . We emphasize that (Mi) is a natural model under the SSE condition (4). See Fig. 2. The six microarray data appear to be consistent with (Mi). Similar to (9), we note that the sufficient condition (AYii) in Theorem 1 is equivalent to
under (Mi). According to the arguments in the last paragraph of Section 2, if (Mi) is met, DBDA has the classification consistency (7) under strict conditions compared to the NSSE condition (2). Also, one cannot claim the asymptotic normality in Theorem 2 under (Mi). In order to overcome the difficulties, similar to Aoshima and Yata (2018), we consider a data transformation from the SSE model to the NSSE model.
3.1 Data transformation
Recall that is the th eigenvector of . Let
for . Note that for . Let us write that , , , and . Note that and for all . Thus the transformed data, , has the NSSE model in the sense that
where $\lambda_{\max}(M)$ denotes the largest eigenvalue of any positive-semidefinite matrix, $M$. Hence, we can say that a classifier by using the transformed data has the classification consistency (7) under mild conditions compared to DBDA when (Mi) is met. In addition, one can claim the asymptotic normality of the classifier even when the SSE condition (4) is met.
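The idea of moving from an SSE structure to an NSSE structure can be illustrated by projecting out the leading eigenvectors. The sketch below assumes the transformation has the form $x\mapsto(I_p-\sum_{j\le k}h_jh_j^T)x$ with known orthonormal eigenvectors $h_1,\ldots,h_k$; the function names and the toy covariance $\mathrm{diag}(100,1,1,1)$ are hypothetical, and the exact construction used in this section, including the estimation of the eigenvectors treated in Section 4, is not reproduced here.

```python
def project_out(x, eigvecs):
    """Return (I - sum_j h_j h_j^T) x for orthonormal vectors h_j."""
    y = list(x)
    for h in eigvecs:
        coef = sum(h_k * x_k for h_k, x_k in zip(h, x))  # h_j^T x
        y = [y_k - coef * h_k for y_k, h_k in zip(y, h)]
    return y


def max_quadratic_ratio(eigs):
    """lambda_max^2 / sum_j lambda_j^2 for a list of nonnegative eigenvalues."""
    return max(eigs) ** 2 / sum(l * l for l in eigs)


# Toy covariance diag(100, 1, 1, 1): the eigenvectors are the coordinate axes.
eigs = [100.0, 1.0, 1.0, 1.0]
h1 = [1.0, 0.0, 0.0, 0.0]

x = [5.0, 1.0, -2.0, 0.5]
x_star = project_out(x, [h1])  # the component along h1 is removed

# Projecting out h1 leaves the covariance diag(0, 1, 1, 1).
eigs_star = [0.0] + eigs[1:]
before = max_quadratic_ratio(eigs)       # dominated by the spike
after = max_quadratic_ratio(eigs_star)   # no dominant direction remains
print(x_star, before, after)
```

In this toy case the ratio drops from nearly 1 to $1/3$; with $p$ non-spiked eigenvalues of comparable size it would be of order $1/p$, which is the sense in which the transformed data behave like an NSSE model.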
Now, we propose the classifier by using the transformed data. Let us write that , and for . We consider the following classifier:
(10) 
Then, one classifies into if and into otherwise. Let . Here, let us write that ,
for . Then, we claim that when for ,
(11) 
Remark 4.
In Sections 3.2 and 3.3, we give consistency properties and an asymptotic normality of . We assume the following conditions as necessary:
 (Ci)

as for ;
 (Cii)

as for ;
 (Ciii)

as and for ;
 (Civ)

as , for , and as ;
 (Cv)

as , ,
and as for .
3.2 Consistency of the classifier (10)
We consider consistency properties of . We note that as under (Ci) to (Ciii). See Section 6.1. Then, we have the following results.
Theorem 3.
Corollary 1.
Now, we consider the sufficient condition (Cii) in Theorem 3. When as for , it holds that