1 Introduction
We discuss two-sample tests for the stochastic order of two samples of interval-valued data. In interval-valued data, the variable of interest is not observed as a single point but as an interval with lower and upper bounds. For example, interval-valued data arise when a stock price is reported monthly by its lowest and highest prices. Blood pressure data, which motivate our research, are another example: diastolic blood pressure (DBP) and systolic blood pressure (SBP) serve as the lower and upper bounds. Testing the stochastic order of two populations, as well as the equality of two distributions, is a fundamental problem in statistics. However, little research has been done for interval-valued data, and even the definition of stochastic order for interval-valued data has not been clearly established. This paper therefore introduces such a definition and proposes a method to test the stochastic order of two samples of interval-valued data.
The remainder of the paper is organized as follows. In Section 2, we define the stochastic order of interval-valued data. In Section 3, we propose a test statistic for the order of interval-valued data and derive its asymptotic null distribution using the general theory of U-statistics. In Section 4, we compare the performance of a modified two-dimensional Kolmogorov-Smirnov (KS) statistic and the proposed statistic through a numerical study. In Section 5, we apply the methods to blood pressure data from female students in the US. Section 6 concludes the paper with a summary.
2 Simple stochastic order
Before we introduce the notion of stochastic order for interval-valued data, we review the stochastic order for the usual univariate case. Let $X$ and $Y$ be two univariate random variables such that
$$P(X > t) \ge P(Y > t) \quad \text{for all } t \in \mathbb{R}.$$
Then $X$ is said to be stochastically greater than $Y$ (denoted by $X \ge_{st} Y$). If, additionally, $P(X > t) > P(Y > t)$ for some $t$, then $X$ is said to be stochastically strictly greater than $Y$ (Shaked and Shanthikumar, 2006).
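The stochastic order for interval-valued data can be defined similarly. Let $I_1 = [a_1, b_1]$ and $I_2 = [a_2, b_2]$ be two intervals. We write $I_1 \succeq I_2$ and say that $I_1$ is greater than $I_2$ if $a_1 \ge a_2$ and $b_1 \ge b_2$. Now, let $X = [X_L, X_U]$ and $Y = [Y_L, Y_U]$ be two random intervals such that
$$P(X \succeq I) \ge P(Y \succeq I) \quad \text{for all intervals } I = [a, b]. \qquad (1)$$
Then $X$ is said to be stochastically greater than $Y$, denoted by $X \succeq_{st} Y$. Let $S_X(a, b) = P(X_L \ge a, X_U \ge b)$ and $S_Y(a, b) = P(Y_L \ge a, Y_U \ge b)$ be the survival functions of the random intervals $X$ and $Y$, respectively. Then (1) is equivalent to
$$S_X(a, b) \ge S_Y(a, b) \quad \text{for all } a \le b.$$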
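To make the definition concrete, the following Python sketch (using NumPy) encodes the componentwise interval order and the empirical survival function, that is, the proportion of observed intervals $[L, U]$ with $L \ge a$ and $U \ge b$. The function names are hypothetical. Checking the empirical inequality on a finite grid is only a descriptive check of condition (1), not a test.

```python
import numpy as np

def interval_geq(i1, i2):
    """True if [a1, b1] is greater than or equal to [a2, b2]: a1 >= a2 and b1 >= b2."""
    (a1, b1), (a2, b2) = i1, i2
    return a1 >= a2 and b1 >= b2

def empirical_survival(sample, a, b):
    """Empirical S(a, b): share of rows [lower, upper] with lower >= a and upper >= b."""
    sample = np.asarray(sample, dtype=float)
    return float(np.mean((sample[:, 0] >= a) & (sample[:, 1] >= b)))

def dominates_on_grid(x, y, grid):
    """Check empirical S_X(a, b) >= S_Y(a, b) at every grid point with a <= b."""
    return all(
        empirical_survival(x, a, b) >= empirical_survival(y, a, b)
        for a, b in grid
        if a <= b
    )

# Each interval in x is the matching interval in y shifted up by one.
x = [[2, 5], [3, 6], [4, 8]]
y = [[1, 4], [2, 5], [3, 6]]
grid = [(a, b) for a in range(10) for b in range(10)]
print(dominates_on_grid(x, y, grid))  # True
```

Note that `[1, 6]` and `[2, 5]` are not ordered in either direction, so the interval order is only a partial order.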
We can illustrate the order of intervals as follows (see Figure 1). Let the interval $[a, b]$ be represented by the point $(a, b)$ in the plane. Because of the constraint $a \le b$, interval-valued data lie on or above the line $b = a$. With respect to a fixed interval $[a_0, b_0]$, every interval in this half-plane falls into one of three cases according to its order relation with $[a_0, b_0]$:

region A: intervals greater than $[a_0, b_0]$, i.e., $a \ge a_0$ and $b \ge b_0$;

region C: intervals less than $[a_0, b_0]$, i.e., $a \le a_0$ and $b \le b_0$;

region B or D: intervals with no order relation to $[a_0, b_0]$.

In the last case, an interval in region B satisfies $a < a_0$ and $b > b_0$ (it strictly contains $[a_0, b_0]$), while an interval in region D satisfies $a > a_0$ and $b < b_0$ (it lies strictly inside $[a_0, b_0]$).
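The regions can be encoded as a small classifier. This is a sketch with a hypothetical function name. Assigning label B to intervals that strictly contain $[a_0, b_0]$ and label D to intervals lying strictly inside it is our assumption about the layout of Figure 1. An interval equal to the reference is placed in region A.

```python
def region(interval, ref):
    """Classify interval [a, b] relative to the reference interval [a0, b0].

    A: greater (a >= a0, b >= b0); C: less (a <= a0, b <= b0);
    B: contains ref strictly (a < a0, b > b0); D: inside ref (a > a0, b < b0).
    """
    (a, b), (a0, b0) = interval, ref
    if a > b or a0 > b0:
        raise ValueError("an interval needs lower bound <= upper bound")
    if a >= a0 and b >= b0:
        return "A"
    if a <= a0 and b <= b0:
        return "C"
    # Only incomparable intervals remain; a != a0 here, so the sign of a - a0 decides.
    return "B" if a < a0 else "D"

for iv in [(3, 9), (1, 7), (1, 9), (3, 7)]:
    print(iv, region(iv, (2, 8)))
```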
3 Test statistic
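Let us consider two independent samples of random intervals. Suppose that the first sample, $X_i = [X_{Li}, X_{Ui}]$, $i = 1, \dots, m$, has survival function $S_X$, and the second sample, $Y_j = [Y_{Lj}, Y_{Uj}]$, $j = 1, \dots, n$, has survival function $S_Y$. We want to test the null hypothesis that both samples come from an identical distribution, "$H_0$: $S_X(a, b) = S_Y(a, b)$ for all $a \le b$", against the alternative hypothesis that $X$ is stochastically strictly greater than $Y$, i.e., "$H_1$: $S_X(a, b) \ge S_Y(a, b)$ for all $a \le b$ and $S_X(a, b) > S_Y(a, b)$ for some interval $[a, b]$".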
The statistic we propose to test the stochastic order is
(2) 
where
Note that under the null , and thus .
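For illustration only, the sketch below computes a Mann-Whitney-type statistic for intervals using the antisymmetric kernel $h(x, y) = 1\{x \succeq y\} - 1\{y \succeq x\}$, averaged over all $mn$ pairs. This kernel is our assumption and is not claimed to be the statistic (2). It is shown because any kernel of this antisymmetric form has mean zero under $H_0$: $X$ and $Y$ are then independent and identically distributed, so $P(X \succeq Y) = P(Y \succeq X)$. Function names are hypothetical.

```python
import numpy as np

def order_kernel(x, y):
    """Assumed kernel: +1 if x is greater than y, -1 if y is greater than x,
    0 if incomparable (identical intervals also give 0)."""
    return float(x[0] >= y[0] and x[1] >= y[1]) - float(y[0] >= x[0] and y[1] >= x[1])

def mw_interval_stat(x, y):
    """Average of order_kernel over all m * n pairs (k = l = 1 U-statistic)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # (m, n) boolean matrices of pairwise interval comparisons via broadcasting
    x_geq_y = (x[:, None, 0] >= y[None, :, 0]) & (x[:, None, 1] >= y[None, :, 1])
    y_geq_x = (y[None, :, 0] >= x[:, None, 0]) & (y[None, :, 1] >= x[:, None, 1])
    return float(np.mean(x_geq_y.astype(float) - y_geq_x.astype(float)))

print(mw_interval_stat([[2, 5], [3, 6]], [[1, 4], [3, 4]]))  # 0.75
```

Swapping the two samples flips the sign of the statistic, so large positive values point toward $X$ being stochastically greater.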
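The statistic in (2) belongs to the class of U-statistics, which allows one to derive its asymptotic null distribution from the asymptotic theory of U-statistics. We introduce below the general asymptotic theory of U-statistics given in Chapter 6 of Lehmann (1999). Let $h(x_1, \dots, x_k; y_1, \dots, y_l)$ be a symmetric kernel of $k + l$ arguments ($k \le m$, $l \le n$). Here, a symmetric kernel is a function whose value does not change when the arguments $x_1, \dots, x_k$, or the arguments $y_1, \dots, y_l$, are permuted among themselves. Let the parameter of interest be
$$\theta = E\, h(X_1, \dots, X_k; Y_1, \dots, Y_l),$$
and define its U-statistic by
$$U = \frac{1}{\binom{m}{k}\binom{n}{l}} \sum_{\alpha \in C_{m,k}} \sum_{\beta \in C_{n,l}} h(X_{\alpha_1}, \dots, X_{\alpha_k}; Y_{\beta_1}, \dots, Y_{\beta_l}), \qquad (3)$$
where $C_{m,k}$ is the collection of all subsets of $\{1, \dots, m\}$ of size $k$, $C_{n,l}$ is defined analogously, and the dummy indices of the summations are $\alpha = \{\alpha_1, \dots, \alpha_k\}$ and $\beta = \{\beta_1, \dots, \beta_l\}$, respectively.
$U$ is an unbiased estimator of $\theta$,
$$E(U) = \theta,$$
and its variance is
$$\mathrm{Var}(U) = \sum_{i=0}^{k} \sum_{j=0}^{l} \frac{\binom{k}{i}\binom{m-k}{k-i}}{\binom{m}{k}} \, \frac{\binom{l}{j}\binom{n-l}{l-j}}{\binom{n}{l}} \, \sigma_{ij}^2,$$
where $\sigma_{00}^2 = 0$ and $\sigma_{ij}^2$ is given by
$$\sigma_{ij}^2 = \mathrm{Cov}\big( h(X_1, \dots, X_i, X_{i+1}, \dots, X_k; Y_1, \dots, Y_j, Y_{j+1}, \dots, Y_l),\; h(X_1, \dots, X_i, X'_{i+1}, \dots, X'_k; Y_1, \dots, Y_j, Y'_{j+1}, \dots, Y'_l) \big),$$
and $X'$ and $Y'$ are independent copies of $X$ and $Y$. The theorem below, from Chapter 6 of Lehmann (1999), gives the asymptotic distribution of the U-statistic (3).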
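Definition (3) can be evaluated by brute force. The Python sketch below (hypothetical function name `two_sample_u`) enumerates all size-$k$ subsets of the first sample and all size-$l$ subsets of the second. It costs $\binom{m}{k}\binom{n}{l}$ kernel evaluations, so it illustrates the definition rather than serving as an efficient implementation.

```python
from itertools import combinations
from math import comb

def two_sample_u(x, y, kernel, k, l):
    """Two-sample U-statistic (3): average of kernel(xs, ys) over all size-k
    subsets xs of the first sample and all size-l subsets ys of the second."""
    total = 0.0
    for alpha in combinations(range(len(x)), k):
        xs = [x[i] for i in alpha]
        for beta in combinations(range(len(y)), l):
            ys = [y[j] for j in beta]
            total += kernel(xs, ys)
    return total / (comb(len(x), k) * comb(len(y), l))

# k = l = 1 with kernel 1{x > y}: the Mann-Whitney proportion
print(two_sample_u([1, 2, 3], [0, 2], lambda xs, ys: float(xs[0] > ys[0]), 1, 1))
```

With $l = 0$ the second sample plays no role and (3) reduces to a one-sample U-statistic. For example, the kernel $(x_1 - x_2)^2/2$ with $k = 2$ gives the unbiased sample variance.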
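Theorem 1 (Lehmann (1999), Theorem 6.1.3 (ii)).
Suppose that the variance components $\sigma_{ij}^2$ are finite. As $m, n \to \infty$ with $N = m + n$ and $m/N \to \rho \in (0, 1)$,
$$\sqrt{N}\,(U - \theta)$$
converges in distribution to the normal distribution with mean $0$ and variance $\sigma^2 = k^2 \sigma_{10}^2 / \rho + l^2 \sigma_{01}^2 / (1 - \rho)$. Here, $\sigma_{10}^2$ and $\sigma_{01}^2$ are computed by
$$\sigma_{10}^2 = \mathrm{Cov}\big( h(X_1, X_2, \dots, X_k; Y_1, \dots, Y_l),\; h(X_1, X'_2, \dots, X'_k; Y'_1, \dots, Y'_l) \big),$$
$$\sigma_{01}^2 = \mathrm{Cov}\big( h(X_1, \dots, X_k; Y_1, Y_2, \dots, Y_l),\; h(X'_1, \dots, X'_k; Y_1, Y'_2, \dots, Y'_l) \big).$$
Applying the general theory above for U-statistics to our case, we can derive the asymptotic null distribution of our statistic.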
Theorem 2.
Under the null hypothesis that , if as , then
where , , and .
The parameters used to compute the asymptotic variance can be approximated by permuting observations within each sample. To see this, we observe the following.
(4) 
where are independent random intervals from the first population. Consequently, can be approximated by
Equation (4) implies that this approximation remains a valid estimate even under the alternative hypothesis.
Proof.
For , let us define . Then, can be represented as a two-sample U-statistic when ;
Therefore, by applying Theorem 1, we have
where , , , and .
Now, let us denote interval random variables by , , and . Under the null hypothesis , we have . The variance component (= ) is evaluated as
Now, we write , , and . Thus, under , we get
Hence, the asymptotic variance of is
∎
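For kernels with $k = l = 1$, Theorem 1 gives the asymptotic variance $\sigma^2 = \sigma_{10}^2/\rho + \sigma_{01}^2/(1 - \rho)$, where $\sigma_{10}^2 = \mathrm{Var}(E[h(X, Y) \mid X])$ and $\sigma_{01}^2 = \mathrm{Var}(E[h(X, Y) \mid Y])$. One common way to estimate these components is to take the sample variances of the row means and column means of the $m \times n$ matrix of kernel values. This is a standard plug-in estimator, not the within-sample permutation approximation described above. The sketch below (hypothetical function `u_asymptotic_test`) uses it to return a one-sided normal-approximation p-value. The user must supply the kernel matrix.

```python
import numpy as np
from math import erfc, sqrt

def u_asymptotic_test(h_matrix, theta0=0.0):
    """One-sided (upper-tail) normal-approximation test for a k = l = 1 U-statistic.

    h_matrix[i, j] = h(X_i, Y_j). Returns (U, z, p-value)."""
    h = np.asarray(h_matrix, dtype=float)
    m, n = h.shape
    if m < 2 or n < 2:
        raise ValueError("need at least two observations per sample")
    N = m + n
    rho = m / N
    u = float(h.mean())
    # Row means estimate E[h(X_i, Y) | X_i]; column means estimate E[h(X, Y_j) | Y_j]
    s10 = float(np.var(h.mean(axis=1), ddof=1))
    s01 = float(np.var(h.mean(axis=0), ddof=1))
    sigma2 = s10 / rho + s01 / (1.0 - rho)  # Theorem 1 with k = l = 1
    if sigma2 <= 0:
        raise ValueError("estimated asymptotic variance is zero")
    z = sqrt(N) * (u - theta0) / sqrt(sigma2)
    p_value = 0.5 * erfc(z / sqrt(2.0))  # P(Z >= z) for standard normal Z
    return u, z, p_value

print(u_asymptotic_test([[1, 0], [1, 1], [0, 1]]))
```

Because $\sigma^2/N = \sigma_{10}^2/m + \sigma_{01}^2/n$, this is equivalent to standardizing $U$ by an estimate of its finite-sample variance to first order.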
4 Numerical study
In this section, we compare the power of our proposed test (denoted "U-test") with that of the one-sided bivariate KS test (denoted "KS test"). The U-test has two versions, depending on how its null distribution is approximated: "U-perm" approximates the null distribution by a permutation method, while "U-asym" uses the asymptotic approximation given in Theorem 2. The KS test for the alternative hypothesis $H_1$ is given by (Feller, 1948)
$$D^{+} = \sup_{a \le b} \big\{ \hat{S}_X(a, b) - \hat{S}_Y(a, b) \big\},$$
where $\hat{S}_X$ and $\hat{S}_Y$ are the empirical survival functions of the two samples. The null distribution of $D^{+}$ is approximated using a permutation method (Gail and Green, 1976).
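The sketch below assumes that the one-sided statistic has the form $D^{+} = \sup_{a \le b} \{\hat S_X(a, b) - \hat S_Y(a, b)\}$ with empirical survival functions $\hat S(a, b)$, the share of intervals with lower bound $\ge a$ and upper bound $\ge b$. Because both empirical survival functions change only at observed bounds, the supremum can be searched over the pooled observed lower bounds $a$ and upper bounds $b$. The helper `permutation_pvalue` (a hypothetical name) is a generic Monte Carlo permutation test that reshuffles the group labels of the pooled sample. It does not use the critical values of Gail and Green (1976). Passing an interval U-statistic instead of `ks_plus` gives a U-perm-type test.

```python
import numpy as np

def ks_plus(x, y):
    """One-sided sup_{a <= b} [S_X(a, b) - S_Y(a, b)] over pooled observed bounds."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.vstack([x, y])
    best = -np.inf
    for a in np.unique(pooled[:, 0]):          # candidate lower-bound thresholds
        for b in np.unique(pooled[:, 1]):      # candidate upper-bound thresholds
            if a > b:
                continue
            s_x = np.mean((x[:, 0] >= a) & (x[:, 1] >= b))
            s_y = np.mean((y[:, 0] >= a) & (y[:, 1] >= b))
            best = max(best, s_x - s_y)
    return float(best)

def permutation_pvalue(stat, x, y, n_perm=999, seed=0):
    """Upper-tail permutation p-value of stat(x, y) under random relabelling."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.vstack([x, y])
    m = len(x)
    observed = stat(x, y)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        # Small tolerance so that exact ties with the observed value count as "as extreme"
        if stat(pooled[idx[:m]], pooled[idx[m:]]) >= observed - 1e-12:
            count += 1
    return (count + 1) / (n_perm + 1)

x = [[10 + i, 20 + i] for i in range(10)]
y = [[i, 5 + i] for i in range(10)]
print(ks_plus(x, y), permutation_pvalue(ks_plus, x, y, n_perm=199, seed=1))
```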
In the study, to generate interval-valued data $[L, U]$, we use the transformation to the center $C = (L + U)/2$ and the half-range $R = (U - L)/2$. We consider two underlying distributions: a bivariate normal distribution and a bivariate $t$ distribution. For the two populations, we consider distributions parameterized as follows:
For , the following four values are used: , where indicates the alternative hypothesis. Figure 2 illustrates the simulation setting. To examine the effect of the correlation between the center and the range, we use three values for . The significance level is set as . The size and power are evaluated as the rejection rate among replicates. The number of permutations used to generate a null distribution is set as . For the sample size $(m, n)$, we consider the following four cases: (30, 30), (30, 120), (50, 50), and (50, 200).
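The data-generating step can be sketched as follows. Intervals are built as $[C - R, C + R]$ from a center $C$ and a half-range $R$. Everything else here is an assumption for illustration: $(C, \log R)$ is drawn from a bivariate normal or bivariate $t$ distribution (the log link keeps $R$ positive), and the default of 5 degrees of freedom, the function name, and all parameter values are hypothetical rather than the settings used in the study.

```python
import numpy as np

def simulate_intervals(n, mean, sd, corr, dist="normal", df=5, seed=None):
    """Draw n intervals [C - R, C + R], where (C, log R) has location `mean`,
    scales `sd`, and correlation `corr` (bivariate normal or bivariate t)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[sd[0] ** 2, corr * sd[0] * sd[1]],
                    [corr * sd[0] * sd[1], sd[1] ** 2]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    if dist == "t":
        # Multivariate t: divide each normal draw by sqrt(chi2_df / df)
        w = rng.chisquare(df, size=n)
        z = z / np.sqrt(w / df)[:, None]
    center = mean[0] + z[:, 0]
    half_range = np.exp(mean[1] + z[:, 1])
    return np.column_stack([center - half_range, center + half_range])

print(simulate_intervals(3, (0.0, 0.0), (1.0, 0.5), 0.5, seed=1))
```

Changing `corr` alters the dependence between center and half-range while leaving the marginal location parameters fixed. This is the kind of variation the study uses to examine the effect of correlation.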
Table 1 shows several notable findings regarding the proposed U-test. First, the power of the U-test is higher than that of the one-sided KS test in all cases under consideration, regardless of the magnitude of . Second, the powers of the U-test based on the permutation method and on the asymptotic result are almost the same in all cases, which supports the accuracy of the asymptotic approximation. Third, the greater the correlation between the center and the range, the higher the power of each test. This phenomenon can be explained by the Mahalanobis distance between the two mean vectors under the null and the alternative. The distance is
, which is increasing in . Specifically, when is , , and , the corresponding distance is , , and , respectively.
5 Data example
In this section, we apply the stochastic order tests to a real dataset. The data are obtained from the National Heart, Lung, and Blood Institute Growth and Health Study (NGHS), a longitudinal cohort study designed to evaluate temporal trends in cardiovascular risk factors, such as systolic and diastolic blood pressure (SBP, DBP), based on annual visits of 2,379 African-American and Caucasian girls. Blood pressure (BP), which is measured at two levels, is an example of MM-type interval-valued data. In this analysis, we use only the BP measurements at the first visit and remove subjects with missing values. After this exclusion, the total number of subjects is , where the numbers of Caucasian and African-American girls are and , respectively. The goal of this application is to test the hypothesis "BP of African-American girls is stochastically greater than that of Caucasian girls".
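Table 2: Mean (standard deviation) of blood pressure measurements (mmHg) by race
Variable | Caucasian | African-American | p-value
midBP | 78.67 (9.09) | 80.13 (8.03) |
DBP | 56.72 (12.19) | 58.03 (11.72) |
SBP | 100.62 (9.28) | 102.23 (8.65) |
half-range | 21.95 (5.89) | 22.10 (6.44) |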
Table 2 shows that SBP, DBP, and their center are significantly higher in African-American girls than in Caucasian girls. Meanwhile, there is no significant difference in the half-range between the two groups. This pattern resembles the setting of the numerical study, where the centers of the two groups differ but the ranges are similar.
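As a quick consistency check of Table 2, the midBP and half-range means follow from the DBP and SBP means through the center $(\mathrm{DBP} + \mathrm{SBP})/2$ and the half-range $(\mathrm{SBP} - \mathrm{DBP})/2$, because means are linear. Standard deviations do not combine this way. The helper name below is hypothetical.

```python
def center_and_half_range(lower, upper):
    """Center (midpoint) and half-range of the interval [lower, upper]."""
    return (lower + upper) / 2, (upper - lower) / 2

# Group means of (DBP, SBP) taken from Table 2, in mmHg
table2_means = {"Caucasian": (56.72, 100.62), "African-American": (58.03, 102.23)}

for group, (dbp, sbp) in table2_means.items():
    mid, half = center_and_half_range(dbp, sbp)
    print(f"{group}: midBP {mid:.2f}, half-range {half:.2f}")
# Caucasian: midBP 78.67, half-range 21.95
# African-American: midBP 80.13, half-range 22.10
```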
Next, we test whether the BP of African-American girls is stochastically greater than that of Caucasian girls based on the interval-valued data rather than on the marginal distributions. Table 3 presents the results of the methods compared above. For all tests, the p-values are smaller than 0.001, which provides strong evidence that the BP of African-American girls is stochastically greater than that of Caucasian girls.
Table 3: p-values of the stochastic order tests
Method | U-perm | U-asym | BKS
p-value | < 0.001 | < 0.001 | < 0.001
6 Conclusion
In this paper, we introduce the notion of stochastic order between two samples of interval-valued data and propose a test statistic based on a U-statistic. We derive the asymptotic null distribution of the proposed statistic. The numerical study shows that the asymptotic distribution approximates the null distribution accurately, even for small samples. In addition, the proposed test has higher power than the one-sided bivariate KS test in all cases we consider. These results suggest that the proposed procedure is useful for testing the order of interval-valued data.
Notes
This manuscript is an English version of an article written in Korean and accepted at The Korean Journal of Applied Statistics.
Acknowledgements
We thank the two anonymous reviewers and the editor of The Korean Journal of Applied Statistics for their detailed and instructive comments.
References
Blanco-Fernández, A. and Winker, P. (2016). Data generation processes and statistical management of interval data. AStA Advances in Statistical Analysis, 100(4), 475-494.
Feller, W. (1948). On the Kolmogorov-Smirnov limit theorems for empirical distributions. The Annals of Mathematical Statistics, 19(2), 177-189.
Gail, M. and Green, S. (1976). Critical values for the one-sided two-sample Kolmogorov-Smirnov statistic. Journal of the American Statistical Association, 71(355), 757-760.
Lehmann, E.L. (1999). Elements of Large-Sample Theory. Springer.
Shaked, M. and Shanthikumar, J.G. (2006). Stochastic Orders. Springer.