1 Introduction
The Behrens–Fisher problem is one of the best-known hypothesis testing problems, and it has been studied extensively, partly because of its simple form and its many real applications. The univariate Behrens–Fisher problem can be phrased as follows. Let $X_1,\ldots,X_{n_1}$ and $Y_1,\ldots,Y_{n_2}$ be two independent random samples with $X_i\sim N(\mu_1,\sigma_1^2)$ and $Y_j\sim N(\mu_2,\sigma_2^2)$, where all four parameters are unknown, and the target is to test $H_0:\mu_1=\mu_2$ versus $H_a:\mu_1\neq\mu_2$.
A large body of research has been devoted to designing procedures for testing this hypothesis, for example Fisher’s fiducial inference (Fisher, 1935), Scheffé’s $t$-distribution method (Scheffé, 1943), the generalized $p$-value method (Tsui & Weerahandi, 1989), the marginal inferential models (Martin & Liu, 2015), and many others summarized in review articles such as Scheffé (1970) and Kim & Cohen (1998). Among all these approaches, the most broadly adopted test statistic is the Behrens–Fisher statistic. Using conventional notation, $\bar{X}=n_1^{-1}\sum_{i=1}^{n_1}X_i$ and $\bar{Y}=n_2^{-1}\sum_{j=1}^{n_2}Y_j$ are the two sample means, $S_1^2=(n_1-1)^{-1}\sum_{i=1}^{n_1}(X_i-\bar{X})^2$ and $S_2^2=(n_2-1)^{-1}\sum_{j=1}^{n_2}(Y_j-\bar{Y})^2$ are the two unbiased sample variances, and the Behrens–Fisher statistic is defined by
$$T=\frac{\bar{X}-\bar{Y}}{\sqrt{S_1^2/n_1+S_2^2/n_2}}.$$
It is well known that the sampling distribution of $T$ under $H_0$ depends on the unknown variance ratio $\sigma_1^2/\sigma_2^2$, and various methods have been proposed to approximate this null distribution, for example the most widely used Welch–Satterthwaite approximate degrees of freedom
$$\hat{\nu}=\frac{\left(S_1^2/n_1+S_2^2/n_2\right)^2}{\left(S_1^2/n_1\right)^2/(n_1-1)+\left(S_2^2/n_2\right)^2/(n_2-1)}$$
(Satterthwaite, 1946; Welch, 1947).
Despite their extreme popularity in applications, one critical issue with the approximation methods is that they do not guarantee control of the Type I error. Therefore, conservative test procedures that strongly control the Type I error are also of interest. The remarkable works of Hsu (1938) and Mickey & Brown (1966) showed that the distribution function of $T$ at any $t\ge 0$ is bounded below by that of $t_{\min(n_1,n_2)-1}$, the $t$ distribution with $\min(n_1,n_2)-1$ degrees of freedom, and bounded above by that of $t_{n_1+n_2-2}$. With this result, one can use critical values or $p$-values based on $t_{\min(n_1,n_2)-1}$ to test the hypothesis, which keeps the Type I error at or below the nominal level. This approach also motivated works such as Hayter (2013) and Martin & Liu (2015).
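The quantities discussed above can be illustrated with a minimal Python sketch. It computes the Behrens–Fisher statistic, the Welch–Satterthwaite approximate degrees of freedom, and the conservative degrees of freedom $\min(n_1,n_2)-1$ associated with the Hsu bound. The function name `behrens_fisher` and the sample data are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import mean, variance


def behrens_fisher(x, y):
    """Return (T, Welch-Satterthwaite df, conservative df) for two samples."""
    n1, n2 = len(x), len(y)
    # statistics.variance is the unbiased sample variance (divisor n - 1).
    v1 = variance(x) / n1  # S_1^2 / n_1
    v2 = variance(y) / n2  # S_2^2 / n_2
    t_stat = (mean(x) - mean(y)) / sqrt(v1 + v2)
    welch_df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    conservative_df = min(n1, n2) - 1
    return t_stat, welch_df, conservative_df


# Hypothetical data, for illustration only.
x = [4.1, 5.3, 3.8, 4.9, 5.0]
y = [6.2, 5.9, 7.4, 6.8, 5.5, 6.1, 7.0, 6.6]
t_stat, welch_df, conservative_df = behrens_fisher(x, y)
print(t_stat, welch_df, conservative_df)
```

The Welch–Satterthwaite value always lies between $\min(n_1,n_2)-1$ and $n_1+n_2-2$, so referring $T$ to the $t$ distribution with the conservative degrees of freedom gives critical values that are never smaller than the approximate ones.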
The Behrens–Fisher problem has also been generalized to the multivariate case in various research articles. In this setting, each observation follows a multivariate normal distribution, and the target is to test the equality of the two mean vectors. Most of the multivariate approaches are based on the approximate degrees of freedom framework, for example Yao (1965), Johansen (1980), Nel & Van der Merwe (1986), and Krishnamoorthy & Yu (2004). See also Christensen & Rencher (1997) for a comparison of other solutions.
Alternatively, along the direction of Hsu (1938) and Mickey & Brown (1966), Eaton & Olshen (1972) attempted to develop stochastic bounds for the test statistic in the multivariate case, and they provided the result for the two-dimensional case under the proportional covariances assumption. However, the theorem they developed to prove the result applied only to the two-dimensional case, so they left the general finite-dimensional case as a conjecture.
In this article, we study the same problem as in Eaton & Olshen (1972) using a related but different approach, and we are able to confirm this conjecture and generalize their result to the arbitrary finite dimensional case. As a result, we provide sharp bounds for the multivariate Behrens–Fisher distribution with proportional covariances, as a direct generalization of Hsu’s result in the univariate case.
The remainder of this article is organized as follows. In Section 2 we briefly introduce the multivariate Behrens–Fisher problem and review some existing results on it. Section 3 is the main part of this article, where two major theorems describing the stochastic bounds for the test statistic are provided. In Section 4 we use numerical simulations to compare the performance of the proposed test with that of other approximation methods. Finally, Section 5 provides some discussion and concludes the article. The proofs of two important lemmas are given in the appendix.
2 Multivariate Behrens–Fisher Problem
In this section we briefly describe the multivariate Behrens–Fisher problem and review some relevant results on it. Similar to the univariate case, let $X_1,\ldots,X_{n_1}$ and $Y_1,\ldots,Y_{n_2}$ be two independent random samples, with each observation following a $p$-dimensional multivariate normal distribution: $X_i\sim N_p(\mu_1,\Sigma_1)$, $i=1,\ldots,n_1$, and $Y_j\sim N_p(\mu_2,\Sigma_2)$, $j=1,\ldots,n_2$. The problem of interest is to test $H_0:\mu_1=\mu_2$ versus $H_a:\mu_1\neq\mu_2$, with all the distributional parameters unknown. Following the same assumption as in Eaton & Olshen (1972), we assume that $X_i$ and $Y_j$ have proportional covariances, i.e.,
$$\Sigma_1=\Sigma,\qquad \Sigma_2=k\Sigma \tag{1}$$
for some unknown positive definite matrix $\Sigma$ and an unknown constant $k$ ($k>0$). In the remaining part of this article we assume that .
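Let $\bar{X}$ and $\bar{Y}$ be the sample means, and $S_1$ and $S_2$ be the sample covariance matrices. It is well known that $\bar{X}\sim N_p(\mu_1,\Sigma_1/n_1)$, $\bar{Y}\sim N_p(\mu_2,\Sigma_2/n_2)$, $(n_1-1)S_1\sim W_p(\Sigma_1,n_1-1)$, and $(n_2-1)S_2\sim W_p(\Sigma_2,n_2-1)$, where $W_p(\Sigma,n)$ stands for a Wishart distribution with parameter $\Sigma$ and $n$ degrees of freedom. These four random vectors and matrices are mutually independent. Furthermore, the multivariate Behrens–Fisher test statistic is defined as
$$T^2=(\bar{X}-\bar{Y})^{\top}\left(\frac{S_1}{n_1}+\frac{S_2}{n_2}\right)^{-1}(\bar{X}-\bar{Y}), \tag{2}$$
and the sampling distribution of $T^2$ under $H_0$ is typically called the multivariate Behrens–Fisher distribution. In this article, our primary goal is to derive stochastic bounds for $T^2$ that are free of the unknown parameters.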
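As a concrete illustration of the test statistic, the following Python sketch computes the quadratic form $d^{\top}(S_1/n_1+S_2/n_2)^{-1}d$, where $d$ is the difference of the two sample mean vectors, using only the standard library. All function names and the example data are hypothetical, and the small Gaussian-elimination solver is a stand-in for whatever linear algebra routine one would normally use.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sample_covariance(rows):
    """Unbiased sample covariance matrix (divisor n - 1)."""
    n, m = len(rows), mean_vector(rows)
    p = len(m)
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, p):
            f = aug[r][c] / aug[c][c]
            for k in range(c, p + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(aug[i][k] * z[k] for k in range(i + 1, p))
        z[i] = (aug[i][p] - tail) / aug[i][i]
    return z


def behrens_fisher_t2(x, y):
    """Multivariate Behrens-Fisher statistic for two lists of observations."""
    n1, n2 = len(x), len(y)
    d = [a - b for a, b in zip(mean_vector(x), mean_vector(y))]
    s1, s2 = sample_covariance(x), sample_covariance(y)
    p = len(d)
    v = [[s1[a][b] / n1 + s2[a][b] / n2 for b in range(p)] for a in range(p)]
    z = solve(v, d)  # z = V^{-1} d, so the statistic is d' z
    return sum(di * zi for di, zi in zip(d, z))


# Hypothetical two-dimensional data, for illustration only.
x = [[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [2.5, 2.0]]
y = [[4.0, 5.0], [5.5, 4.0], [4.5, 6.0], [6.0, 5.5], [5.0, 4.5]]
print(behrens_fisher_t2(x, y))
```

When each observation has a single coordinate, the function returns the square of the univariate Behrens–Fisher statistic, which is a convenient sanity check.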
Major progress in this direction was made by Eaton & Olshen (1972). They first showed that under $H_0$,
(3) 
where means and have the same distribution, , is the identity matrix, and , and are independent. Then they proved that for ,
(4) 
where stands for the stochastic ordering, is any integer satisfying , and stands for a random matrix that is independent of .
However, in Eaton & Olshen (1972), (4) was proved only for the case $p=2$, since the underlying theory did not generalize to higher dimensions. To overcome this difficulty, in this article we use a different set of techniques to prove that (4) also holds for $p>2$. The main results are presented in Section 3.
3 Main Results
We first present two lemmas that are the keys to our main theorems, whose proofs are given in the appendix. Lemma 1 studies the property of a linear combination of random variables, where .
Lemma 1.
Assume that are independent random variables. Let denote the distribution function of the random variable , and define its partial derivatives as and . Then for , we have

,

, and

If then .
Lemma 1 itself gives some general properties of the distribution family represented by , and in this article the lemma is mainly used to show the conclusion below, which is the central technical tool to prove our main theorems.
Lemma 2.
Assume that is a random vector. Fix and let and be two positive definite matrices. Define , and then .
To present the main theorems of this article, we first introduce two useful concepts: the majorization of vectors (Olkin & Marshall, 2016), and the exchangeability of a sequence of random vectors.
Definition 1.
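Let $x=(x_1,\ldots,x_n)$ and $y=(y_1,\ldots,y_n)$ be two vectors in $\mathbb{R}^n$, and let $x_{[1]}\ge\cdots\ge x_{[n]}$ and $y_{[1]}\ge\cdots\ge y_{[n]}$ be the decreasing rearrangements of $x$ and $y$, respectively. $x$ is said to be majorized by $y$, denoted by $x\prec y$, if
$$\sum_{i=1}^{k}x_{[i]}\le\sum_{i=1}^{k}y_{[i]},\quad k=1,\ldots,n-1,\qquad\text{and}\qquad\sum_{i=1}^{n}x_{[i]}=\sum_{i=1}^{n}y_{[i]}.$$
Intuitively, $x\prec y$ indicates that $x$ and $y$ have the same total quantity, but $y$ is more “spread out”, or less “equally allocated”, than $x$.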
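The majorization relation is easy to check numerically. The sketch below uses the standard definition (equal totals, and every partial sum of the decreasing rearrangement of the first vector bounded by that of the second). The function name `is_majorized` and the tolerance argument are illustrative choices.

```python
def is_majorized(x, y, tol=1e-12):
    """Return True if x is majorized by y, i.e. x ≺ y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs = sorted(x, reverse=True)  # decreasing rearrangement of x
    ys = sorted(y, reverse=True)  # decreasing rearrangement of y
    # The totals must agree.
    if abs(sum(xs) - sum(ys)) > tol:
        return False
    # Every partial sum of x must not exceed the matching partial sum of y.
    partial_x = partial_y = 0.0
    for a, b in zip(xs[:-1], ys[:-1]):
        partial_x += a
        partial_y += b
        if partial_x > partial_y + tol:
            return False
    return True


print(is_majorized([1 / 3, 1 / 3, 1 / 3], [0.5, 0.5, 0.0]))  # True
print(is_majorized([0.5, 0.5, 0.0], [1.0, 0.0, 0.0]))        # True
print(is_majorized([1.0, 0.0, 0.0], [0.5, 0.5, 0.0]))        # False
```

The equally weighted vector is majorized by every nonnegative vector with the same total, which matches the intuition that it is the least “spread out” allocation.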
Definition 2.
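A sequence of random vectors $X_1,\ldots,X_n$ is said to be exchangeable if, for any permutation $\pi$ of $\{1,\ldots,n\}$, $(X_{\pi(1)},\ldots,X_{\pi(n)})\overset{d}{=}(X_1,\ldots,X_n)$.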
With these notations, the first main result of this article is summarized in Theorem 1.
Theorem 1.
Let be an exchangeable sequence of positive definite random matrices of size , and let be a random vector that is independent of . If and are two sequences of nonnegative constants such that , then
Proof.
Let denote the space of all positive definite matrices. Fix , and define the function , with each . We are going to show that is convex, i.e., given , , and any constant , satisfies
(5) 
To verify this, let and , so , where is defined in Lemma 2. It follows from Lemma 2 that is concave, implying . Since and , (5) holds immediately.
Moreover, is continuous and exchangeable on its arguments, so it satisfies the condition of Theorem 2.4 of Eaton & Olshen (1972). As a consequence of the theorem, it follows that
where the expectation is taken on the joint distribution of
. It is easy to see that , so we obtain for any , which concludes the proof. ∎
Theorem 1 does not put any specific distributional assumptions on , so it is more general than the Behrens–Fisher problem setting where ’s follow Wishart distributions. Applying Theorem 1 to the multivariate Behrens–Fisher test statistic in (2), we obtain the following result:
Theorem 2.
Proof.
We first show that equation (3) holds. Under , . Let , and is the symmetric square root of , then . It follows that , , and . If we let , then (3) can be obtained immediately.
Now let , , be a vector that contains elements of and elements of , and be a vector that contains elements of and other elements equal to zero, then it is easy to verify that . According to Theorem 1, we have
It is obvious that , , and . Combining with the fact that (Rao, 1973), where and , (6) is confirmed. ∎
Using the inequality in Theorem 2, a $p$-value of the test can be computed as
(7) 
and it is guaranteed that under , .
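A conservative $p$-value of this kind can be sketched in Python as follows. The sketch rests on an assumption that should be checked against Theorem 2: that the upper stochastic bound is Hotelling’s $T^2$ distribution with $m=\min(n_1,n_2)-1$ degrees of freedom, so that $\frac{m-p+1}{mp}T^2$ is referred to an $F$ distribution with $p$ and $m-p+1$ degrees of freedom. To stay within the standard library, the $F$ tail probability is estimated by Monte Carlo rather than evaluated exactly. The function names are hypothetical.

```python
import random


def f_upper_tail_mc(x, d1, d2, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(F > x) for an F(d1, d2) variable."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # A chi-square(d) variable is Gamma(shape=d/2, scale=2).
        num = rng.gammavariate(d1 / 2, 2.0) / d1
        den = rng.gammavariate(d2 / 2, 2.0) / d2
        hits += num / den > x
    return hits / n_draws


def conservative_p_value(t2, p, n1, n2, **kwargs):
    """Assumed F-bound p-value: uses m = min(n1, n2) - 1 degrees of freedom."""
    m = min(n1, n2) - 1
    if m < p:
        raise ValueError("need min(n1, n2) - 1 >= p")
    d2 = m - p + 1
    # Hotelling's relation: T^2_m = m p / (m - p + 1) * F(p, m - p + 1).
    return f_upper_tail_mc(d2 / (m * p) * t2, p, d2, **kwargs)


# Hypothetical inputs: T^2 = 10 with p = 2, n1 = 5, n2 = 10.
print(conservative_p_value(10.0, p=2, n1=5, n2=10))
```

In practice one would replace the Monte Carlo step with an exact $F$ survival function from a numerical library; the fixed seed here only makes the sketch reproducible.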
4 Simulation Study
In this section we conduct simulation experiments to compare the testing procedure using (7) with other existing methods, including Yao (1965), Johansen (1980), Nel & Van der Merwe (1986), and Krishnamoorthy & Yu (2004), in terms of their Type I errors. The experiment setting is as follows. We fix the number of variables , and assume that without loss of generality. Two groups of sample sizes are considered: the “small sample” group, with and ; and the “large sample” group, with and . The true is a realization of the distribution, and its value is fixed during the experiment. Five different values of are considered, , for each combination of and . Then for each parameter setting of , the data are randomly sampled 100,000 times to compute the empirical Type I error for each method.
Figure 1 illustrates the results for significance level . The first four methods correspond to the existing solutions, and the “FBound” method is the one based on (7). As can be seen from the last three columns of the plot matrix, which correspond to the “large sample” case, all five solutions perform reasonably well. However, when the sample sizes are small, as in the first three columns of the plot matrix, the existing methods tend to inflate the Type I error substantially, and in some situations even double the prespecified significance level. In contrast, although the FBound method can be conservative in the worst cases, it always guarantees control of the Type I error.
This phenomenon is even clearer under the situation, as shown in Figure 2. Under some circumstances the existing methods inflate the Type I error to more than four times the nominal level, which may lead to unreliable conclusions in real applications. As in the previous case, the FBound method is always valid despite its conservativeness.
To summarize, the simulation study indicates that the theoretical result obtained in this article is useful to derive a testing procedure for the multivariate Behrens–Fisher problem that guarantees the Type I error control, which is crucial for many scientific studies.
5 Discussion and Conclusion
In this article we have revisited the multivariate Behrens–Fisher problem with the proportional covariances assumption, and have derived finite-sample lower and upper bounds for the null distribution of the test statistic. This result extends the previous work by Hsu (1938) for the univariate case and Eaton & Olshen (1972) for the two-dimensional case, and can be used to create a testing procedure that strongly controls the Type I error for the multivariate Behrens–Fisher problem.
It is true that the proportional covariances assumption (1) is a moderately strong restriction, and one may hope to verify the result for the most general forms of $\Sigma_1$ and $\Sigma_2$. In this article, this assumption is made based on the following two considerations. First, the original motivation of this article was to generalize Theorem 3.1 of Eaton & Olshen (1972), about the stochastic ordering of a series of random quotients, from two dimensions to any finite dimension. However, the test statistic for the most general Behrens–Fisher problem does not belong to this type of random quotient. Second, the technical difficulty of the general case is expected to be formidable. As can be seen from Lemma 2, a concavity property holds in the proportional covariances case, which greatly helps in proving the bounds. However, many examples can be given to show that such properties fail in the general case, so more advanced techniques need to be developed in order to fully solve the general situation. We leave this possibility for future research.
Appendix A
A.1 Proof of Lemma 1
Proof.
Since are exchangeable in , we will prove the case for without loss of generality. Define the random variable with the distribution function , and let and denote the density function and distribution function of , respectively, then
(8) 
Moreover, using the fact that , we have
Let be the density function of , and then has the distribution function . Taking the partial derivatives with respect to on both sides, we have and , which prove the statements a) and b).
Now let as in (8), and fix with . With change of variables followed by , we obtain
Similarly, by switching the order of and and with another change of variable , it follows that
Therefore,
(9) 
Now for , let and , and then by symmetry we have . Hence as a consequence of (9), we finally get
whenever , which concludes the proof of c). ∎
A.2 Proof of Lemma 2
Proof.
For simplicity we omit the parameters and in when no confusion is caused. Let be a matrix-valued function dependent on , and assume its eigen decomposition is , where contains the sorted eigenvalues , and are the associated eigenvectors. Again we will omit the arguments in the relevant quantities above whenever appropriate.
Since , we have . The second identity holds since and thus . Therefore, using the notations in Lemma 1, we have where . As a result,
(10) 
where and are also defined in Lemma 1.
Theorem 9 and Theorem 10 of Lancaster (1964) provide explicit expressions for , where the former assumes ’s are distinct while the latter considers multiplicity of eigenvalues. For now we shall assume that ’s are all distinct for brevity of the proof. The same technique applies to the more general case.
Let be the th derivative of with respect to , then clearly and where is the zero matrix. Also define , then according to Theorem 9 of Lancaster (1964),
Now consider the cumulative sum of eigenvalues from the bottom, defined as , whose second derivative is given by
(11) 
For , , so the second term in (11) is zero. For the first term, since and hence , we conclude that .
References
Christensen & Rencher (1997) Christensen, W. F. & Rencher, A. C. (1997). A comparison of Type I error rates and power levels for seven solutions to the multivariate Behrens–Fisher problem. Communications in Statistics - Simulation and Computation 26, 1251–1273.
Eaton & Olshen (1972) Eaton, M. L. & Olshen, R. A. (1972). Random quotients and the Behrens–Fisher problem. The Annals of Mathematical Statistics 43, 1852–1860.
Fisher (1935) Fisher, R. A. (1935). The fiducial argument in statistical inference. Annals of Human Genetics 6, 391–398.
Hayter (2013) Hayter, A. (2013). A new procedure for the Behrens–Fisher problem that guarantees confidence levels. Journal of Statistical Theory and Practice 7, 515–536.
Hsu (1938) Hsu, P. (1938). Contribution to the theory of “Student’s” t-test as applied to the problem of two samples. Statistical Research Memoirs 2, 1–24.
Johansen (1980) Johansen, S. (1980). The Welch–James approximation to the distribution of the residual sum of squares in a weighted linear regression. Biometrika 67, 85–92.
Kim & Cohen (1998) Kim, S.-H. & Cohen, A. S. (1998). On the Behrens–Fisher problem: a review. Journal of Educational and Behavioral Statistics 23, 356–377.
Krishnamoorthy & Yu (2004) Krishnamoorthy, K. & Yu, J. (2004). Modified Nel and Van der Merwe test for the multivariate Behrens–Fisher problem. Statistics & Probability Letters 66, 161–169.
Lancaster (1964) Lancaster, P. (1964). On eigenvalues of matrices dependent on a parameter. Numerische Mathematik 6, 377–387.
Magnus (1985) Magnus, J. R. (1985). On differentiating eigenvalues and eigenvectors. Econometric Theory 1, 179–191.
Martin & Liu (2015) Martin, R. & Liu, C. (2015). Marginal inferential models: Prior-free probabilistic inference on interest parameters. Journal of the American Statistical Association 110, 1621–1631.
Mickey & Brown (1966) Mickey, M. R. & Brown, M. B. (1966). Bounds on the distribution functions of the Behrens–Fisher statistic. The Annals of Mathematical Statistics 37, 639–642.
Nel & Van der Merwe (1986) Nel, D. & Van der Merwe, C. (1986). A solution to the multivariate Behrens–Fisher problem. Communications in Statistics - Theory and Methods 15, 3719–3735.
Nel et al. (1990) Nel, D. d., van der Merwe, C. A. & Moser, B. (1990). The exact distributions of the univariate and multivariate Behrens–Fisher statistics with a comparison of several solutions in the univariate case. Communications in Statistics - Theory and Methods 19, 279–298.
Olkin & Marshall (2016) Olkin, I. & Marshall, A. W. (2016). Inequalities: Theory of majorization and its applications, vol. 143. Academic Press.
Pan et al. (2013) Pan, X., Xu, M., Hu, T. et al. (2013). Some inequalities of linear combinations of independent random variables: II. Bernoulli 19, 1776–1789.
Rao (1973) Rao, C. R. (1973). Linear statistical inference and its applications, vol. 2. Wiley New York.
Ruben (2002) Ruben, H. (2002). A simple conservative and robust solution of the Behrens–Fisher problem. Sankhyā: The Indian Journal of Statistics, Series A 64, 139–155.
Satterthwaite (1946) Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin 2, 110–114.
Scheffé (1943) Scheffé, H. (1943). On solutions of the Behrens–Fisher problem, based on the t-distribution. The Annals of Mathematical Statistics 14, 35–44.
Scheffé (1970) Scheffé, H. (1970). Practical solutions of the Behrens–Fisher problem. Journal of the American Statistical Association 65, 1501–1508.
Tsui & Weerahandi (1989) Tsui, K.-W. & Weerahandi, S. (1989). Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters. Journal of the American Statistical Association 84, 602–607.
Welch (1947) Welch, B. L. (1947). The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika 34, 28–35.
Yao (1965) Yao, Y. (1965). An approximate degrees of freedom solution to the multivariate Behrens–Fisher problem. Biometrika 52, 139–147.