1 Introduction
There has been a wide interest to extend univariate and multivariate nonparametric procedures to clustered and hierarchical data, which frequently arise in longitudinal studies for example. It is well known that unless the clustered structure is taken into account during the analysis, the tests and confidence intervals will not maintain their prescribed levels, leading to invalid inference. Traditionally, parametric mixed models have been used to account for the correlation structures among dependent observational units. The extensions of nonparametric methods to clustered data can roughly be divided into univariate
(Rosner and Grove 1999; Rosner et al. 2003, 2006; Larocque et al. 2010; Williamson et al. 2003; Larocque 2005; Datta and Satten 2005; Werner and Brunner 2007; Datta and Satten 2008; Larocque et al. 2008; Kloke et al. 2009; Konietschke and Brunner 2009) and multivariate approaches (Larocque 2003; Larocque et al. 2007; Nevalainen et al. 2007b, a; Haataja et al. 2009). In this paper we demonstrate that the nonparametric procedures, which may first seem a sparse collection of tests and estimates, can in fact be regarded as a class of scorebased methods for clustered data problems. Strict assumptions on the distribution of the random effects or the random errors are unnecessary in this class. Our notation coincides with the one used in the mixed models framework.Let be a sample of variate (
) random vectors with sample size
. The data are assumed to be clustered throughout the paper. The cluster memberships are given by the matrix :It is useful to note that
and that is a diagonal matrix with the cluster sizes on the diagonal, say, . We also write for a column vector of ones, for the vector obtained by stacking the columns of , and for the Kronecker product.
A parametric linear mixed effects model for multivariate clustered data can be written as
(1) 
where and are design matrices corresponding to the fixed effects and cluster memberships, respectively, is a random matrix of regression coefficients (random effects), is a matrix of regression coefficients (fixed effects), and is an matrix of random errors. In a normality based model it is assumed that the rows of are i.i.d. from and that the rows of are i.i.d. from . To better illustrate the dependency structure, rewrite the model as
If this means that

for all .

If then

If then and are independent.
The problem with the parametric linear model are the strict distributional assumptions, which can be completely unrealistic. Its use outside of the assumed model can lead into inefficient or even invalid statistical inference. In section 2 of this paper we introduce an alternative semiparametric linear model to analyze clustered data under relaxed assumptions. Section 3 treats the multivariate onesample location problem, both from the point of view of testing and estimation. The section has some review character in it but we now present the results for a general score function rather than focusing on a specific score. In section 4 the treatment is expanded to the multivariate several samples location problem, which has not appeared previously. Efficiency studies demonstrate the practical advantages and disadvantages of the proposed twosample weighted nonparametric tests compared to the classical methods based on the sample mean and the sample covariance matrix in section 5. Finally, two data sets are analyzed with spatial sign and rank methods (provided as a supplemental file).
2 A Semiparametric Linear Model
Suppose that the data is clustered and we wish to analyze it by the linear model
However, we wish to avoid assumptions of normality, or any other parametric distribution, on the random errors. Instead we assume the following.
Assumption 2.1
Distributional assumptions.

The marginal distributions are identical: for all .

The clusters are independent: if then and are independent.
Assumptions (D1) and (D2) fix the location, variance and covariance structure of the model. Assumption (D3) is a natural and standard presumption, and also needed for finding the limiting distribution. Compared to the assumptions of the parametric linear model (P1)(P3), these conditions are not restrictive.
To work under these assumptions it is often advantageous to use transformed observations instead of the original ones. Our approach is first to apply a vectorvalued score function to the data points (or to the centered observations as discussed later). To prove the asymptotic results, it is sufficient that the score function has the following properties.
Assumption 2.2
Sufficient conditions on the score function.

The score function satisfies
for some , and and such that

for some .
In the onesample case, one additional condition is needed.
Note that if is the matrix of partial derivatives , then in (S1).
Tests and estimates can then be constructed on the transformed data. By choosing the score function well, tests and estimates can achieve desired properties for the problem at hand, like robustness against outliers, or improved efficiency for heavytailed distributions. Some examples of clever choices of scores are signs and ranks commonly used in nonparametric statistics, optimal scores from maximum likelihood theory, or Huber’s score often applied in robust statistics. Thus, the proposed approach works for a general score function, and suggests how the tests and estimates could be constructed. Some authors have expressed interest towards this type of approach in univariate testing
(Jin and Robinson 2003; Huang et al. 2009), but these tests are currently designed for independent observations only. The main motivation for the present paper is, however, multivariate sign and rank methods resulting from taking either the spatial signthe centered spatial rank
or the centered spatial signed rank
By convention, . Figure 1 illustrates how the transformations preserve the clustering structure. The methods based on spatial signs and ranks are more robust, more efficient for heavytailed distributions than normal theory based methods (Möttönen et al. 1997)
, and do not require assumptions on the existence of moments of
. Spatial sign and rank methods have been criticized for their lack of affine invariance and equivariance properties, but this problem can be overcome by a modified transformationretransformation procedure (section 6). The results of the paper have been written having these three score functions in mind but they hold more generally.Conditions (S1)–(S3) hold for the identity score if for some . For the spatial sign score , (S2) and (S3) are trivially true. To verify (S1), one can first show that
(Arcones 1998; Bai et al. 1990). (S1) then follows if has a bounded density. The conditions can be established similarly for the spatial rank score.
3 OneSample Case
Assume that
where rows of satisfy assumptions (D1)(D3). In the onesample case it is natural to transform the data set using an odd score function . We wish to test the null hypothesis without loss of generality, where the location parameter satisfies . Thus, its interpretation depends on the choice of the score. However, if the distribution is symmetric, , all tests test the same null hypothesis and the corresponding estimates estimate the same population parameter with different statistical properties.
Let be the density of and be the optimal score function, the gradient vector of with respect to at the origin. Note also that if is true. Define
and
The covariance structure of the weighted scores is then
where is a diagonal matrix with a nonnegative weight associated with the th observation as the th diagonal element. For the sampling design and the weights, it is assumed that and that there exist constants and such that
as tends to infinity.
But how should the weights be chosen? It is natural that the members of the same cluster receive the same weight. Furthermore, it can be shown that if and the covariance matrix has the structure
the optimal weights are given by . The weights in the th cluster are then proportional to (Larocque et al. 2007). These weights are optimal in the sense that they maximize the Pitman efficiency of a test based on . Here is the Lagrange multiplier chosen so that the constraint is satisfied.
3.1 Testing
The test statistic for testing versus is the weighted average of the scores
Then we have:
Result 3.1
Under the null hypothesis , as tends to infinity
where is a consistent estimate of , the asymptotic covariance matrix of .
The result follows from the generalization of the central limit theorem given as a corollary in
Serfling (1980, p. 30), by noting that the cluster sums are independent but not identically distributed random variables.
For symmetric distributions, small sample (meaning here a small ) values can be based on the sign change principle. Under the null,
where is a diagonal signchange matrix, with as diagonal entries, and changing all the signs within a cluster at the same time. The covariance matrix estimate is invariant under these sign changes, and the test statistic becomes
Estimation of over e.g. 1000 equiprobable random allocations of signs gives an estimated value for the conditionally distributionfree test.
Result 3.2
The limiting distribution under the sequence of alternatives is a noncentral chisquare
as .
3.2 Estimation
The companion estimate of location is determined by the estimating equation
where . Thus, the solution is a location estimate with the property that the weighted scores of shifted observations add up to zero. If (S1) holds and , the Bahadurtype representation
shows the relationship between the test and the estimate. The connection has been established in detail for the spatial sign score (Nevalainen et al. 2007b, a) with clustered data, and for the spatial signedrank score (Chaudhuri 1992) with independent observations. Now, the asymptotic distributions of the weighted spatial median or the weighted spatial HodgesLehmann estimate for example, are trivial:
Result 3.3
The limiting distribution of the scorebased estimate is
as .
The estimation of the covariance matrix can be based on the residuals . The matrix can be consistently estimated by . For the weighted spatial median and the weighted HodgesLehmann estimate the matrix is estimated by
and the second average is over the pairs with . Estimation of precision of the estimates in small samples could potentially be based on bootstrap procedures for clustered data (Field and Welsh 2007), but more practical experience is needed on this approach.
4 Several Samples Case
Suppose now that the data consist of , where and are matrices of response vectors and the cluster memberships, respectively, in the same way as in section 1, and that is an matrix indicating group or sample membership such that
Again,
and that is a diagonal matrix with the group sizes on the diagonal, say, , again assumed fixed by the design. Recall the cluster membership matrix and the weight matrix from the previous section. Now is a frequency table fixing the design.
Write the model as
where is the overall location center (e.g. grand mean), is a contrast matrix representing the treatment effects or the deviations from that location center, and the rows of satisfy (D1)(D3). The parameters in the model depend on the choice of the score, population, and the design. For now we are interested in the treatment effects only: the goal is to confront the hypotheses
In the test construction we need estimated (or centered) scores chosen to satisfy . Different scores require different inner centering to fulfil this property. For example, for the identity or the spatial sign score, the estimated scores and the theoretical scores are , where is estimated from the whole sample assuming absence of treatment effects. As will be seen in the next section, the test statistic can be expressed in two asymptotically equivalent forms, but only if the weighted estimate of location is based on the same score as the test. For these two scores, the estimate of should be the weighted sample mean (identity score) or the weighted spatial median (spatial sign score). Rank scores are automatically centered but the weighted ranks are not. Thus, in the case of ranks we write
where the expectation is taken over with . Note that and . Collect these and let and denote the matrices of estimated and theoretical scores, respectively. Under the null hypothesis, and the covariance structure is given by
Optimal weights are obtained for the twosample problem in a similar way as for the onesample problem: if again and the optimal weights are where with Lagrange multipliers and chosen such that and .
4.1 Testing
The test to confront the hypotheses and is based on the vector
where is a
matrix obtained from the identity matrix by dropping its
th column. Under the null hypothesis,where is a centered version of the design matrix, which has enabled us to replace the estimated scores by the theoretical scores .
Under the null hypothesis and correspondingly its nonsingular covariance matrix is
A necessary assumption is that the matrices
converge to finite matrixvalued limits, , and , say. Note the similarity of the requirement to the onesample case: the limits now need to exist groupwise. Diagonal elements of the matrix satisfy .
Result 4.1
Under the null hypothesis
as , where and the part inside the brackets is a consistent estimate of .
A conditionally distributionfree permutation test is constructed as follows. Let be an “acceptable” permutation matrix obtained by permuting the rows or columns of an identity matrix, uniform among acceptable permutations. The value of the permutation test is the estimate of
The proper way to permute clustered data in an acceptable way depends on the design. Clearly, permutations which do not change the distribution of the test statistic when the null hypothesis is true are guaranteed to provide a valid and distributionfree test. We say that two designs and are equivalent in structure if there exist permutation matrices and of dimensions and , respectively, such that
To ensure that the null distribution of the test statistic is invariant under permutations, the general condition on the permutation matrix is that and are equivalent in structure. In controlled trials the approach should also follow the randomization scheme. One can distinguish between three common designs:
Design A. A permutation fulfilling the general condition is natural for observational studies, where the researcher has no control over the group memberships.
Design B. Randomization of individuals inside the clusters. The permutation of the treatment assignments should be performed only within the clusters. More formally, so that the permutation matrices satisfy . The condition implies the general condition.
Design C. Randomization of clusters. The permutation should then maintain the members of the same cluster within the same treatment, and the permutation matrices should satisfy the general condition and This allows the exchange of treatments between clusters of the same size only, and the permutation may not be very rich.
Consider next the sequence of alternatives , where is chosen in such a way that . This fixes the location parameter for asymptotic studies. Under ,
Therefore we have the following result.
Result 4.2
Under , the limiting distribution of is a noncentral chisquare with degrees of freedom and noncentrality parameter
as .
Alternatively, the noncentrality parameter can be written as
which is of the same form as in the onesample case.
4.2 Estimation
Until now we have parametrizised the model with and , which depend not only on the underlying population but also on the design. Let us now reparametrize the model by , and obtain the model
The weighted estimates of the group centers are found via solving the estimating equations where now , and . In essence, this is simply a onesample estimation problem repeated times (section 3.2).
Estimation of group differences is a little more subtle issue. Due to clustering, the observations are correlated, and so are the estimates. Write , . With the identity, spatial sign and rank scores, the problem is reduced to computation of (i) the difference of the mean vectors, (ii) the difference of the spatial medians, or (iii) the twosample HodgesLehmann estimate (Hodges and Lehmann 1963; Möttönen and Oja 1995). Again, under sufficient conditions, the connection between the estimate and scores is
where is the group size. Standard theory yields:
Result 4.3
Under general assumptions, the limiting distribution of is a
variate normal distribution with expectation zero and covariance matrix
(2) 
as and where
This covariance breakdown shows how the intracluster dependency affects the covariance structure via members of the same cluster receiving the same and different treatments. If all members of the cluster belong to the same group, the last part of disappears, and the total variance can be seriously underestimated if the clustering is ignored. The opposite may happen when treatments are assigned within the clusters.
In practice, the limiting constants and can be replaced by their empirical counterparts. The estimation of the matrices , and is based on the residuals. For the spatial sign score, obvious estimates are
where . For spatial ranks one could use in which and observation belongs to sample and to sample . An alternative and simpler route is to estimate (2) by a similar estimate as in the onesample case:
This estimate uses only two samples in the estimation of the middle part and is more reliable when the variances are heterogeneous across samples.
5 Efficiency Studies
In this section we focus on the efficiency of twosample tests. For earlier efficiency studies of the multivariate onesample problem with spatial sign and rank scores we refer to Larocque (2003), Larocque et al. (2007) and Haataja et al. (2009).
We generated clustered multivariate data from a linear mixed model setting up a trivariate
distribution. Full details of the model, designs and cluster size configurations are given in the supplemental file.The performance of six twosample tests, Hotelling’s , spatial sign test, spatial rank test, and their weighted versions was investigated. The weights optimal for the classical Hotelling’s test were used for all the weighted tests. Practical experience has shown that the three optimal weight matrices of the tests are very similar, and the rationale for choosing these weights lies in their appealing ease of computation. The tests were studied under three frequently encountered designs (section 4.1) for different values of the intracluster correlation .
5.1 Asymptotic relative efficiency
Asymptotic relative efficiencies (ARE), using the unweighted Hotelling’s as the benchmark test, for the three different designs are shown in Figure 2. At , the tests inherit the efficiencies from the i.i.d. case. Hotelling’s test is the optimal for the multinormal distribution, but the spatial sign test is the best for the distribution. The ARE of the weighted Hotelling’s relative to the unweighted Hotelling’s does not depend on the degrees of freedom. The spatial rank test has a good ARE for both distributions. The behavior of the tests is remarkably different from design to another when .
In Design A, the AREs of the unweighted tests do not depend on , because
is here a zero matrix. Weighted tests behave gorgeously, however. Optimal weighting assigns large weights to observations in clusters with both groups present, and less weight to clusters with members only from one group. As
, these withincluster comparisons tend to receive all the weight, because the treatment effect can be recovered most accurately from them. The unweighted tests still suffer from the error in betweenclusters comparisons, and thus the AREs of the weighted tests are dramatically better. Design B is an example of a design where the efficiency cannot be improved by weighting. Thus, the efficiencies of the unweighted and weighted tests overlap. For both spatial sign and rank tests the AREs decrease as a function of . The AREs in Design C are similar to the ones in the onesample case (Larocque et al. 2007, e.g). AREs of the unweighted tests at and are identical since a cluster becomes a singleton observation at . In between, spatial sign and rank tests suffer less from intracluster correlation than Hotelling’s . Notable improvements can be obtained by weighting.5.2 Simulations
Results on the empirical size and power are presented in Table 1 for . All six tests generally maintain their nominal size well with a few exceptions. Hotelling’s and its weighted version are conservative for the error distribution. The distribution has very heavy tails, and in such a setting the convergence of the momentbased test to its limiting distribution is slower. All unweighted tests seem conservative in Design C. This suggests that designs without withincluster comparisons between treatments need a larger sample size for a good approximation. Interestingly, corresponding weighted tests are liberal for the same design in particular with larger values of . Of course, with the unweighted and the weighted test are almost the same. It is worth noting that at (supplemental Table 3), spatial sign and rank tests still maintain their size fairly well in Design A, regardless of the values of and , but not as generally in Designs B and C. Curiously, the direction from which the tests converge to meet their target level seems to depend on all parameters of the configuration: score, design, weights, error distribution and intracluster correlation. At the levels overall have improved substantially (supplemental Table 4).
Spatial sign and rank tests have a good power among all the studied error distributions and designs, whereas Hotelling’s is the best unweighted test in the normal case, but has almost no power for the distribution. This problem cannot be solved by selecting a different design, or by weighting. As for the AREs, use of weights enhance the power of the tests for large values of : modestly in Design A and more effectively in Design C. No gains of power can be obtained by weighting in Design B.
6 Concluding Remarks
It is commonly acknowledged that in the onesample case ignoring a positive intracluster correlation leads to too liberal analyses because of underestimation of the variance. However, our variance breakdown and the supplemental example demonstrate that the opposite can also take place in a multitreatments study. Therefore, the analysis may either be too liberal or too conservative depending on the design of the study.
The proposed scorebased procedures are not necessarily affine invariant and equivariant. Affine invariant and equivariant versions of spatial sign and rank methods can be developed by using the wellestablished transformationretransformation techniques with an inner standardization (Chakraborty and Chaudhuri 1996; Chakraborty et al. 1998; Chakraborty and Chaudhuri 1998). Larocque (2003) and Nevalainen et al. (2007b) present ideas how to do so with clustered data for the testing and estimation problems, respectively. These computationally intensive modifications of the transformationretransformation procedures can be used as general tools to achieve affine invariance/equivariance with these type of data.
This paper provided a general treatment of scorebased testing and estimation methods for multivariate location problems with clustered data. Asymptotic results were confirmed with simulation studies, which also clearly demonstrate the gains obtained by the use of scores and weights. In future research we intend to work with multilevel or hierarchical data, and on regression problems.
Acknowledgements
Constructive comments by the Associate Editor and two anonymous referees greatly improved the paper. This research was supported by the Academy of Finland, The Finnish Foundation of Cardiovascular Research and the Competitive Research Funding of the Pirkanmaa Hospital District. The research work of Denis Larocque was supported by the Natural Sciences and Engineering Research Council of Canada.
7 Supplemental Materials
 Supplemental tables:

Details for the efficiency studies and simulation results with and . (.pdf)
 Rfunctions used for the simulations:

Functions that generate data from the model of section 5, perform the tests, and collect the results into external files. (.zip)
 Examples:

Description and analysis of two example data sets. (.pdf)
 Rfunctions used for the examples:

Functions that perform the analyses of the example data sets. (.zip)
References
 Arcones (1998) Arcones, M. A. (1998), “Asymptotic theory for estimators over a convex kernel,” Econometric Theory, 14, 387–422.
 Bai et al. (1990) Bai, Z. D., Chen, R., Miao, B. Q., and Rao, C. R. (1990), “Asymptotic theory of least distances estimate in multivariate linear models,” Statistics, 4, 503–519.
 Chakraborty and Chaudhuri (1996) Chakraborty, B. and Chaudhuri, P. (1996), “On a transformation and retransformation technique for constructing affine equivariant multivariate median,” in Proceedings of the American Mathematical Society, vol. 124, pp. 2539–2547.
 Chakraborty and Chaudhuri (1998) — (1998), “On an adaptive transformation and retransformation estimate of multivariate location,” Journal of the Royal Statistical Society, Series B, 60, 145–157.
 Chakraborty et al. (1998) Chakraborty, B., Chaudhuri, P., and Oja, H. (1998), “Operating transformation retransformation on spatial median and angle test,” Statistica Sinica, 8, 767–784.
 Chaudhuri (1992) Chaudhuri, P. (1992), “Multivariate location estimation using extension of estimates through statistics type approach,” The Annals of Statistics, 20, 897–916.
 Datta and Satten (2005) Datta, S. and Satten, G. A. (2005), “Ranksum tests for clustered data,” Journal of the American Statistical Association, 100, 908–915.
 Datta and Satten (2008) — (2008), “A signedrank test for clustered data,” Biometrics, 64, 501–507.
 Field and Welsh (2007) Field, C. A. and Welsh, A. H. (2007), “Bootstrapping clustered data,” Journal of the Royal Statistical Society, Series B, 69, 369–390.
 Haataja et al. (2009) Haataja, R., Larocque, D., Nevalainen, J., and Oja, H. (2009), “A weighted multivariate signedrank test for clustercorrelated data,” Journal of Multivariate Analysis, 100, 1107–1119.
 Hodges and Lehmann (1963) Hodges, J. L. and Lehmann, E. L. (1963), “Estimates of location based on rank tests,” The Annals of Mathematical Statistics, 34, 598–611.
 Huang et al. (2009) Huang, A., Jin, R., and Robinson, J. (2009), “Robust permutation tests for two samples,” Journal of Statistical Planning and Inference, 139, 2631 –2642.
 Jin and Robinson (2003) Jin, R. and Robinson, J. (2003), “Robust permutation tests for one sample,” Journal of Statistical Planning and Inference, 116, 475–487.
 Kloke et al. (2009) Kloke, J. D., McKean, J. W., and Rashid, M. M. (2009), “Rankbased estimation and associated inferences for linear models with cluster correlated errors,” Journal of the Americal Statistical Association, 104, 384–390.
 Konietschke and Brunner (2009) Konietschke, F. and Brunner, E. (2009), “Nonparametric analysis of clustered data in diagnostic trials: estimation problems in small sample sizes,” Computational Statistics and Data Analysis, 53, 730–741.
 Larocque (2003) Larocque, D. (2003), “An affineinvariant multivariate sign test for cluster correlated data,” The Canadian Journal of Statistics, 31, 437–455.
 Larocque (2005) — (2005), “The Wilcoxon signedrank test for cluster correlated data,” in Statistical Modeling and Analysis for Complex Data Problems, eds. Duchesne, P. and Rémillard, B., New York, USA: Springer, pp. 309–323.
 Larocque et al. (2010) Larocque, D., Haataja, R., Nevalainen, J., and Oja, H. (in press), “Two sample tests for the nonparametric BehrensFisher problem with clustered data,” Journal of Nonparametric Statistics. First published online on: 26 January 2010.
 Larocque et al. (2007) Larocque, D., Nevalainen, J., and Oja, H. (2007), “A weighted multivariate sign test for cluster correlated data,” Biometrika, 94, 267–283.
 Larocque et al. (2008) — (2008), “Onesample location tests for multilevel data,” Journal of Statistical Planning and Inference, 138, 2469–2482.
 Möttönen and Oja (1995) Möttönen, J. and Oja, H. (1995), “Multivariate spatial sign and rank methods,” Journal of Nonparametric Statistics, 5, 201–213.
 Möttönen et al. (1997) Möttönen, J., Oja, H., and Tienari, J. (1997), “On the efficiency of multivariate spatial sign and rank tests,” The Annals of Statistics, 25, 542–552.
 Nevalainen et al. (2007a) Nevalainen, J., Larocque, D., and Oja, H. (2007a), “On the multivariate spatial median for clustered data,” The Canadian Journal of Statistics, 35, 215–283.
 Nevalainen et al. (2007b) — (2007b), “A weighted spatial median for clustered data,” Statistical Methods & Applications, 15, 355–379.
 Rosner et al. (2003) Rosner, B., Glynn, R. J., and Lee, M.L. T. (2003), “Incorporation of clustering effects for the Wilcoxon rank sum test: a large sample approach,” Biometrics, 59, 1089–1098.
 Rosner et al. (2006) — (2006), “The Wilcoxon Signed Rank Test for Paired Comparisons of Clustered Data,” Biometrics, 62, 185–192.
 Rosner and Grove (1999) Rosner, B. and Grove, D. (1999), “Use of the MannWhitney test for clustered data,” Statistics in Medicine, 18, 1387–1400.
 Serfling (1980) Serfling, R. J. (1980), Approximation Theorems of Mathematical Statistics, New York: Wiley.
 Werner and Brunner (2007) Werner, C. and Brunner, E. (2007), “Rank methods for the analysis of clustered data in diagnostic trials,” Computational Statistics & Data Analysis, 51, 5041–5054.
 Williamson et al. (2003) Williamson, J. M., Datta, S., and Satten, G. A. (2003), “Marginal analyses of clustered data when cluster size is informative,” Biometrics, 59, 36–42.
Comments
There are no comments yet.