1 Introduction
New crop varieties are usually evaluated for their performance in a target population of environments (TPE), where environments correspond to locations in specific years. This evaluation requires conducting randomized field trials at several environments sampled from the TPE. Such trials are called multienvironment trials (MET). Analysis of MET is routinely performed using linear mixed models comprising effects for genotypes, environments and their interaction (Isik et al. (2017)).
If the TPE is large and can be suitably stratified along geographical borders or agroecological zonations, it may be advantageous to subdivide the TPE into subregions. If the same set of genotypes is tested at a number of locations in each of the subregions, a linear mixed model may be fitted with random genotypewithinsubregion effects that allows estimating a genotype’s average performance in each subregion using best linear unbiased prediction (BLUP) (
Atlin et al. (2000); Piepho and Möhring (2005)). If a covariance is assumed between a genotype’s performance in different subregions, the model allows borrowing strength across subregions, meaning that estimates of mean performance in a subregion becomes more accurate than when based on data from the subregion alone (Kleinknecht et al. (2013)).Whereas analysis of subdivided TPE data has received some attention in the recent past, to the best of our knowledge the design of MET when a subdivision is envisaged has not been considered. The design of such trials has gained interest recently in endeavours to integrate trial networks across country borders (e.g., Horizon 2020 project INVITE = INnovations in plant VarIety Testing in Europe).
The design of MET for a subdivided TPE involves two decisions: (1) The total number of environments at which to conduct the trials and (2) the allocation of this total number of environments to the different subregions. This paper is devoted to the second decision.
2 Model Specification and Prediction
In this work we use a linear mixed model (LMM) in which observation of genotype in location within the th subregion is given by
(1) 
for , , and , where , , denotes the number of genotypes, is the number of subregions, is the number of locations within the th subregion and is the total number of locations. Moreover, denotes the mean (fixed) effect of the th subregion, is the interaction effect of genotype in subregion , is the effect of the th location within the th subregion, denotes the effect of the th genotype in the th location within the th subregion, is the effect of the th replication in location in subregion and denotes the observational error. All random effects and observational errors are assumed to have zero mean. The variances are given by , , and and the covariance matrix of the genotype effects is , where is some positive definite matrix.
Our main focus is the prediction of the genotype effects and the pairwise linear contrasts , , . For given total number of locations we search for the numbers of locations within the subregions, which are optimal for the prediction. Optimal designs for the estimation of fixed effects in LMM are well discussed in the literature (see e.g. Fedorov and Jones (2005) or Entholzner et al. (2005)). Less has been done for the prediction of random effects: the most general case  hierarchical random coefficient regression models  has been considered by Prus and Schwabe (2016). However, due to its more complicated covariance structure, model (1) is not a particular case of those models. Therefore, the proposed approach cannot be used here. Also in the recently published work of Prus (2019) a simpler covariance structure has been assumed.
We measure the performance of the prediction in terms of the mean squared error (MSE) matrix. The MSE matrices of the BLUPs for the genotype effects and for the pairwise linear contrasts are provided by the next lemma.
Lemma 1.
a) The MSE matrix of the BLUP of is given by
(2) 
where ,
is the vector of length
with all entries equal to , is the identity matrix and denotes the Kronecker product.b) The MSE matrix of the BLUP of is given by
(3) 
The proof of the lemma is deferred to the appendix.
3 Optimal Design
For the present optimization problem, we define (exact) designs as follows:
(4) 
where denote the subregions.
For analytical purposes we also introduce approximate designs (see e. g. Kiefer (1974)):
(5) 
where is the weight of locations within subregion . For these designs the requirement of integer values of is dropped and only the conditions and have to be satisfied.
We define the information matrix as
(6) 
and note that for exact designs the following condition is satisfied:
(7) 
Then we extend the definitions of MSE matrices (2) and (3) with respect to approximate designs and obtain
(8) 
and
(9) 
3.1 Aoptimal designs
The Acriterion for prediction may be defined as the trace of the MSE matrix (see e. g. Prus and Schwabe (2016)). For approximate designs this definition can be generalized using the extended MSE matrices (8) and (9). Then we evaluate (neglecting the constant factor ) for the pairwise linear contrasts the criterion
(10) 
where .
The Acriterion for the genotype effects differs from (10) only by the constant term and the multiplicator , both of which have no influence on the solution to the optimization. Therefore, optimal designs for the prediction of the genotype effects and the linear contrasts are the same. The next theorem provides the optimality condition for approximate designs.
Theorem 1.
An approximate design is Aoptimal for the prediction of the genotype effects and the pairwise linear contrasts iff
(11) 
where is the vector of length with the th entry equal to and all other entries equal to .
For all with equality holds in (11).
Proof.
The Acriterion (10) may be recognized as a particular Bayesian Acriterion. The optimality condition follows from Theorem 1 in Gladitz and Pilz (1982)
for the linear transformation matrix
, the regression functions and the design region . ∎Corollary 1.
Let be an Aoptimal design for the prediction of the genotype effects and the pairwise linear contrasts . Let and be support points of ( and ). Then the following equality holds:
(12) 
Note that designs for which some weights (and consequently some numbers of locations per subregion) have zero value are also acceptable in our research.
Example 1: Compound symmetry model. We consider a (compound symmetry) model with the particular covariance structure of genotype effects with positive and real valued (for which is positive definite). For this model some optimal designs can be obtained explicitly.
Theorem 2.
In the compound symmetry model the (balanced) design with , , is Aoptimal for the prediction of the genotype effects and the pairwise linear contrasts.
Proof.
For the balanced design the information matrix is given by . Then it can be easily verified that all diagonal entries of the matrix are the same, which leads to equalities in (11) for all . ∎
3.2 Optimal designs with respect to weighted Acriterion
In this section we focus on the prediction of the pairwise contrasts . We define the weighted Acriterion as the weighted sum across all subregions of the variances of the differences between the predicted and the real contrasts:
(13) 
where denote coefficients, which are related to the subregions. One possible choice is the size of the subregions. Alternatively, equal weight may be given to each subregion, meaning that for all . (In this case the weighted Acriterion coincides with the standard one). Then we extend this definition with respect to approximate designs and obtain (neglecting the constant multiplicator ) the following criterion:
(14) 
where . The next theorem presents the optimality condition for approximate designs.
Theorem 3.
A design is optimal for the prediction of pairwise linear contrasts with respect to the weighted Acriterion iff
(15) 
For all with equality holds in (3).
Proof.
Corollary 2.
Let be an optimal design with respect to the weighted Acriterion for the prediction of the genotype effects and the pairwise linear contrasts . Let and be support points of ( and ). Then the following equality holds:
(16) 
For the weighted Acriterion the optimal designs are not as easy to guess as for the standard Acriterion. Specifically, it is worth mentioning that the design with , , which intuitively could be a solution of the optimization problem, is in general not optimal (see the real data example below).
3.3 Enforcing the same efficiency in each subregion
For some studies it is required that the variances of the differences between the real and the predicted effects are the same for all subregions. For the model under investigation (model (1)) this condition is given by
(17) 
Under this condition the numbers of locations can be obtained as a solution of a system of equations (for example fix and ) with unknown variables () and no further optimization is needed.
Remark 1.
For the compound symmetry model the balanced design is a solution of (17).
Remark 2.
Not all Aoptimal designs for the prediction of the genotype effects and the pairwise linear contrasts satisfy condition (17).
Theorem 4.
Let the covariance matrix of random effects be diagonal. Let be a design satisfying condition (17). Then is Aoptimal for the prediction of the genotype effects and the pairwise linear contrasts.
Proof.
Corollary 3.
Let the covariance matrix of random effects be diagonal. Let be an Aoptimal design for the prediction of the genotype effects and the pairwise linear contrasts with for all . Then satisfies condition (17).
4 Real Data Example
We here consider variance components from a study on maize variety trials in India with five agroecological subregions. The dataset comprises four maturity groups of maize. Here, we consider only the extraearly maturity group (Kleinknecht et al. (2013), Tables 6 and 7). Based on the variance components reported in the paper, we derived variance components to be used in our design problems.
In Table 1 we summarize the variance components in the model under investigation (Model (1)) and how we determined these from the parameter estimates for the model considered by Kleinknecht et al. (2013). According to this table the adjusted covariance matrix form formula (10) may be computed as
(18) 
Effect  Model (1)  Model in Kleinknecht et al. (2013)  Variance in Model (1)  Variance in Kleinknecht et al. (2013) 
Zone + mean  fixed  426+107+153  
Genotypezone  for  
Locationzone  1129+1000  
Genloczone
+ Obs errors 
160+333 
We consider both the standard and the weighted Acriterion for firstorder factoranalytic (FA) and compound symmetry (CS) variancecovariance structures, which were discussed in Kleinknecht et al. (2013) (see also Piepho (1997) for FA models).
4.1 Standard Acriterion
For the firstorder factoranalytic model we take the covariance matrix from Table 6 in Kleinknecht et al. (2013) (right part):
(19) 
Table 2 summarizes results for optimal designs in the firstorder factoranalytic model. As we can see in Table 2, optimal designs depend on both the total number of locations and the error variance .
Approximate design  Exact design  
20  50  0.33  0.13  0.18  0.31  0.04  7  3  3  6  1 
200  0.31  0.15  0.19  0.29  0.06  6  3  4  6  1  
400  0.29  0.16  0.20  0.27  0.09  6  3  4  5  2  
40  50  0.27  0.17  0.20  0.25  0.10  11  7  8  10  4 
200  0.26  0.18  0.20  0.24  0.12  10  7  8  10  5  
400  0.25  0.19  0.21  0.23  0.13  10  8  8  9  5  
100  50  0.23  0.19  0.21  0.22  0.15  23  19  21  22  15 
200  0.23  0.19  0.21  0.21  0.16  23  19  21  21  16  
400  0.22  0.20  0.21  0.21  0.17  22  20  20  21  17 
4.2 Weighted Acriterion
For the weighted Acriterion we used the coefficients , , , , , which correspond to the areas of the subregions, respectively, as determined from a digitized version of the map shown in Kleinknecht et al. (2013).
Table 3 summarizes the results for optimal designs in the factoranalytic model.
Approximate design  Exact design  
20  50  0.35  0.03  0.10  0.37  0.15  7  1  2  7  3 
200  0.33  0.05  0.11  0.35  0.16  7  1  2  7  3  
400  0.30  0.08  0.13  0.32  0.18  6  2  2  6  4  
40  50  0.28  0.09  0.13  0.30  0.19  11  4  5  12  8 
200  0.27  0.10  0.14  0.29  0.20  11  4  6  11  8  
400  0.27  0.10  0.14  0.29  0.20  10  5  6  11  8  
100  50  0.24  0.13  0.15  0.26  0.22  24  13  15  26  22 
200  0.24  0.13  0.15  0.25  0.22  24  13  15  25  23  
400  0.23  0.14  0.16  0.25  0.23  23  14  15  25  23 
The optimal designs in the compound symmetry model are presented in Table 4.
Approximate design  Exact design  
20  50  0.22  0.10  0.12  0.26  0.30  4  2  3  5  6 
200  0.21  0.11  0.12  0.26  0.30  4  2  3  5  6  
400  0.21  0.11  0.13  0.26  0.29  4  2  3  5  6  
40  50  0.21  0.12  0.13  0.25  0.29  9  5  5  10  11 
200  0.21  0.12  0.13  0.25  0.28  9  5  5  10  11  
400  0.21  0.13  0.14  0.25  0.28  9  5  5  10  11  
100  50  0.21  0.13  0.14  0.24  0.27  21  13  15  24  27 
200  0.21  0.14  0.15  0.24  0.27  21  13  15  24  27  
400  0.21  0.14  0.15  0.24  0.26  21  14  15  24  26 
Note that optimal designs in Tables 3 and 4 depend on the total number of locations and the error variance and are in general not equal to the ratios , . The results for the factoranalytic and compound symmetry models are different, illustrating that the optimal designs depend on the variancecovariance structure of genotypic effects within subregions. In case of compound symmetry optimal designs are less sensitive to and than for the factoranalytic model.
All computations were performed using the procedures od.SOCP and od.MISOCP from the package OptimalDesign in R for optimal approximate and exact designs, respectively, as proposed in Harman and Prus (2018). Note that the exact designs obtained using od.MISOCP are optimal in the class of exact designs for the model under investigation for the given data.
4.3 Enforcing the same efficiency in each subregion
When using the CS structure (20) in , we obtained the trivial solution , . When using the factoranalytic structure in (19), the solutions were as shown in Table 5. The exact designs were obtained by efficient rounding (see Pukelsheim and Rieder (1992)).
Approximate design  Exact design  

20  50  0.342  0.148  0.205  0.302  0.003  6  3  4  6  1 
200  0.320  0.158  0.209  0.284  0.029  6  3  4  6  1  
400  0.291  0.172  0.211  0.260  0.065  6  3  4  5  2  
40  50  0.274  0.179  0.212  0.247  0.088  11  7  8  10  4 
200  0.262  0.183  0.212  0.239  0.104  10  7  9  10  4  
400  0.247  0.189  0.211  0.228  0.125  10  8  8  9  5  
100  50  0.231  0.194  0.209  0.217  0.150  23  19  21  22  15 
200  0.226  0.195  0.208  0.214  0.157  22  20  21  21  16  
400  0.219  0.197  0.206  0.210  0.167  22  20  20  21  17 
We used the function nlphqn in SAS/IML to solve the nonlinear system of equations in (17).
5 Discussion
In crop research, the design of experiments is mainly considered in the context of a single environment and assuming that treatment effects are fixed (John and Williams (1995); Mead et al. (2012)). Recently, there has been an increased interest in design for experiments when treatments are modeled as correlated random effects using kinship or pedigree information (Bueno Filho and Gilmour (2003), Bueno Filho and Gilmour (2007); Cullis et al. (2006); Butler et al. (2014)). Also, the design of multienvironment trials has been considered in a few papers, most notably in the context of partially replicated (prep) trials (Williams et al. (2014)), but also in broader contexts (GonzálezBarrios et al. (2019)). To the best of our knowledge, however, the problem of allocation of location numbers in subdivided TPE has never been considered in any detail. The problem is reminiscent of optimal allocation in stagewise sampling based on a nested randomeffects model (Snedecor and Cochran (1967), p. 529) but the approach needed is more complex due to the linear mixed model involving several fixed and random effects and the optimization being targeted to the prediction of random effects. There is also some relation to smallarea estimation in surveys where mixed models are used for estimation (Jiang and Lahiri (2006), Torabi and Jiang (2020)), but design in that context is not usually targeted at individual domains or small areas, and there is no notion of a larger number of treatments as in MET.
Our main focus was the optimal allocation of locations for different subregions with respect to the estimation of genotype effects and pairwise linear contrasts for A and particular linear (weighted A) criteria. The proposed approach is based on the method of best linear unbiased prediction (BLUP). For our problem Bayesian optimal designs for a transformed covariance matrix of genotype effects turn out to be optimal. In the example we considered two kinds of models with respect to the covariance structure: firstorder factoranalytic and compound symmetry. The resulting designs in both cases depend on the covariance structure, observational errors variance and the total number of locations in all subregions. The only exception is the standard Acriterion for compound symmetry: in this case balanced designs are optimal.
Our criterion integrates the efficiencies for BLUPs of interest across subregions. There are three variations to this approach. Two of them take a weighted or unweighted average across subregions, and optimization typically leads to allocations that imply unequal efficiency between subregions. The third approach imposes the additional restriction that efficiency be the same for each subregion. We think this latter approach is particularly relevant when several administrative entities (federal states or countries) join forces to link up their trialling networks for crossboundary analysis. For such efforts to be successful it is vital that the benefit, in terms of efficiency gain compared to independent analysis, can be split equally between the administrative entities involved.
Appendix A Proof of Lemma 1
To make use of the theoretical results that are available in the literature (see e. g. Henderson (1975)) for the prediction of random parameters we will represent the model (1) as a particular case of the general LMM
(21) 
with design matrices and for the fixed effects and the random effects, respectively. In (21), denotes the fixed effects and are the random effects. The random effects and the observational errors are assumed to have zero mean and to be all uncorrelated with positive definite covariance matrices and , respectively. Random effects and observational errors are assumed to be uncorrelated.
The latter equation may be alternatively written as
(22) 
where . Model (22) is of form (21) with , , and
According to Henderson (1975) the MSE matrix of the BLUP of the random effects (which corresponds to in our model (22)) is given by
(23) 
where denotes a generalized inverse of . With this formula we obtain MSE matrix (2). Then using the relation between the genotype effects and the pairwise contrasts we receive formula (3).
Acknowledgment
This research was partially supported by grant SCHW 531/16 of the German Research Foundation (DFG). The authors are grateful to Waqas Malik (University of Hohenheim) for determining the areas of the five breeding zones for maize in India based on a digitized map.
References
 Atlin et al. (2000) Atlin, G., Baker, R. J., McRae, K. B., and Lu., X. (2000). Selection response in subdivided target regions. Crop Science, 40, 7–13.
 Bueno Filho and Gilmour (2003) Bueno Filho, J. S. D. S. and Gilmour, S. (2003). Planning incomplete block experiments when treatments are genetically related. Biometrics, 59, 375–381.
 Bueno Filho and Gilmour (2007) Bueno Filho, J. S. D. S. and Gilmour, S. (2007). Block designs for random treatment effects. Journal of Statistical Planning and Inference, 137, 1446–1451.
 Butler et al. (2014) Butler, D. G., Smith, A. B., and Cullis, B. R. (2014). On the design of field experiments with correlated treatment effects. Journal of Agricultural, Biological and Environmental Statistics, 19, 539–555.
 Cullis et al. (2006) Cullis, B. R., Smith, A., and Coombes, N. (2006). On the design of early generation variety trials with correlated data. Journal of Agricultural, Biological and Environmental Statistics, 11, 381–393.
 Entholzner et al. (2005) Entholzner, M., Benda, N., Schmelter, T., and Schwabe, R. (2005). A note on designs for estimating population parameters. Biometrical Letters  Listy Biometryczne, 42, 25–41.
 Fedorov and Jones (2005) Fedorov, V. and Jones, B. (2005). The design of multicentre trials. Statistical Methods in Medical Research, 14, 205–248.
 Gladitz and Pilz (1982) Gladitz, J. and Pilz, J. (1982). Construction of optimal designs in random coefficient regression models. Mathematische Operationsforschung und Statistik, Series Statistics, 13, 371–385.
 GonzálezBarrios et al. (2019) GonzálezBarrios, P., DíazGarcía, L., and Gutiérrez, L. (2019). Megaenvironmental design: Using genotype environment interaction to optimize resources for cultivar testing. Crop Science, 59, 1899–1915.

Harman and Prus (2018)
Harman, R. and Prus, M. (2018).
Computing optimal experimental designs with respect to a compound
Bayes Risk criterion.
Statistics and Probability Letters
, 137, 135–141. 
Henderson (1975)
Henderson, C. R. (1975).
Best linear unbiased estimation and prediction under a selection model.
Biometrics, 31, 423–477.  Isik et al. (2017) Isik, F., Holland, J., and Maltecca, C. (2017). Genetic data analysis for plant and animal breeding. Springer, New York.
 Jiang and Lahiri (2006) Jiang, J. and Lahiri, P. (2006). Estimation of finite population domain means: A modelassisted empirical best prediction approach. Journal of the American Statistical Association, 101, 301–311.
 John and Williams (1995) John, J. and Williams, E. (1995). Cyclic and computer generated designs. Chapman and Hall, London.
 Kiefer (1974) Kiefer, J. (1974). General equivalence theory for optimum designs (approximate theory). Annals of Statistics, 2, 849–879.
 Kleinknecht et al. (2013) Kleinknecht, K., Möhring, J., Singh, K., Zaidi, P., Atlin, G., and Piepho, H.P. (2013). Comparison of the performance of blue and blup for zoned indian maize data. Crop Science, 53, 1384–1391.
 Mead et al. (2012) Mead, R., Gilmour, S., and Mead, A. (2012). Statistical principles for the design of experiments. Cambridge Univ. Press, Cambridge.
 Piepho (1997) Piepho, H.P. (1997). Analyzing genotypeenvironment data by mixed models with multiplicative effects. Biometrics, 53, 761–766.
 Piepho and Möhring (2005) Piepho, H.P. and Möhring, J. (2005). Best linear unbiased prediction of cultivar effects for subdivided target regions. Crop Science, 45, 1151–1159.
 Prus (2019) Prus, M. (2019). Optimal designs in multiple group random coefficient regression models. TEST. https://doi.org/10.1007/s11749019006546.
 Prus and Schwabe (2016) Prus, M. and Schwabe, R. (2016). Optimal designs for the prediction of individual parameters in hierarchical models. Journal of the Royal Statistical Society: Series B, 78, 175–191.
 Pukelsheim and Rieder (1992) Pukelsheim, F. and Rieder, S. (1992). Efficient rounding of approximate designs. Biometrika, 79, 763–770.
 Snedecor and Cochran (1967) Snedecor, G. W. and Cochran, W.G. Press, A. (1967). Statistical Methods. Iowa State University Press, Ames.
 Torabi and Jiang (2020) Torabi, M. and Jiang, J. (2020). Estimation of mean squared prediction error of empirically spatial predictor of small area means under a linear mixed model. Journal of Statistical Planning and Inference, 208, 82–93.
 Williams et al. (2014) Williams, E., John, J. A., and Whitaker, D. (2014). Construction of more flexible and efficient prep designs. Australian and New Zealand Journal of Statistics, 56, 89–96.
Comments
There are no comments yet.