1 Introduction
New crop varieties are usually evaluated for their performance in a target population of environments (TPE), where environments correspond to locations in specific years. This evaluation requires conducting randomized field trials at several environments sampled from the TPE. Such trials are called multi-environment trials (MET). Analysis of MET is routinely performed using linear mixed models comprising effects for genotypes, environments and their interaction (Isik et al. (2017)).
If the TPE is large and can be suitably stratified along geographical borders or agro-ecological zonations, it may be advantageous to subdivide the TPE into sub-regions. If the same set of genotypes is tested at a number of locations in each of the sub-regions, a linear mixed model may be fitted with random genotype-within-subregion effects that allows estimating a genotype’s average performance in each sub-region using best linear unbiased prediction (BLUP) (
Atlin et al. (2000); Piepho and Möhring (2005)). If a covariance is assumed between a genotype’s performance in different sub-regions, the model allows borrowing strength across sub-regions, meaning that estimates of mean performance in a sub-region becomes more accurate than when based on data from the sub-region alone (Kleinknecht et al. (2013)).Whereas analysis of sub-divided TPE data has received some attention in the recent past, to the best of our knowledge the design of MET when a sub-division is envisaged has not been considered. The design of such trials has gained interest recently in endeavours to integrate trial networks across country borders (e.g., Horizon 2020 project INVITE = INnovations in plant VarIety Testing in Europe).
The design of MET for a sub-divided TPE involves two decisions: (1) The total number of environments at which to conduct the trials and (2) the allocation of this total number of environments to the different sub-regions. This paper is devoted to the second decision.
2 Model Specification and Prediction
In this work we use a linear mixed model (LMM) in which observation of genotype in location within the -th sub-region is given by
(1) |
for , , and , where , , denotes the number of genotypes, is the number of sub-regions, is the number of locations within the -th sub-region and is the total number of locations. Moreover, denotes the mean (fixed) effect of the -th sub-region, is the interaction effect of genotype in sub-region , is the effect of the -th location within the -th sub-region, denotes the effect of the -th genotype in the -th location within the -th sub-region, is the effect of the -th replication in location in sub-region and denotes the observational error. All random effects and observational errors are assumed to have zero mean. The variances are given by , , and and the covariance matrix of the genotype effects is , where is some positive definite matrix.
Our main focus is the prediction of the genotype effects and the pairwise linear contrasts , , . For given total number of locations we search for the numbers of locations within the sub-regions, which are optimal for the prediction. Optimal designs for the estimation of fixed effects in LMM are well discussed in the literature (see e.g. Fedorov and Jones (2005) or Entholzner et al. (2005)). Less has been done for the prediction of random effects: the most general case - hierarchical random coefficient regression models - has been considered by Prus and Schwabe (2016). However, due to its more complicated covariance structure, model (1) is not a particular case of those models. Therefore, the proposed approach cannot be used here. Also in the recently published work of Prus (2019) a simpler covariance structure has been assumed.
We measure the performance of the prediction in terms of the mean squared error (MSE) matrix. The MSE matrices of the BLUPs for the genotype effects and for the pairwise linear contrasts are provided by the next lemma.
Lemma 1.
a) The MSE matrix of the BLUP of is given by
(2) |
where , is the vector of length
b) The MSE matrix of the BLUP of is given by
(3) |
The proof of the lemma is deferred to the appendix.
3 Optimal Design
For the present optimization problem, we define (exact) designs as follows:
(4) |
where denote the sub-regions.
For analytical purposes we also introduce approximate designs (see e. g. Kiefer (1974)):
(5) |
where is the weight of locations within sub-region . For these designs the requirement of integer values of is dropped and only the conditions and have to be satisfied.
We define the information matrix as
(6) |
and note that for exact designs the following condition is satisfied:
(7) |
Then we extend the definitions of MSE matrices (2) and (3) with respect to approximate designs and obtain
(8) |
and
(9) |
3.1 A-optimal designs
The A-criterion for prediction may be defined as the trace of the MSE matrix (see e. g. Prus and Schwabe (2016)). For approximate designs this definition can be generalized using the extended MSE matrices (8) and (9). Then we evaluate (neglecting the constant factor ) for the pairwise linear contrasts the criterion
(10) |
where .
The A-criterion for the genotype effects differs from (10) only by the constant term and the multiplicator , both of which have no influence on the solution to the optimization. Therefore, optimal designs for the prediction of the genotype effects and the linear contrasts are the same. The next theorem provides the optimality condition for approximate designs.
Theorem 1.
An approximate design is A-optimal for the prediction of the genotype effects and the pairwise linear contrasts iff
(11) |
where is the vector of length with the -th entry equal to and all other entries equal to .
For all with equality holds in (11).
Proof.
The A-criterion (10) may be recognized as a particular Bayesian A-criterion. The optimality condition follows from Theorem 1 in Gladitz and Pilz (1982)
for the linear transformation matrix
, the regression functions and the design region . ∎Corollary 1.
Let be an A-optimal design for the prediction of the genotype effects and the pairwise linear contrasts . Let and be support points of ( and ). Then the following equality holds:
(12) |
Note that designs for which some weights (and consequently some numbers of locations per sub-region) have zero value are also acceptable in our research.
Example 1: Compound symmetry model. We consider a (compound symmetry) model with the particular covariance structure of genotype effects with positive and real valued (for which is positive definite). For this model some optimal designs can be obtained explicitly.
Theorem 2.
In the compound symmetry model the (balanced) design with , , is A-optimal for the prediction of the genotype effects and the pairwise linear contrasts.
Proof.
For the balanced design the information matrix is given by . Then it can be easily verified that all diagonal entries of the matrix are the same, which leads to equalities in (11) for all . ∎
3.2 Optimal designs with respect to weighted A-criterion
In this section we focus on the prediction of the pairwise contrasts . We define the weighted A-criterion as the weighted sum across all sub-regions of the variances of the differences between the predicted and the real contrasts:
(13) |
where denote coefficients, which are related to the sub-regions. One possible choice is the size of the sub-regions. Alternatively, equal weight may be given to each sub-region, meaning that for all . (In this case the weighted A-criterion coincides with the standard one). Then we extend this definition with respect to approximate designs and obtain (neglecting the constant multiplicator ) the following criterion:
(14) |
where . The next theorem presents the optimality condition for approximate designs.
Theorem 3.
A design is optimal for the prediction of pairwise linear contrasts with respect to the weighted A-criterion iff
(15) |
For all with equality holds in (3).
Proof.
Corollary 2.
Let be an optimal design with respect to the weighted A-criterion for the prediction of the genotype effects and the pairwise linear contrasts . Let and be support points of ( and ). Then the following equality holds:
(16) |
For the weighted A-criterion the optimal designs are not as easy to guess as for the standard A-criterion. Specifically, it is worth mentioning that the design with , , which intuitively could be a solution of the optimization problem, is in general not optimal (see the real data example below).
3.3 Enforcing the same efficiency in each sub-region
For some studies it is required that the variances of the differences between the real and the predicted effects are the same for all sub-regions. For the model under investigation (model (1)) this condition is given by
(17) |
Under this condition the numbers of locations can be obtained as a solution of a system of equations (for example fix and ) with unknown variables () and no further optimization is needed.
Remark 1.
For the compound symmetry model the balanced design is a solution of (17).
Remark 2.
Not all A-optimal designs for the prediction of the genotype effects and the pairwise linear contrasts satisfy condition (17).
Theorem 4.
Let the covariance matrix of random effects be diagonal. Let be a design satisfying condition (17). Then is A-optimal for the prediction of the genotype effects and the pairwise linear contrasts.
Proof.
Corollary 3.
Let the covariance matrix of random effects be diagonal. Let be an A-optimal design for the prediction of the genotype effects and the pairwise linear contrasts with for all . Then satisfies condition (17).
4 Real Data Example
We here consider variance components from a study on maize variety trials in India with five agroecological sub-regions. The dataset comprises four maturity groups of maize. Here, we consider only the extra-early maturity group (Kleinknecht et al. (2013), Tables 6 and 7). Based on the variance components reported in the paper, we derived variance components to be used in our design problems.
In Table 1 we summarize the variance components in the model under investigation (Model (1)) and how we determined these from the parameter estimates for the model considered by Kleinknecht et al. (2013). According to this table the adjusted covariance matrix form formula (10) may be computed as
(18) |
Effect | Model (1) | Model in Kleinknecht et al. (2013) | Variance in Model (1) | Variance in Kleinknecht et al. (2013) |
Zone + mean | fixed | 426+107+153 | ||
Genotypezone | for | |||
Locationzone | 1129+1000 | |||
Genloczone
+ Obs errors |
160+333 |
We consider both the standard and the weighted A-criterion for first-order factor-analytic (FA) and compound symmetry (CS) variance-covariance structures, which were discussed in Kleinknecht et al. (2013) (see also Piepho (1997) for FA models).
4.1 Standard A-criterion
For the first-order factor-analytic model we take the covariance matrix from Table 6 in Kleinknecht et al. (2013) (right part):
(19) |
Table 2 summarizes results for optimal designs in the first-order factor-analytic model. As we can see in Table 2, optimal designs depend on both the total number of locations and the error variance .
Approximate design | Exact design | ||||||||||
20 | 50 | 0.33 | 0.13 | 0.18 | 0.31 | 0.04 | 7 | 3 | 3 | 6 | 1 |
200 | 0.31 | 0.15 | 0.19 | 0.29 | 0.06 | 6 | 3 | 4 | 6 | 1 | |
400 | 0.29 | 0.16 | 0.20 | 0.27 | 0.09 | 6 | 3 | 4 | 5 | 2 | |
40 | 50 | 0.27 | 0.17 | 0.20 | 0.25 | 0.10 | 11 | 7 | 8 | 10 | 4 |
200 | 0.26 | 0.18 | 0.20 | 0.24 | 0.12 | 10 | 7 | 8 | 10 | 5 | |
400 | 0.25 | 0.19 | 0.21 | 0.23 | 0.13 | 10 | 8 | 8 | 9 | 5 | |
100 | 50 | 0.23 | 0.19 | 0.21 | 0.22 | 0.15 | 23 | 19 | 21 | 22 | 15 |
200 | 0.23 | 0.19 | 0.21 | 0.21 | 0.16 | 23 | 19 | 21 | 21 | 16 | |
400 | 0.22 | 0.20 | 0.21 | 0.21 | 0.17 | 22 | 20 | 20 | 21 | 17 |
4.2 Weighted A-criterion
For the weighted A-criterion we used the coefficients , , , , , which correspond to the areas of the sub-regions, respectively, as determined from a digitized version of the map shown in Kleinknecht et al. (2013).
Table 3 summarizes the results for optimal designs in the factor-analytic model.
Approximate design | Exact design | ||||||||||
20 | 50 | 0.35 | 0.03 | 0.10 | 0.37 | 0.15 | 7 | 1 | 2 | 7 | 3 |
200 | 0.33 | 0.05 | 0.11 | 0.35 | 0.16 | 7 | 1 | 2 | 7 | 3 | |
400 | 0.30 | 0.08 | 0.13 | 0.32 | 0.18 | 6 | 2 | 2 | 6 | 4 | |
40 | 50 | 0.28 | 0.09 | 0.13 | 0.30 | 0.19 | 11 | 4 | 5 | 12 | 8 |
200 | 0.27 | 0.10 | 0.14 | 0.29 | 0.20 | 11 | 4 | 6 | 11 | 8 | |
400 | 0.27 | 0.10 | 0.14 | 0.29 | 0.20 | 10 | 5 | 6 | 11 | 8 | |
100 | 50 | 0.24 | 0.13 | 0.15 | 0.26 | 0.22 | 24 | 13 | 15 | 26 | 22 |
200 | 0.24 | 0.13 | 0.15 | 0.25 | 0.22 | 24 | 13 | 15 | 25 | 23 | |
400 | 0.23 | 0.14 | 0.16 | 0.25 | 0.23 | 23 | 14 | 15 | 25 | 23 |
The optimal designs in the compound symmetry model are presented in Table 4.
Approximate design | Exact design | ||||||||||
20 | 50 | 0.22 | 0.10 | 0.12 | 0.26 | 0.30 | 4 | 2 | 3 | 5 | 6 |
200 | 0.21 | 0.11 | 0.12 | 0.26 | 0.30 | 4 | 2 | 3 | 5 | 6 | |
400 | 0.21 | 0.11 | 0.13 | 0.26 | 0.29 | 4 | 2 | 3 | 5 | 6 | |
40 | 50 | 0.21 | 0.12 | 0.13 | 0.25 | 0.29 | 9 | 5 | 5 | 10 | 11 |
200 | 0.21 | 0.12 | 0.13 | 0.25 | 0.28 | 9 | 5 | 5 | 10 | 11 | |
400 | 0.21 | 0.13 | 0.14 | 0.25 | 0.28 | 9 | 5 | 5 | 10 | 11 | |
100 | 50 | 0.21 | 0.13 | 0.14 | 0.24 | 0.27 | 21 | 13 | 15 | 24 | 27 |
200 | 0.21 | 0.14 | 0.15 | 0.24 | 0.27 | 21 | 13 | 15 | 24 | 27 | |
400 | 0.21 | 0.14 | 0.15 | 0.24 | 0.26 | 21 | 14 | 15 | 24 | 26 |
Note that optimal designs in Tables 3 and 4 depend on the total number of locations and the error variance and are in general not equal to the ratios , . The results for the factor-analytic and compound symmetry models are different, illustrating that the optimal designs depend on the variance-covariance structure of genotypic effects within sub-regions. In case of compound symmetry optimal designs are less sensitive to and than for the factor-analytic model.
All computations were performed using the procedures od.SOCP and od.MISOCP from the package OptimalDesign in R for optimal approximate and exact designs, respectively, as proposed in Harman and Prus (2018). Note that the exact designs obtained using od.MISOCP are optimal in the class of exact designs for the model under investigation for the given data.
4.3 Enforcing the same efficiency in each sub-region
When using the CS structure (20) in , we obtained the trivial solution , . When using the factor-analytic structure in (19), the solutions were as shown in Table 5. The exact designs were obtained by efficient rounding (see Pukelsheim and Rieder (1992)).
Approximate design | Exact design | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
20 | 50 | 0.342 | 0.148 | 0.205 | 0.302 | 0.003 | 6 | 3 | 4 | 6 | 1 |
200 | 0.320 | 0.158 | 0.209 | 0.284 | 0.029 | 6 | 3 | 4 | 6 | 1 | |
400 | 0.291 | 0.172 | 0.211 | 0.260 | 0.065 | 6 | 3 | 4 | 5 | 2 | |
40 | 50 | 0.274 | 0.179 | 0.212 | 0.247 | 0.088 | 11 | 7 | 8 | 10 | 4 |
200 | 0.262 | 0.183 | 0.212 | 0.239 | 0.104 | 10 | 7 | 9 | 10 | 4 | |
400 | 0.247 | 0.189 | 0.211 | 0.228 | 0.125 | 10 | 8 | 8 | 9 | 5 | |
100 | 50 | 0.231 | 0.194 | 0.209 | 0.217 | 0.150 | 23 | 19 | 21 | 22 | 15 |
200 | 0.226 | 0.195 | 0.208 | 0.214 | 0.157 | 22 | 20 | 21 | 21 | 16 | |
400 | 0.219 | 0.197 | 0.206 | 0.210 | 0.167 | 22 | 20 | 20 | 21 | 17 |
We used the function nlphqn in SAS/IML to solve the nonlinear system of equations in (17).
5 Discussion
In crop research, the design of experiments is mainly considered in the context of a single environment and assuming that treatment effects are fixed (John and Williams (1995); Mead et al. (2012)). Recently, there has been an increased interest in design for experiments when treatments are modeled as correlated random effects using kinship or pedigree information (Bueno Filho and Gilmour (2003), Bueno Filho and Gilmour (2007); Cullis et al. (2006); Butler et al. (2014)). Also, the design of multi-environment trials has been considered in a few papers, most notably in the context of partially replicated (p-rep) trials (Williams et al. (2014)), but also in broader contexts (González-Barrios et al. (2019)). To the best of our knowledge, however, the problem of allocation of location numbers in subdivided TPE has never been considered in any detail. The problem is reminiscent of optimal allocation in stage-wise sampling based on a nested random-effects model (Snedecor and Cochran (1967), p. 529) but the approach needed is more complex due to the linear mixed model involving several fixed and random effects and the optimization being targeted to the prediction of random effects. There is also some relation to small-area estimation in surveys where mixed models are used for estimation (Jiang and Lahiri (2006), Torabi and Jiang (2020)), but design in that context is not usually targeted at individual domains or small areas, and there is no notion of a larger number of treatments as in MET.
Our main focus was the optimal allocation of locations for different sub-regions with respect to the estimation of genotype effects and pairwise linear contrasts for A- and particular linear (weighted A-) criteria. The proposed approach is based on the method of best linear unbiased prediction (BLUP). For our problem Bayesian optimal designs for a transformed covariance matrix of genotype effects turn out to be optimal. In the example we considered two kinds of models with respect to the covariance structure: first-order factor-analytic and compound symmetry. The resulting designs in both cases depend on the covariance structure, observational errors variance and the total number of locations in all sub-regions. The only exception is the standard A-criterion for compound symmetry: in this case balanced designs are optimal.
Our criterion integrates the efficiencies for BLUPs of interest across sub-regions. There are three variations to this approach. Two of them take a weighted or unweighted average across sub-regions, and optimization typically leads to allocations that imply unequal efficiency between sub-regions. The third approach imposes the additional restriction that efficiency be the same for each sub-region. We think this latter approach is particularly relevant when several administrative entities (federal states or countries) join forces to link up their trialling networks for cross-boundary analysis. For such efforts to be successful it is vital that the benefit, in terms of efficiency gain compared to independent analysis, can be split equally between the administrative entities involved.
Appendix A Proof of Lemma 1
To make use of the theoretical results that are available in the literature (see e. g. Henderson (1975)) for the prediction of random parameters we will represent the model (1) as a particular case of the general LMM
(21) |
with design matrices and for the fixed effects and the random effects, respectively. In (21), denotes the fixed effects and are the random effects. The random effects and the observational errors are assumed to have zero mean and to be all uncorrelated with positive definite covariance matrices and , respectively. Random effects and observational errors are assumed to be uncorrelated.
The latter equation may be alternatively written as
(22) |
where . Model (22) is of form (21) with , , and
According to Henderson (1975) the MSE matrix of the BLUP of the random effects (which corresponds to in our model (22)) is given by
(23) |
where denotes a generalized inverse of . With this formula we obtain MSE matrix (2). Then using the relation between the genotype effects and the pairwise contrasts we receive formula (3).
Acknowledgment
This research was partially supported by grant SCHW 531/16 of the German Research Foundation (DFG). The authors are grateful to Waqas Malik (University of Hohenheim) for determining the areas of the five breeding zones for maize in India based on a digitized map.
References
- Atlin et al. (2000) Atlin, G., Baker, R. J., McRae, K. B., and Lu., X. (2000). Selection response in subdivided target regions. Crop Science, 40, 7–13.
- Bueno Filho and Gilmour (2003) Bueno Filho, J. S. D. S. and Gilmour, S. (2003). Planning incomplete block experiments when treatments are genetically related. Biometrics, 59, 375–381.
- Bueno Filho and Gilmour (2007) Bueno Filho, J. S. D. S. and Gilmour, S. (2007). Block designs for random treatment effects. Journal of Statistical Planning and Inference, 137, 1446–1451.
- Butler et al. (2014) Butler, D. G., Smith, A. B., and Cullis, B. R. (2014). On the design of field experiments with correlated treatment effects. Journal of Agricultural, Biological and Environmental Statistics, 19, 539–555.
- Cullis et al. (2006) Cullis, B. R., Smith, A., and Coombes, N. (2006). On the design of early generation variety trials with correlated data. Journal of Agricultural, Biological and Environmental Statistics, 11, 381–393.
- Entholzner et al. (2005) Entholzner, M., Benda, N., Schmelter, T., and Schwabe, R. (2005). A note on designs for estimating population parameters. Biometrical Letters - Listy Biometryczne, 42, 25–41.
- Fedorov and Jones (2005) Fedorov, V. and Jones, B. (2005). The design of multicentre trials. Statistical Methods in Medical Research, 14, 205–248.
- Gladitz and Pilz (1982) Gladitz, J. and Pilz, J. (1982). Construction of optimal designs in random coefficient regression models. Mathematische Operationsforschung und Statistik, Series Statistics, 13, 371–385.
- González-Barrios et al. (2019) González-Barrios, P., Díaz-García, L., and Gutiérrez, L. (2019). Mega-environmental design: Using genotype environment interaction to optimize resources for cultivar testing. Crop Science, 59, 1899–1915.
-
Harman and Prus (2018)
Harman, R. and Prus, M. (2018).
Computing optimal experimental designs with respect to a compound
Bayes Risk criterion.
Statistics and Probability Letters
, 137, 135–141. -
Henderson (1975)
Henderson, C. R. (1975).
Best linear unbiased estimation and prediction under a selection model.
Biometrics, 31, 423–477. - Isik et al. (2017) Isik, F., Holland, J., and Maltecca, C. (2017). Genetic data analysis for plant and animal breeding. Springer, New York.
- Jiang and Lahiri (2006) Jiang, J. and Lahiri, P. (2006). Estimation of finite population domain means: A model-assisted empirical best prediction approach. Journal of the American Statistical Association, 101, 301–311.
- John and Williams (1995) John, J. and Williams, E. (1995). Cyclic and computer generated designs. Chapman and Hall, London.
- Kiefer (1974) Kiefer, J. (1974). General equivalence theory for optimum designs (approximate theory). Annals of Statistics, 2, 849–879.
- Kleinknecht et al. (2013) Kleinknecht, K., Möhring, J., Singh, K., Zaidi, P., Atlin, G., and Piepho, H.-P. (2013). Comparison of the performance of blue and blup for zoned indian maize data. Crop Science, 53, 1384–1391.
- Mead et al. (2012) Mead, R., Gilmour, S., and Mead, A. (2012). Statistical principles for the design of experiments. Cambridge Univ. Press, Cambridge.
- Piepho (1997) Piepho, H.-P. (1997). Analyzing genotype-environment data by mixed models with multiplicative effects. Biometrics, 53, 761–766.
- Piepho and Möhring (2005) Piepho, H.-P. and Möhring, J. (2005). Best linear unbiased prediction of cultivar effects for subdivided target regions. Crop Science, 45, 1151–1159.
- Prus (2019) Prus, M. (2019). Optimal designs in multiple group random coefficient regression models. TEST. https://doi.org/10.1007/s11749-019-00654-6.
- Prus and Schwabe (2016) Prus, M. and Schwabe, R. (2016). Optimal designs for the prediction of individual parameters in hierarchical models. Journal of the Royal Statistical Society: Series B, 78, 175–191.
- Pukelsheim and Rieder (1992) Pukelsheim, F. and Rieder, S. (1992). Efficient rounding of approximate designs. Biometrika, 79, 763–770.
- Snedecor and Cochran (1967) Snedecor, G. W. and Cochran, W.G. Press, A. (1967). Statistical Methods. Iowa State University Press, Ames.
- Torabi and Jiang (2020) Torabi, M. and Jiang, J. (2020). Estimation of mean squared prediction error of empirically spatial predictor of small area means under a linear mixed model. Journal of Statistical Planning and Inference, 208, 82–93.
- Williams et al. (2014) Williams, E., John, J. A., and Whitaker, D. (2014). Construction of more flexible and efficient p-rep designs. Australian and New Zealand Journal of Statistics, 56, 89–96.
Comments
There are no comments yet.