Modifying the Chi-square and the CMH test for population genetic inference: adapting to over-dispersion
Evolve and resequence studies provide a popular approach to simulate evolution in the lab and explore its genetic basis. In this context, the chi-square test, Fisher's exact test and the Cochran-Mantel-Haenszel test are commonly used to infer genomic positions affected by selection from temporal changes in allele frequency. However, the null model associated with these tests does not match the null hypothesis of actual interest. Indeed, due to genetic drift and possibly other sources of noise such as pool sequencing, the null variance in the data can be substantially larger than the variance accounted for by these common test statistics. This leads to p-values that are systematically too small and therefore to a large number of false positive results. Even if the ranking rather than the actual p-values is of interest, a naive application of these tests will give misleading results, as the amount of over-dispersion varies from locus to locus. We therefore propose adjusted statistics that take the over-dispersion into account while keeping the formulas simple. This is particularly useful in genome-wide applications, where millions of SNPs can be handled with little computational effort. We then apply the adapted test statistics to real data from Drosophila and investigate how information from intermediate generations can be included when available. The obtained formulas may also be useful in other situations, provided that the null variance is either known or can be estimated.
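To make the variance argument concrete, here is a minimal Python sketch under stated assumptions. It is not the adjusted statistic derived in this work. For a 2x2 table of allele counts, the Pearson chi-square equals (p_t - p_0)^2 / [p_bar (1 - p_bar) (1/n_0 + 1/n_t)], so it implicitly assumes that binomial read sampling is the only source of variance. The sketch assumes Wright-Fisher drift over t generations with a hypothetical effective population size N_e, which adds roughly p(1 - p)[1 - (1 - 1/(2 N_e))^t] to the null variance. It then rescales the statistic by the ratio of this inflated variance to the binomial one. The function names, the example counts, N_e = 300 and t = 20 are all illustrative. The extra sampling stage introduced by pool sequencing is ignored.

```python
import math


def pearson_chi2_2x2(a0: int, n0: int, at: int, nt: int) -> float:
    """Pearson chi-square (no continuity correction) for a 2x2 table of
    allele counts: a0 of n0 reads at generation 0, at of nt at generation t."""
    p0, pt = a0 / n0, at / nt
    p_bar = (a0 + at) / (n0 + nt)  # pooled allele frequency under the null
    denom = p_bar * (1.0 - p_bar) * (1.0 / n0 + 1.0 / nt)
    if denom == 0.0:
        return 0.0  # monomorphic in both samples: no observed change
    return (pt - p0) ** 2 / denom


def drift_inflation(n0: int, nt: int, ne: float, t: int) -> float:
    """Hypothetical variance ratio (binomial sampling + drift) / sampling.
    Assumes Wright-Fisher drift adds p(1-p) * (1 - (1 - 1/(2*ne))**t)
    to Var(pt - p0); the common factor p(1-p) cancels in the ratio."""
    sampling = 1.0 / n0 + 1.0 / nt
    drift = 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** t
    return (sampling + drift) / sampling


def chi2_1df_pvalue(x: float) -> float:
    """Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    # Hypothetical locus: 50/100 reads at generation 0, 65/100 after 20 generations.
    naive = pearson_chi2_2x2(50, 100, 65, 100)
    factor = drift_inflation(100, 100, ne=300, t=20)
    adjusted = naive / factor  # statistic rescaled to the drift-inflated variance
    print(f"naive    chi2={naive:.3f}  p={chi2_1df_pvalue(naive):.4f}")
    print(f"adjusted chi2={adjusted:.3f}  p={chi2_1df_pvalue(adjusted):.4f}")
```

In this sketch the inflation factor depends on per-locus coverage (n_0, n_t). Loci with deeper coverage have a smaller sampling term and are therefore shrunk more. As a result, the adjustment can reorder loci rather than rescale them all uniformly.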