A direct approach to false discovery rates by decoy permutations

04/23/2018 · by Kun He, et al.

The current approaches to false discovery rates (FDRs) in multiple hypothesis testing are usually based on the null distribution of a test statistic. However, all types of null distributions, including the theoretical, permutation-based and empirical ones, have inherent drawbacks. For example, the theoretical null might fail because of improper assumptions about the sample distribution. In addition, many existing approaches to FDRs need to estimate the ratio of true null hypotheses, a problem that is difficult and not yet satisfactorily solved. In this paper, we propose the target-decoy procedure, a different approach to FDRs for detecting random variables that differ between cases and controls in general case-control studies. Our approach requires neither an estimate of the null distribution nor an estimate of the ratio of true null hypotheses. The target-decoy procedure simply builds on the ordering of hypotheses by some statistic, and directly estimates the false target discoveries using competing decoy hypotheses constructed by permutation. We prove that this approach can control the FDR. Simulations demonstrate that it is more stable and powerful than two of the most popular approaches. We also evaluate our approach on real microarray data for identifying differential gene expression.


1 Introduction

Controlling the false discovery rate (FDR, i.e., the expected proportion of incorrect rejections among all rejections) is a powerful and popular way to perform multiple testing. The FDR was first introduced by Benjamini and Hochberg (1995) (BH). Let $H_1, \ldots, H_m$ be the tested null hypotheses, of which $m_0$ are true and the remaining $m_1 = m - m_0$ are false. Let $V$ denote the number of true null hypotheses that are erroneously rejected and $R$ be the total number of hypotheses that are rejected. Then the false discovery rate is $\mathrm{FDR} = E[Q]$, where the proportion of false discoveries, $Q$, is $V/R$ if $R > 0$ and $0$ if $R = 0$. BH also provided a p-value based sequential method to control the FDR. FDR control seeks a significance threshold on p-values or other test statistics such that the real FDR of all rejections does not exceed a given level. Since then, many FDR control methods have been developed, e.g., Benjamini and Yekutieli (2001); Sarkar (2002); Storey (2002, 2003); Benjamini et al. (2006); Basu et al. (2017).
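In this notation, the definition of the FDR and the classical BH step-up rule (restated here for reference) read:

```latex
Q = \begin{cases} V/R, & R > 0, \\ 0, & R = 0, \end{cases}
\qquad
\mathrm{FDR} = E[Q],
\qquad
k_{\mathrm{BH}} = \max\{\, i : p_{(i)} \le i\alpha/m \,\},
```

where $p_{(1)} \le \cdots \le p_{(m)}$ are the ordered p-values and the BH procedure rejects the hypotheses with the $k_{\mathrm{BH}}$ smallest p-values to control the FDR at level $\alpha$.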

The Bayes method (Storey, 2002, 2003) is one of the most popular approaches to FDRs. This method not only provides a Bayesian interpretation of the FDR, but also improves the power of FDR control by estimating the ratio of true null hypotheses. This ratio estimation, which was absent in the original BH procedure, is now widely used in FDR methods to enhance power, e.g., in the Bayes and empirical Bayes methods (Storey, 2002; Benjamini et al., 2006; Efron, 2008; Strimmer, 2008).

Current canonical approaches to FDRs are based on the null distribution of a test statistic (Benjamini and Hochberg, 1995; Storey, 2002; Benjamini et al., 2006). However, it is usually difficult to obtain the proper null distribution. Popular null distributions include the theoretical null, permutation null, bootstrap null and empirical null (Efron, 2008, 2012). Though the theoretical null is the most popular, it can fail in practice for many reasons, such as improper mathematical assumptions or unobserved covariates (Efron, 2007, 2008). For example, if the real null distribution is not normal, the p-values calculated by Student's t-test are not uniformly distributed on (0, 1) under true null hypotheses. The permutation null is also widely used. There are two main permutation methods, i.e., permutation tests and pooled permutation (Kerr, 2009). Permutation tests are a class of widely applicable non-parametric tests for calculating p-values, and are most useful when information about the data distribution is insufficient. However, the p-values reported by permutation tests are discretely distributed and their precision is limited by the sample size of a test. Low precision may reduce statistical power (Tusher et al., 2001). Instead of estimating a null distribution for each test separately, pooled permutation in multiple testing estimates an overall null distribution for all tests (Efron et al., 2001). However, it has been found that pooling permutation null distributions across hypotheses can produce invalid p-values, since even true null hypotheses can have different permutation distributions (Kerr, 2009). Bootstrap is another popular method for calculating p-values, but it is not applicable when the number of tests is much larger than the sample size of a test (Fan et al., 2007; Efron, 2012; Liu and Shao, 2014).
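To illustrate the precision limit just mentioned: an exact permutation test can only report p-values on a grid whose spacing is the reciprocal of the number of relabelings, so small samples give coarse, discrete p-values. A minimal sketch (the function name and example data are ours):

```python
import itertools
import statistics

def perm_pvalue(cases, controls):
    """Exact two-sample permutation p-value for the absolute difference
    of means, enumerating every relabeling of the pooled samples."""
    pooled = cases + controls
    k = len(cases)
    obs = abs(statistics.mean(cases) - statistics.mean(controls))
    count = total = 0
    for idx in itertools.combinations(range(len(pooled)), k):
        grp = [pooled[i] for i in idx]
        rest = [pooled[i] for i in range(len(pooled)) if i not in idx]
        count += abs(statistics.mean(grp) - statistics.mean(rest)) >= obs
        total += 1
    return count / total   # always a multiple of 1 / C(n1+n2, n1)

# 3 vs 3 samples: only C(6,3) = 20 relabelings, so p-values move in
# steps of 1/20; for this symmetric statistic the observed labeling and
# its mirror are always counted, so p can never fall below 2/20 = 0.1.
p = perm_pvalue([5.1, 4.9, 5.3], [1.0, 1.2, 0.8])
```

However large the effect is, the p-value here cannot drop below 0.1, which is the precision limit the text refers to.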

To overcome the shortcomings of the theoretical and permutation null distributions, new methods were proposed to estimate an empirical null distribution from a large number of tests (Efron et al., 2001; Efron and Tibshirani, 2002; Efron, 2008; Scott and Berger, 2010). For example, the empirical Bayes method estimates the empirical null distribution by decomposing the mixture of null and alternative distributions. However, decomposing the mixture distribution is intrinsically difficult. For example, if the empirical distribution has a strong peak, the decomposition may fail (Strimmer, 2008).

As far as we know, almost all methods with proofs of FDR control depend exclusively on an accurate null distribution, regardless of whether the required inputs are proper p-values (Storey, 2002; Benjamini et al., 2006; Gavrilov et al., 2009; Sarkar et al., 2013) or z-values (Sun and Cai, 2007; Tansey et al., 2017). Some methods do not need the accurate null distribution, but additional assumptions about the null distribution are still necessary, such as that the null distribution is normal (Efron, 2007; Strimmer, 2008) or that the null distributions of all statistics are identical and symmetric about 0 (Arias-Castro and Chen, 2017). Meanwhile, to enhance the power of FDR control, most popular methods estimate the ratio of true null hypotheses (Storey, 2002; Benjamini et al., 2006; Efron, 2008; Strimmer, 2008). However, these estimates often have considerable biases (Hansen and Kerr, 2012).

In this paper, we propose a direct approach to FDR control, named the target-decoy procedure, which is free of both the null distribution and the ratio of true null hypotheses. In this approach, a target score and a number of decoy scores are calculated for each hypothesis. The target score is calculated from the original samples. The decoy scores are calculated from randomly permuted samples. Based on the target score and the decoy scores, a label and a final score are assigned as follows. If the target score is more significant than half of the decoy scores, the hypothesis is labeled as target and the final score is set as the target score. Otherwise, the hypothesis is labeled as decoy and the final score is set as the decoy score with a specific rank mapped from the rank of the target score. Then the hypotheses are sorted by their final scores, and the ratio of decoy to target hypotheses beyond a threshold is directly used for FDR control. Our approach to FDRs is based exclusively on these scores and their labels. Unlike p-values or other test statistics with known null distributions, the score used in our approach can be any measure of the (dis)similarity of two groups of samples. Therefore, our approach is very flexible, and it can be more powerful than traditional approaches that are limited by the precision of p-values or the sample size of each test. Importantly, we establish the theoretical foundation of the target-decoy procedure by proving that it rigorously controls the FDR when the scores are independent between different tests.

Monte-Carlo simulations demonstrate that our approach is more stable and powerful than two popular methods, i.e., the Bayes method (Storey, 2002, 2003; Storey et al., 2004) and the empirical Bayes method (Efron et al., 2001; Efron and Tibshirani, 2002; Efron, 2008). The performances of the three methods were also compared on a real data set. Because our procedure is more straightforward and can be used with arbitrary scores, we believe that it will have many practical applications.

Our approach was inspired by the target-decoy protein database search strategy widely used to estimate the FDR of peptide identifications in tandem mass spectrometry (Elias and Gygi, 2007). We previously proved that a modified version of this strategy (adding one to the number of decoys) controls the FDR (He, 2013; He et al., 2015). A similar approach has been used for FDR control in the setting of variable selection for the classical linear regression model (Barber and Candès, 2015). To our knowledge, our approach is the first attempt to establish a distribution-free framework for detecting random variables that differ between cases and controls in general case-control studies. We discuss the differences between our approach and previous related work at the end of the paper.

The rest of the paper is organized as follows. Section 2 discusses a general model for case-control studies. Our target-decoy procedure is presented in Section 3. Section 4 establishes the theoretical foundation of our procedure. Numerical results are given in Section 5. In Section 6, we show an application of the target-decoy procedure to an Arabidopsis microarray data set. Section 7 concludes the paper with a discussion of related and future work. Proofs of theoretical results are provided in the Supplementary Materials.

2 Problem Formulation

Consider a case-control study involving $m$ random variables, $X_1, \ldots, X_m$. For each random variable $X_i$, where $i = 1, \ldots, m$, there are $n$ random samples $X_{i1}, \ldots, X_{in}$, where $X_{i1}, \ldots, X_{in_1}$ are from the cases and $X_{i,n_1+1}, \ldots, X_{in}$ are from the controls.

The goal is to search for random variables that differ between cases and controls. The null hypothesis for random variable $X_i$ used here is the 'symmetrically distributed' hypothesis $H_{0i}$: the joint distribution of $X_{i1}, \ldots, X_{in}$ is symmetric. In other words, the joint probability density function $f_i$ of $X_{i1}, \ldots, X_{in}$ (or the joint probability mass function if the $X_{ij}$ are discrete) satisfies $f_i(x_1, \ldots, x_n) = f_i(x_{\pi(1)}, \ldots, x_{\pi(n)})$ for any possible $x_1, \ldots, x_n$ and any permutation $\pi$ of $\{1, \ldots, n\}$. If $X_{i1}, \ldots, X_{in}$ are independent, this hypothesis is equivalent to saying that they are identically distributed. Here we use the 'symmetrically distributed' hypothesis to cover the case where $X_{i1}, \ldots, X_{in}$ are dependent but still form an exchangeable sequence of random variables (Chow and Teicher, 2012).

Let $s_i$ be some score measuring the difference between the cases and the controls of $X_i$. Without loss of generality, we assume that larger scores indicate greater significance, and that the score is unchanged when the case samples are permuted among themselves and the control samples are permuted among themselves. Note that neither the null distributions of the scores nor the distributions of the random variables are required to be known.
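Any statistic that is invariant to reordering within the case group and within the control group satisfies the requirement above. One illustrative choice (not prescribed by the procedure) is the absolute difference of group means:

```python
import statistics

def score(cases, controls):
    # Larger score = greater case/control difference; the value is
    # unchanged by any reordering within each group.
    return abs(statistics.mean(cases) - statistics.mean(controls))

# invariance under within-group permutations
assert score([1, 2, 3], [4, 5, 6]) == score([3, 1, 2], [6, 4, 5])
```

Rank-based statistics such as the Wilcoxon rank-sum statistic, used later in the paper, satisfy the same invariance.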

3 The target-decoy procedure

Our procedure for FDR control is as follows.

Algorithm : the target-decoy procedure

  1. For each test $i$, calculate the target score $s_i$. Permute the case and control status randomly $d$ times and calculate the scores of the $d$ permutations. Sort these $d + 1$ scores in descending order. For equal scores, sort them randomly with equal probability.

  2. For each test $i$, calculate a final score $w_i$ and assign it a label $L_i \in \{T, D\}$, where $T$ and $D$ stand for target and decoy, respectively. Assume that the rank of $s_i$ among the $d + 1$ sorted scores is $r$. If $r < (d + 2)/2$, let $L_i = T$ and set $w_i$ as $s_i$. If $r > (d + 2)/2$, let $L_i = D$ and set $w_i$ as the score ranking $d + 2 - r$. Otherwise, $r = (d + 2)/2$, let $L_i$ be $T$ or $D$ randomly with equal probability and set $w_i$ as $s_i$.

  3. Sort the tests in descending order of their final scores. For tests with equal final scores, sort them randomly with equal probability. Let $L_{(1)}, \ldots, L_{(m)}$ denote the sorted labels and $w_{(1)}, \ldots, w_{(m)}$ denote the sorted scores.

  4. If the specified FDR control level is $\alpha$, let

    (1)   $k = \max\left\{ i : \dfrac{\#\{j \le i : L_{(j)} = D\} + 1}{\max(\#\{j \le i : L_{(j)} = T\},\, 1)} \le \alpha \right\}$

    (with $k = 0$ if no such $i$ exists), and reject the hypothesis with rank $i$ if $i \le k$ and $L_{(i)} = T$.

Section 4 will show that the target-decoy procedure controls the FDR. The random permutations used in our procedure can be generated by simple random sampling either with or without replacement, just as in permutation tests. Similarly, with a larger number of permutations $d$, the power of our procedure becomes slightly higher, as shown in Section 5. We can set $d$ up to the maximum number of permutations we are willing to perform.

Unlike other FDR control methods, our target-decoy procedure does not depend on the null distribution. The number of permutations $d$ used in our procedure can be much smaller than that used in permutation tests. In our simulations, $d$ was set as 49 or 1, while in the real data experiments all possible permutations were used. Simulations demonstrate that our target-decoy procedure can still control the FDR even when $d = 1$, in which case little information about the null distributions is revealed.
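Steps 1-4 can be sketched in code. This is our reading of the procedure: the threshold in Step 4 uses the '+1' correction on the decoy count (the modification shown to control the FDR in He et al., 2015), tie handling follows Steps 1-3, and the function names and example score are ours.

```python
import random
import statistics

def td_threshold(labels, alpha):
    """Step 4: the largest rank k at which (#decoys + 1) / #targets,
    over the top-k labels, stays no more than alpha."""
    k = n_t = n_d = 0
    for i, lab in enumerate(labels, 1):
        n_t += lab == "T"
        n_d += lab == "D"
        if (n_d + 1) / max(n_t, 1) <= alpha:
            k = i
    return k

def target_decoy(samples, score, d=1, alpha=0.05, seed=0):
    """Steps 1-4 of the target-decoy procedure (a sketch).

    samples: list of (cases, controls) pairs, one per hypothesis.
    score:   callable(cases, controls) -> float; larger = more significant.
    Returns the indices of the rejected (target-labeled) hypotheses.
    """
    rng = random.Random(seed)
    entries = []  # (final score, random tie-breaker, label, hypothesis index)
    for idx, (cases, controls) in enumerate(samples):
        pooled = list(cases) + list(controls)
        n1 = len(cases)
        target = score(cases, controls)
        decoys = []
        for _ in range(d):          # Step 1: d random relabelings
            rng.shuffle(pooled)
            decoys.append(score(pooled[:n1], pooled[n1:]))
        ordered = sorted([target] + decoys, reverse=True)
        # Step 2: rank r of the target among the d+1 scores (ties random)
        better = sum(s > target for s in ordered)
        r = better + rng.randint(1, ordered.count(target))
        mid = (d + 2) / 2           # middle rank of the d+1 scores
        if r < mid:
            label, final = "T", target
        elif r > mid:
            label, final = "D", ordered[d + 1 - r]  # mirrored rank d+2-r
        else:                       # exact middle: coin flip
            label = "T" if rng.random() < 0.5 else "D"
            final = target
        entries.append((final, rng.random(), label, idx))
    entries.sort(key=lambda e: (-e[0], e[1]))  # Step 3: ties in random order
    k = td_threshold([e[2] for e in entries], alpha)
    return [e[3] for e in entries[:k] if e[2] == "T"]

# illustrative score: absolute difference of group means
diff = lambda a, b: abs(statistics.mean(a) - statistics.mean(b))
```

For example, with 19 targets followed by one decoy in the sorted list, `td_threshold` returns 19 at alpha = 0.1 (the ratio 2/19 at rank 20 is too large) and 0 at alpha = 0.05 (even 1/19 exceeds 0.05).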

4 Theoretical Analysis

In this section, we will show that equation (1) can control the FDR. Let $H_i = 0$ and $H_i = 1$ denote that the null hypothesis for random variable $X_i$ is true and false, respectively. Note that $H_1, \ldots, H_m$ are constants in the setting of hypothesis testing. Define $Y_i$, where $i = 1, \ldots, m$, as follows. If $H_i = 0$, let $Y_i$ be $1$ if $L_i = T$ and $-1$ if $L_i = D$. Otherwise, $H_i = 1$, let $Y_i$ be $2$ if $L_i = T$ and $-2$ if $L_i = D$. Table 1 summarizes the possible situations. Let $Y_{(1)}, \ldots, Y_{(m)}$ denote the sequence of $Y_i$ sorted as in Step 3.

Table 1: Possible values of $Y_i$ for the target-decoy procedure.

              $H_i = 0$    $H_i = 1$
  $L_i = T$       1            2
  $L_i = D$      -1           -2

Let $\mathbf{Y}$ and $\mathbf{Y}_{(\cdot)}$ denote the sequences $Y_1, \ldots, Y_m$ and $Y_{(1)}, \ldots, Y_{(m)}$, respectively. Let $\mathbf{w}$ and $\mathbf{w}_{(\cdot)}$ denote $w_1, \ldots, w_m$ and $w_{(1)}, \ldots, w_{(m)}$, respectively. We define $\mathbf{H}$, $\mathbf{L}$, etc. similarly. For example, we will use $\mathbf{y}$ to denote a sequence of constants, $y_1, \ldots, y_m$, which is one of the observed values of $\mathbf{Y}$. We will also define $\mathbf{h}$, $\mathbf{l}$, etc. Then we have the following theorem, the proof of which is given in the Supplementary Materials.

Theorem 1.

In the target-decoy procedure, if the random variables $X_1, \ldots, X_m$ are independent, then for any fixed $i$ and any possible observed values we have

(2)

From Table 1, $L_{(i)} = T$ if $Y_{(i)} > 0$ and $L_{(i)} = D$ if $Y_{(i)} < 0$. Thus, equation (1) is equivalent to $k = \max\{ i : (\#\{j \le i : Y_{(j)} < 0\} + 1)/\max(\#\{j \le i : Y_{(j)} > 0\}, 1) \le \alpha \}$. Meanwhile, the total number of rejected hypotheses is $\#\{i \le k : Y_{(i)} > 0\}$, and the number of falsely rejected hypotheses is $\#\{i \le k : Y_{(i)} = 1\}$. Therefore, the false discovery proportion (FDP) is $\#\{i \le k : Y_{(i)} = 1\}/\max(\#\{i \le k : Y_{(i)} > 0\}, 1)$, and the real FDR is its expectation. To prove that equation (1) controls the FDR, we only need to prove the following theorem; the proof is given in the Supplementary Materials.

Theorem 2.

Let the random variables $k$ and $Y_{(1)}, \ldots, Y_{(m)}$ be defined as above. Then for any $\alpha \in (0, 1)$, we have

    (3)   $E\left[ \dfrac{\#\{i \le k : Y_{(i)} = 1\}}{\max(\#\{i \le k : Y_{(i)} > 0\},\, 1)} \right] \le \alpha.$

Remark. The most intuitive way to attempt FDR control is to set $k$ as

    (4)   $k = \max\left\{ i : \dfrac{\#\{j \le i : L_{(j)} = D\}}{\max(\#\{j \le i : L_{(j)} = T\},\, 1)} \le \alpha \right\}.$

However, this rule cannot control the FDR. Consider a case-control study where the specified FDR control level is $\alpha < 1/2$ and all hypotheses are true null. From Theorem 1, the top-ranked label $L_{(1)}$ is $T$ with probability $1/2$. If $L_{(1)} = T$, we have $\#\{j \le 1 : L_{(j)} = D\}/\max(\#\{j \le 1 : L_{(j)} = T\}, 1) = 0 \le \alpha$. Therefore, $k \ge 1$ and at least one hypothesis will be rejected. Then the FDP when $L_{(1)} = T$ is $1$, since every hypothesis is true null. Thus, the FDR is no less than $1/2$, which is not controlled.
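A quick Monte-Carlo illustration of this remark (our construction: every hypothesis is true null with one decoy each, so each label is an independent fair coin flip): the uncorrected ratio in (4) rejects something whenever the top-ranked label is a target, so its FDR is about 1/2.

```python
import random

def naive_threshold(labels, alpha):
    # Rule (4): decoy/target ratio without the +1 correction.
    k = n_t = n_d = 0
    for i, lab in enumerate(labels, 1):
        n_t += lab == "T"
        n_d += lab == "D"
        if n_d / max(n_t, 1) <= alpha:
            k = i
    return k

rng = random.Random(42)
m, reps, alpha = 100, 2000, 0.05
hits = 0
for _ in range(reps):
    # all m hypotheses true null, one decoy each: labels are coin flips
    labels = [rng.choice("TD") for _ in range(m)]
    k = naive_threshold(labels, alpha)
    if any(lab == "T" for lab in labels[:k]):
        hits += 1        # every rejection is false, so FDP = 1
emp_fdr = hits / reps     # close to 1/2, far above alpha
```

With the +1 correction of equation (1), the same loop would require the first 20 sorted labels to all be targets before any rejection at alpha = 0.05, which essentially never happens under the all-null configuration.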

5 Simulation Studies

We used Monte-Carlo simulations to study the performance of our method. We considered case-control studies in which the random variables follow the normal distribution, the gamma distribution and the Cauchy distribution, respectively. In addition to the normal distribution, we ran simulation experiments with the gamma distribution because many real-world random variables are gamma-distributed. Moreover, to test the multiple testing methods on a heavy-tailed distribution, we ran simulation experiments with the Cauchy distribution.

Recall that the case-control study consists of $m$ random variables. For each random variable, there are $n$ random samples, of which $n_1$ are from the cases and the other $n_2 = n - n_1$ are from the controls. Let $X_{i1}, \ldots, X_{in}$ be the random samples for random variable $X_i$.

The observation values from the normal distribution were generated in a way similar to Benjamini, Krieger, and Yekutieli (2006). First, let $z_{0j}$ and $z_{ij}$, for $i = 1, \ldots, m$ and $j = 1, \ldots, n$, be independent and identically distributed random variables following the $N(0, 1)$ distribution. Next, let $X_{ij} = \mu_{ij} + \sqrt{1 - \rho}\, z_{ij} + \sqrt{\rho}\, z_{0j}$, so that any two variables within a sample have correlation $\rho$. We used $\rho = 0$, corresponding to independence, and two positive values of $\rho$, corresponding to typical moderate and high correlation values estimated from real microarray data, respectively (Almudevar et al., 2006); the high value was $\rho = 0.8$. The values of $\mu_{ij}$ are zero for the controls. For the cases, the values of $\mu_{ij}$ are also zero for the hypotheses that are true null. For the cases of the false null hypotheses, the values of $\mu_{ij}$ were assigned by cycling through a fixed set of nonzero effect sizes.
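The generation just described can be sketched as follows. The one-factor form with weights proportional to the square roots of 1 - rho and rho is our assumption about the construction (a standard way to obtain equicorrelated normals); the function name is ours.

```python
import math
import random

def simulate_normal(m, n, rho, mu, seed=0):
    """One data set: m variables, n samples each.

    mu[i][j] is the mean shift for variable i, sample j (zero for
    controls and for true nulls).  Within each sample j, a factor
    shared by all m variables induces pairwise correlation rho.
    """
    rng = random.Random(seed)
    data = []
    for j in range(n):                       # one shared factor per sample
        shared = rng.gauss(0.0, 1.0)
        col = [math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
               + math.sqrt(rho) * shared for _ in range(m)]
        data.append(col)
    # add mean shifts; return m rows of n observations
    return [[data[j][i] + mu[i][j] for j in range(n)] for i in range(m)]
```

Setting rho = 0 removes the shared factor entirely, reproducing the independent design.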

The observation values from the gamma distribution, which is characterized by shape and scale parameters, were generated in the following way. First, let $V_j$ and $W_{ij}$ be independent gamma random variables, for $i = 1, \ldots, m$ and $j = 1, \ldots, n$. Next, let $X_{ij} = W_{ij}$ in the simulation study with independent random variables, and let $X_{ij} = W_{ij} + V_j$, where $V_j$ is shared across variables, in the study with dependent random variables. To obtain reasonable correlation values, the shape parameter was set as 4 and the scale parameter was set as 1 for the controls. For the cases, the scale was set as 1 for the hypotheses that are true null. For the cases of the false null hypotheses, the scale values were assigned by cycling through a fixed set of values different from 1. The observation values from the Cauchy distribution were generated in a similar way, and the details are given in the Supplementary Materials.

The specified FDR control levels were 5% and 10%, respectively. The total number of tests, $m$, was set as 10000. The ratio of false null hypotheses was either 1% or 10%. The total sample size, $n$, was set as 20, 60 or 100, and the number of cases and the number of controls were always equal. In the main text, we only present the results of the experiments for the normal and gamma distributions where $n$ was 20. Other results are given in the Supplementary Materials.

Three different approaches to FDRs were compared in our simulations, i.e., the Bayes method (Storey, 2002, 2003; Storey et al., 2004), the empirical Bayes method (Efron et al., 2001; Efron and Tibshirani, 2002; Efron, 2008) and our target-decoy procedure. The Bayes method and the empirical Bayes method are among the most popular multiple testing methods. To compare the power of these methods, we rejected hypotheses against the specified FDR control levels, $\alpha$. The rejection threshold for the Bayes method was set as the largest p-value whose q-value is no more than $\alpha$ (Storey, 2002, 2003). The rejection threshold for the empirical Bayes method was set as the minimum z-value such that Efdr(z) is no more than $\alpha$, where Efdr(z) is the expected fdr of hypotheses with z-values no smaller than z (Efron, 2007, 2004). Specifically, the R packages "locfdr" version 1.1-8 (Efron, 2004) and "qvalue" version 2.4.2 (Storey and Tibshirani, 2003) were used. Each simulation experiment was repeated 1000 times. We calculated the mean number of rejected hypotheses to evaluate the power of each method. The FDRs of the rejected hypotheses were calculated as the means of the FDPs. Note that the variance of the mean FDP over 1000 repetitions is one thousandth of the variance of a single FDP. We can also estimate the standard deviation of the mean FDP from the sample standard deviation of the FDPs.
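The evaluation just described is simple to compute: the empirical FDR is the mean of the per-repetition FDPs, and the standard deviation of that mean shrinks with the square root of the number of repetitions. A small sketch (the function name is ours):

```python
import statistics

def empirical_fdr(fdps):
    """fdps: one observed false discovery proportion per repetition.
    Returns the empirical FDR (mean FDP) and the standard error of
    that mean, which shrinks like 1 / sqrt(#repetitions)."""
    mean = statistics.mean(fdps)
    se = statistics.stdev(fdps) / len(fdps) ** 0.5
    return mean, se

fdr, se = empirical_fdr([0.0, 0.1, 0.0, 0.05, 0.05])
```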

The p-values for the Bayes method and the z-values for the empirical Bayes method were calculated with the Student's t-test, the Wilcoxon rank-sum test, the Student's t-test with permutation, or the Student's t-test with bootstrap. For the permutation and bootstrap methods, we resampled the cases and the controls of each test, calculated the t-statistics of the resampled data, and calculated the p-values from the null distribution of the pooled t-statistics (Xie et al., 2005; Liu and Shao, 2014). For the bootstrap method, the resampling was performed within the groups. The sampling number of permutations was set as 10 (Efron, 2012) and that of the bootstrap was set as 200.
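The pooled scheme just described (permute each test a few times, pool all permuted t-statistics into one common null, then convert each observed statistic into a p-value against that pool) can be sketched as follows (function names are ours):

```python
import bisect
import random
import statistics

def t_stat(a, b):
    # Two-sample t statistic with a pooled (equal-variance) estimate.
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def pooled_perm_pvalues(samples, n_perm=10, seed=0):
    """samples: list of (cases, controls) pairs.  Each test is permuted
    n_perm times; all permuted |t| statistics form one shared null pool,
    and every observed |t| is converted to a p-value against that pool."""
    rng = random.Random(seed)
    observed, pool = [], []
    for cases, controls in samples:
        observed.append(abs(t_stat(cases, controls)))
        merged = list(cases) + list(controls)
        for _ in range(n_perm):
            rng.shuffle(merged)
            pool.append(abs(t_stat(merged[:len(cases)], merged[len(cases):])))
    pool.sort()
    n = len(pool)
    # fraction of pooled null statistics at least as extreme as each observation
    return [(n - bisect.bisect_left(pool, obs)) / n for obs in observed]
```

Because the pool mixes permutation distributions from all tests, the resulting p-values can be invalid when true null hypotheses have different permutation distributions, as discussed in the introduction (Kerr, 2009).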

For our target-decoy procedure, the cases and the controls of each test were permuted 49 times or only once, and the absolute t-values or the test statistics of the Wilcoxon rank-sum test were used as scores. We ran the one-permutation experiments, in which little information about the null distributions is revealed, to demonstrate that our procedure does not rely on the null distribution. Because permutation is performed inherently in the target-decoy procedure, extra permutation or bootstrap steps are unnecessary.

We will use abbreviations to represent the experiments. For example, 'Bayes, permutation, Normal, 10%, 0.8' represents the simulation experiment where the Bayes method combined with pooled permutation is used, the random variables follow the normal distribution, the ratio of false null hypotheses is as high as 10% and the correlation value is 0.8. For our target-decoy procedure, 't-value, 49, Gamma, 1%' represents the simulation experiment where the absolute t-value is used as the score, 49 permutations are performed for each test, the random variables follow the gamma distribution and the ratio of false null hypotheses is as low as 1%.

5.1 Independent random variables

                Normal, 1%    Normal, 10%   Gamma, 1%      Gamma, 10%
α               0.05   0.10   0.05   0.10   0.05   0.10    0.05   0.10
Bayes
  t-test        0.050  0.100  0.050  0.100  0.023  0.048   0.031  0.072
  permutation   0.048  0.099  0.048  0.098  0.027  0.068   0.047  0.103*
  rank-sum      0.039  0.088  0.039  0.087  0.045  0.087   0.042  0.083
  bootstrap     0.012  0.035  0.034  0.080  0.000  0.001   0.005  0.028
Empirical Bayes
  t-test        0.044  0.092  0.040  0.084  0.006  0.013   0.008  0.023
  permutation   0.039  0.078  0.039  0.086  0.048  0.124*  0.055* 0.119*
  rank-sum      0.046  0.092  0.037  0.078  0.046  0.091   0.037  0.077
  bootstrap     0.005  0.015  0.022  0.060  0.000  0.000   0.000  0.001
Target-decoy
  t-value,49    0.041  0.094  0.049  0.099  0.043  0.094   0.050  0.100
  t-value,1     0.044  0.093  0.048  0.097  0.042  0.092   0.047  0.096
  rank-sum,49   0.042  0.096  0.049  0.099  0.042  0.096   0.050  0.100
  rank-sum,1    0.042  0.093  0.048  0.097  0.042  0.096   0.048  0.097
Table 2: Real FDRs with independent random variables. The sample size is 20. The FDRs were calculated as the means of the FDPs over 1000 repetitions. The standard deviations of the means of the FDPs are small for all the experiments. All the cases where the FDR exceeds the specified level α are marked with an asterisk.
                Normal, 1%   Normal, 10%   Gamma, 1%   Gamma, 10%
α               0.05  0.10   0.05   0.10   0.05  0.10  0.05   0.10
Bayes
  t-test        71    80     845    937    40    50    687    798
  permutation   71    80     842    933    41    55    737    861*
  rank-sum      67    76     813    906    48    59    734    836
  bootstrap     60    68     808    902    0     1     494    673
Empirical Bayes
  t-test        70    78     823    909    23    32    534    650
  permutation   69    76     821    913    49    66*   755*   891*
  rank-sum      69    77     806    889    47    59    715    823
  bootstrap     53    61     772    863    0     0     10     221
Target-decoy
  t-value,49    69    79     843    935    45    60    743    853
  t-value,1     69    79     841    931    45    60    736    845
  rank-sum,49   67    77     834    926    42    60    755    872
  rank-sum,1    66    77     831    922    42    60    751    865
Table 3: Power (mean numbers of rejected hypotheses) with independent random variables. The sample size is 20. All the cases where the FDRs exceed α, as shown in Table 2, are marked with an asterisk.

FDR Control. Table 2 shows the real FDRs of the different methods with independent random variables when the specified FDR control level α was 0.05 or 0.10. The results show that the t-test with Bayes or empirical Bayes overestimated the FDRs for the gamma distribution. The real FDRs of pooled permutation can significantly exceed α when the random variables follow the gamma distribution. The Wilcoxon rank-sum test with Bayes or empirical Bayes also overestimated the FDRs. The real FDRs of bootstrap were much smaller than α. The target-decoy procedure always controlled the FDR.

Statistical power. Table 3 shows the statistical power of different methods with independent random variables. Bootstrap is less powerful than all the other methods, especially when the random variables follow the gamma distribution.

When the random variables follow the normal distribution, the Bayes method is a little more powerful than the target-decoy procedure when the t-test is used. However, it is much less powerful than the target-decoy procedure when the Wilcoxon rank-sum test is used. The empirical Bayes method is less powerful than both the Bayes method and our target-decoy procedure, especially in the Normal, 10% experiments.

When the random variables follow the gamma distribution, the target-decoy procedure is much more powerful than the Bayes and empirical Bayes methods, even if only one permutation was performed. Though the pooled permutation seems to be powerful, the FDRs were not controlled.

In our experiments, the target-decoy procedure always controlled the FDR. Meanwhile, our procedure is very powerful and does not rely on the null distribution. Even if only one permutation was performed, many hypotheses were still rejected while the FDRs were controlled.

5.2 Dependent random variables

                Normal (moderate ρ)           Normal (ρ = 0.8)              Gamma
                1%            10%             1%            10%             1%             10%
α               0.05   0.10   0.05   0.10    0.05   0.10   0.05   0.10    0.05   0.10    0.05   0.10
Bayes
  t-test        0.052* 0.102* 0.050  0.100   0.050  0.101* 0.050  0.100   0.023  0.048   0.031  0.072
  permutation   0.050  0.100  0.048  0.098   0.049  0.099  0.047  0.098   0.026  0.067   0.047  0.103*
  rank-sum      0.046  0.088  0.044  0.085   0.038  0.092  0.039  0.083   0.043  0.085   0.042  0.082
  bootstrap     0.014  0.039  0.034  0.081   0.015  0.041  0.035  0.083   0.000  0.000   0.005  0.027
Empirical Bayes
  t-test        0.047  0.097  0.044  0.090   0.048  0.100  0.047  0.097   0.006  0.013   0.008  0.023
  permutation   0.042  0.083  0.043  0.093   0.041  0.084  0.045  0.099   0.048  0.123*  0.055* 0.121*
  rank-sum      0.049  0.095  0.042  0.086   0.048  0.094  0.046  0.095   0.045  0.090   0.037  0.077
  bootstrap     0.006  0.020  0.026  0.067   0.008  0.024  0.029  0.074   0.000  0.000   0.000  0.001
Target-decoy
  t-value,49    0.047  0.097  0.050  0.100   0.047  0.095  0.049  0.100   0.043  0.094   0.048  0.099
  t-value,1     0.046  0.096  0.048  0.098   0.045  0.096  0.049  0.100   0.042  0.092   0.047  0.096
  rank-sum,49   0.049  0.099  0.049  0.099   0.045  0.096  0.050  0.100   0.042  0.090   0.050  0.100
  rank-sum,1    0.048  0.100  0.049  0.099   0.048  0.097  0.050  0.100   0.040  0.089   0.047  0.096
Table 4: Real FDRs with dependent random variables. The sample size is 20. The FDRs were calculated as the means of the FDPs over 1000 repetitions. The standard deviations of the means of the FDPs are small for all the experiments. All the cases where the FDR exceeds the specified level α are marked with an asterisk.
                Normal (moderate ρ)       Normal (ρ = 0.8)          Gamma
                1%          10%           1%          10%           1%          10%
α               0.05  0.10  0.05   0.10   0.05  0.10  0.05   0.10   0.05  0.10  0.05   0.10
Bayes
  t-test        82*   90*   927    1016   101   108*  1047   1109   40    50    687    797
  permutation   82    90    922    1012   101   108   1043   1106   42    55    737    861*
  rank-sum      80    87    907    983    98    106   1031   1086   47    59    735    836
  bootstrap     74    80    893    984    93    99    1028   1087   0     1     493    673
Empirical Bayes
  t-test        81    90    914    999    100   108   1043   1105   23    32    536    652
  permutation   81    87    912    1003   99    106   1041   1108   49    66*   757*   893*
  rank-sum      80    88    900    984    99    107   1040   1102   47    59    716    823
  bootstrap     69    75    871    958    90    96    1020   1077   0     0     10     221
Target-decoy
  t-value,49    81    90    926    1015   100   108   1046   1109   44    60    741    852
  t-value,1     81    89    923    1013   100   108   1046   1109   45    60    735    845
  rank-sum,49   80    89    917    1007   99    107   1045   1108   42    59    756    870
  rank-sum,1    80    89    916    1005   99    107   1044   1108   41    59    749    863
Table 5: Power (mean numbers of rejected hypotheses) with dependent random variables. The sample size is 20. All the cases where the FDRs exceed α, as shown in Table 4, are marked with an asterisk.

Similar results are found for dependent random variables.

FDR Control. Table 4 shows the real FDRs of the different methods with dependent random variables when the specified FDR control level, α, was 0.05 or 0.10. The results show that with the Bayes method, the real FDRs of the t-test may slightly exceed α in the Normal, 1% experiments. Meanwhile, the t-test with Bayes or empirical Bayes overestimated the FDRs for the gamma distribution. The real FDRs of pooled permutation can significantly exceed α when the random variables follow the gamma distribution. The Wilcoxon rank-sum test with Bayes or empirical Bayes overestimated the FDRs. The real FDRs of bootstrap were much smaller than α. The target-decoy procedure always controlled the FDR.

Statistical power. Table 5 shows the statistical power of different methods with dependent random variables. Bootstrap is less powerful than all the other methods, especially when the random variables follow the gamma distribution.

When the random variables follow the normal distribution, the Bayes method is less powerful than the target-decoy procedure when the Wilcoxon rank-sum test is used. Though the Bayes method seems to be a little more powerful than the target-decoy procedure when the t-test is used, the real FDR of this method may exceed the specified FDR control level. The empirical Bayes method is less powerful than the Bayes method and our target-decoy procedure in the Normal, 10% experiments.

When the random variables follow the gamma distribution, the target-decoy procedure is much more powerful than the Bayes and empirical Bayes methods, even if only one permutation was performed. Though the pooled permutation seems to be powerful, the FDRs were not controlled.

In our experiments, the target-decoy procedure controlled the FDRs with dependent random variables in all cases. Meanwhile, the target-decoy procedure is very powerful.

6 An Application

In this section, we apply the target-decoy procedure to an Arabidopsis microarray data set. To determine whether Arabidopsis genes respond to oncogenes encoded by the transfer DNA (T-DNA) or to bacterial effector proteins codelivered by Agrobacteria into plant cells, Lee et al. (2009) conducted microarray experiments at several time points (hours and days) after inoculating wounded young Arabidopsis plants with two different Agrobacterium strains, C58 and GV3101. Strain GV3101 is a cognate of strain C58 that lacks only the T-DNA but still possesses proteinaceous virulence (Vir) factors such as VirD2, VirE2, VirE3 and VirF (Vergunst et al., 2003). Wounded, but uninfected, stalks served as the control. Here we use the 6 d postinoculation data as an example (downloaded from http://www.ncbi.nlm.nih.gov/geo/, GEO accession: GSE14106). The data, consisting of 22810 genes, were obtained from the C58-infected and control stalks. Both the infected and the control stalks have three replicates.

As in the simulation experiments, the Bayes method, the empirical Bayes method and our target-decoy procedure are compared here. The p-values for the Bayes method and the z-values for the empirical Bayes method were calculated with the Student's t-test, the Wilcoxon rank-sum test, and the Student's t-test with permutation, respectively. The bootstrap method is not compared because the number of tests, 22810, is much larger than the sample size of a test, i.e., 6. For the Bayes method, two-tailed tests were used. For the empirical Bayes method, we first transformed the FDR control level into a threshold on the local fdr and then identified differentially expressed genes according to that threshold. For the target-decoy procedure, the absolute t-values and the test statistics of the Wilcoxon rank-sum test were used as scores.


Table 6: Power of different methods for the Arabidopsis microarray data (Lee et al., 2009).
Bayes
  t-test
  permutation
  rank-sum test
Empirical Bayes
  t-test
  permutation
  rank-sum test
Target-decoy
  t-value
  rank-sum test
  • The R package 'locfdr' crashed when the Wilcoxon rank-sum test was used.

Because it is unknown which genes are really differentially expressed, the real FDRs cannot be computed here; only the power of the methods is compared. For fairness, the sampling numbers were chosen so that all possible permuted null scores were generated for each gene, in all the experiments, including the pooled permutation and the target-decoy procedure.

As shown in Table 6, no differentially expressed genes were found by the empirical Bayes method or by the Wilcoxon rank-sum test. For the Bayes method, the t-test is more powerful than the pooled permutation for small α, while the pooled permutation is more powerful for large α. The target-decoy procedure with the t-value score is the most powerful. The additional genes identified by the target-decoy procedure are reliable, because similar numbers, i.e., 785 genes at FDR 0.034, 1427 genes at FDR 0.050 and 2071 genes at FDR 0.065, were reported by a more specific analysis (Tan and Xu, 2014).

7 Discussion

In this paper, we proposed the target-decoy procedure, a direct approach to FDRs that is free of the null distribution. We theoretically prove that this approach controls the FDR for independent statistics, and experimentally demonstrate that it is more stable and powerful than two of the most popular methods. Our procedure can also be extended to pair-matched case-control studies by adjusting the permutation sub-procedure: in Step 1, we randomly exchange the paired observed values, just as in permutation tests for pair-matched studies, instead of permuting the case and control status. The other steps and analyses stay the same. In the target-decoy procedure, the scores are only used to determine the labels and ranks of the tests, and no statistical meaning of the scores is required. Similar to permutation tests, the target-decoy procedure can be used with any test statistic, regardless of whether its null distribution is known.
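For the pair-matched extension just described, Step 1's permutation becomes a random case/control swap within each pair, i.e., a sign flip of each within-pair difference. A sketch of the decoy-score generation for one hypothesis (the score choice and names are ours):

```python
import random
import statistics

def paired_decoy_scores(pairs, d, seed=0):
    """pairs: (case, control) observations for one hypothesis.
    Target score: |mean within-pair difference|; each decoy randomly
    swaps case and control within every pair, i.e., flips the sign
    of each difference."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in pairs]
    target = abs(statistics.mean(diffs))
    decoys = []
    for _ in range(d):
        flipped = [x if rng.random() < 0.5 else -x for x in diffs]
        decoys.append(abs(statistics.mean(flipped)))
    return target, decoys
```

The labeling, sorting and thresholding steps of the procedure then proceed exactly as in the unpaired case.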

The decoy idea was also used in the field of variable selection, under the name “knockoff”, for FDR control without a null distribution (Barber and Candès, 2015). Given a response and covariates, variable selection aims to remove redundant or irrelevant covariates without incurring much loss of information when modeling the response, which is different from the multiple testing problem discussed here. In the original knockoff paper, Barber and Candès (2015) considered a Gaussian linear regression model in which the number of covariates is no more than the number of observations. Since then, the method has been extended to a wide range of variable selection problems, such as the high-dimensional setting, where the number of covariates exceeds the number of observations (Barber and Candes, 2016), and a nonparametric setting with random covariates (Candes et al., 2018).

The knockoff filter and our target-decoy procedure both control the FDR without a null distribution, but they address different multiple testing problems. The knockoff filter is for variable selection, which removes redundant covariates. Thus, permutation is not suitable for constructing knockoff variables, because it cannot maintain the correlation between the original variables (Barber and Candès, 2015). Moreover, FDR control by the knockoff filter relies on normally distributed stochastic errors, a fundamental assumption of the classic linear regression model. However, when searching for random variables that differ between cases and controls, the impact of correlation on FDR control is greatly reduced, and constructing the decoy scores by permutation yields good performance, as shown in our simulations. The target-decoy procedure also makes use of the natural symmetry of the case-control study, i.e., the exchangeability between observations of each random variable, and is free of additional assumptions.

The following are some detailed differences between the two methods. Firstly, in the knockoff filter only one knockoff counterpart is generated for each original variable, and the “antisymmetry property” of the statistics is necessary for FDR control, whereas in the target-decoy procedure many decoy scores are generated for each hypothesis and the “antisymmetry property” is not required. As shown in our simulation, the target-decoy procedure is slightly more powerful when more than one decoy score is generated per hypothesis. Secondly, in the knockoff filter the expectation of the FDP is taken over the stochastic error, which is the only randomness, and variables with identical original and knockoff statistics are never selected. In our target-decoy procedure, the expectation of the FDP is taken over at least four kinds of randomness: the randomness of the observed values, the randomness of the permutations, and two special situations, namely the randomness of labeling hypotheses whose target scores rank in the middle in Step 2 and the randomness of sorting identical scores. Hypotheses with middle-ranking target scores can still be rejected. Thirdly, constructing knockoff variables with the properties necessary for FDR control is usually difficult, and the construction methods can be complex, as in Candes et al. (2018); our procedure constructs the decoy scores simply by permutation.

The model-X knockoffs also relax the constraint of having valid p-values and can control the FDR in high-dimensional variable selection for a nonparametric setting in which the conditional distribution of the response is arbitrary (Candes et al., 2018). This approach requires that the covariates be random, with a distribution that is known or can be estimated. However, estimating this distribution is impractical in studies where the number of observations for each covariate is limited. Our approach is free of this constraint.

There have been many studies on FDR control in multiple testing under dependency (Benjamini and Yekutieli, 2001; Efron, 2007; Ghosal and Roy, 2011; Kang, 2016). In this paper, we only experimentally verify the performance of our target-decoy procedure on dependent test statistics; the theoretical analysis under dependency needs further research. Moreover, our theoretical analysis is based on the ‘symmetrically distributed’ null hypothesis, which is stronger than the hypothesis that the two groups have the same means. The performance of our procedure under the ‘equality of means’ hypothesis needs further study.

References

  • Almudevar et al. (2006) Almudevar, A., L. B. Klebanov, X. Qiu, P. Salzman, and A. Y. Yakovlev (2006). Utility of correlation measures in analysis of gene expression. NeuroRx 3(3), 384–395.
  • Arias-Castro and Chen (2017) Arias-Castro, E. and S. Chen (2017). Distribution-free multiple testing. Electronic Journal of Statistics 11(1), 1983–2001.
  • Barber and Candès (2015) Barber, R. F. and E. J. Candès (2015). Controlling the false discovery rate via knockoffs. The Annals of Statistics 43(5), 2055–2085.
  • Barber and Candes (2016) Barber, R. F. and E. J. Candes (2016). A knockoff filter for high-dimensional selective inference. arXiv preprint arXiv:1602.03574.
  • Basu et al. (2017) Basu, P., T. T. Cai, K. Das, and W. Sun (2017). Weighted false discovery rate control in large-scale multiple testing. Journal of the American Statistical Association 0(ja), 0–0.
  • Benjamini and Hochberg (1995) Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society. Series B (Methodological), 289–300.
  • Benjamini et al. (2006) Benjamini, Y., A. M. Krieger, and D. Yekutieli (2006). Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93(3), 491–507.
  • Benjamini and Yekutieli (2001) Benjamini, Y. and D. Yekutieli (2001). The control of the false discovery rate in multiple testing under dependency. Annals of statistics, 1165–1188.
  • Candes et al. (2018) Candes, E., Y. Fan, L. Janson, and J. Lv (2018). Panning for gold: model-x knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
  • Chow and Teicher (2012) Chow, Y. S. and H. Teicher (2012). Probability theory: independence, interchangeability, martingales. Springer Science & Business Media.
  • Efron (2004) Efron, B. (2004). Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association 99(465), 96–104.
  • Efron (2007) Efron, B. (2007). Size, power and false discovery rates. The Annals of Statistics, 1351–1377.
  • Efron (2008) Efron, B. (2008). Microarrays, empirical bayes and the two-groups model. Statistical science, 1–22.
  • Efron (2012) Efron, B. (2012). Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, Volume 1. Cambridge University Press.
  • Efron and Tibshirani (2002) Efron, B. and R. Tibshirani (2002). Empirical bayes methods and false discovery rates for microarrays. Genetic epidemiology 23(1), 70–86.
  • Efron et al. (2001) Efron, B., R. Tibshirani, J. D. Storey, and V. Tusher (2001). Empirical bayes analysis of a microarray experiment. Journal of the American statistical association 96(456), 1151–1160.
  • Elias and Gygi (2007) Elias, J. E. and S. P. Gygi (2007). Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature Methods 4(3), 207–214.
  • Fan et al. (2007) Fan, J., P. Hall, and Q. Yao (2007). To how many simultaneous hypothesis tests can normal, student’s t or bootstrap calibration be applied? Journal of the American Statistical Association 102(480), 1282–1288.
  • Gavrilov et al. (2009) Gavrilov, Y., Y. Benjamini, and S. K. Sarkar (2009). An adaptive step-down procedure with proven fdr control under independence. The Annals of Statistics, 619–629.
  • Ghosal and Roy (2011) Ghosal, S. and A. Roy (2011). Predicting false discovery proportion under dependence. Journal of the American Statistical Association 106(495), 1208–1218.
  • Hansen and Kerr (2012) Hansen, E. and K. F. Kerr (2012). A comparison of two classes of methods for estimating false discovery rates in microarray studies. Scientifica 2012.
  • He (2013) He, K. (2013). Multiple hypothesis testing methods for large-scale peptide identification in computational proteomics. Master’s thesis, University of Chinese Academy of Sciences. http://dpaper.las.ac.cn/Dpaper/detail/detailNew?paperID=20015738. [in Chinese with English abstract].
  • He et al. (2015) He, K., Y. Fu, W.-F. Zeng, L. Luo, H. Chi, C. Liu, L.-Y. Qing, R.-X. Sun, and S.-M. He (2015). A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics. arXiv preprint arXiv:1501.00537.
  • Kang (2016) Kang, M. (2016). Optimal false discovery rate control with kernel density estimation in a microarray experiment. Communications in Statistics-Simulation and Computation 45(3), 771–780.
  • Kerr (2009) Kerr, K. F. (2009). Comments on the analysis of unbalanced microarray data. Bioinformatics 25(16), 2035–2041.
  • Lee et al. (2009) Lee, C.-W., M. Efetova, J. C. Engelmann, R. Kramell, C. Wasternack, J. Ludwig-Müller, R. Hedrich, and R. Deeken (2009). Agrobacterium tumefaciens promotes tumor induction by modulating pathogen defense in arabidopsis thaliana. The Plant Cell 21(9), 2948–2962.
  • Liu and Shao (2014) Liu, W. and Q.-M. Shao (2014). Phase transition and regularized bootstrap in large-scale t-tests with false discovery rate control. The Annals of Statistics 42(5), 2003–2025.
  • Sarkar (2002) Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Annals of statistics, 239–257.
  • Sarkar et al. (2013) Sarkar, S. K., J. Chen, and W. Guo (2013). Multiple testing in a two-stage adaptive design with combination tests controlling fdr. Journal of the American Statistical Association 108(504), 1385–1401.
  • Scott and Berger (2010) Scott, J. G. and J. O. Berger (2010). Bayes and empirical-bayes multiplicity adjustment in the variable-selection problem. The Annals of Statistics 38(5), 2587–2619.
  • Storey (2002) Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64(3), 479–498.
  • Storey (2003) Storey, J. D. (2003). The positive false discovery rate: a bayesian interpretation and the q-value. The Annals of Statistics 31(6), 2013–2035.
  • Storey et al. (2004) Storey, J. D., J. E. Taylor, and D. Siegmund (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66(1), 187–205.
  • Storey and Tibshirani (2003) Storey, J. D. and R. Tibshirani (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences 100(16), 9440–9445.
  • Strimmer (2008) Strimmer, K. (2008). A unified approach to false discovery rate estimation. BMC bioinformatics 9(1), 303.
  • Sun and Cai (2007) Sun, W. and T. T. Cai (2007). Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association 102(479), 901–912.
  • Tan and Xu (2014) Tan, Y.-D. and H. Xu (2014). A general method for accurate estimation of false discovery rates in identification of differentially expressed genes. Bioinformatics 30(14), 2018–2025.
  • Tansey et al. (2017) Tansey, W., O. Koyejo, R. A. Poldrack, and J. G. Scott (2017). False discovery rate smoothing. Journal of the American Statistical Association 0(ja), 0–0.
  • Tusher et al. (2001) Tusher, V. G., R. Tibshirani, and G. Chu (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences 98(9), 5116–5121.
  • Vergunst et al. (2003) Vergunst, A. C., M. C. van Lier, A. den Dulk-Ras, and P. J. Hooykaas (2003). Recognition of the agrobacterium tumefaciens vire2 translocation signal by the virb/d4 transport system does not require vire1. Plant physiology 133(3), 978–988.
  • Xie et al. (2005) Xie, Y., W. Pan, and A. B. Khodursky (2005). A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data. Bioinformatics 21(23), 4280–4288.