1 Introduction
Controlling the false discovery rate (FDR), i.e., the expected proportion of incorrect rejections among all rejections, is a powerful and popular way to do multiple testing. The FDR was first introduced by Benjamini and Hochberg (1995) (BH). Let $H_1,\dots,H_m$ be the tested null hypotheses, of which $m_0$ are true and the remaining $m_1 = m - m_0$ are false. Let $V$ denote the number of true null hypotheses that are erroneously rejected and $R$ be the total number of hypotheses that are rejected. Then the false discovery rate is $\mathrm{FDR} = E[\mathrm{FDP}]$, where the proportion of false discoveries, $\mathrm{FDP}$, is $V/R$ if $R > 0$ and $0$ if $R = 0$. BH also provided a p-value based sequential method to control the FDR. FDR control searches for a significance threshold of p-values or other test statistics such that the real FDR of all rejections does not exceed a given level. Since then, many FDR control methods have been developed, e.g., Benjamini and Yekutieli (2001); Sarkar (2002); Storey (2002, 2003); Benjamini et al. (2006); Basu et al. (2017).
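As a concrete reference point, the BH step-up procedure can be sketched as follows (a minimal illustration for orientation, not code from this paper):

```python
def benjamini_hochberg(pvals, alpha):
    """BH step-up procedure: find the largest k such that the k-th smallest
    p-value satisfies p_(k) <= k * alpha / m, and reject the hypotheses
    with the k smallest p-values. Controls the FDR at level alpha under
    independence."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])  # indices of the rejected hypotheses

# Example: three clear signals among five tests at alpha = 0.05
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.50, 0.60], 0.05))  # [0, 1, 2]
```
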
The Bayes method (Storey, 2002, 2003) is one of the most popular approaches to FDRs. This method not only provides a Bayesian interpretation of the FDR, but also improves the power of FDR control by estimating the proportion of true null hypotheses. This proportion estimation, which was absent in the original BH procedure, is now widely used in current FDR methods to enhance the power, such as the Bayes and the empirical Bayes methods (Storey, 2002; Benjamini et al., 2006; Efron, 2008; Strimmer, 2008).
Current canonical approaches to FDRs are based on the null distribution of a test statistic (Benjamini and Hochberg, 1995; Storey, 2002; Benjamini et al., 2006). However, it is usually difficult to obtain the proper null distribution. Popular null distributions include the theoretical null, permutation null, bootstrap null and empirical null (Efron, 2008, 2012). Though the theoretical null is the most popularly used, it might fail in practice for many reasons, such as improper mathematical assumptions or unobserved covariates (Efron, 2007, 2008). For example, if the real null distribution is not normal, the p-values calculated by Student's t-test are not uniform$(0,1)$ distributed for true null hypotheses. The permutation null is also widely used. There are mainly two different permutation methods, i.e., the permutation tests and the pooled permutation (Kerr, 2009). The permutation tests are a class of widely applicable nonparametric tests for calculating p-values, and are most useful when information about the data distribution is insufficient. However, the p-values reported by permutation tests are discretely distributed and their precision is limited by the sample size of a test. Low precision may reduce the statistical power (Tusher et al., 2001). Instead of estimating a null distribution for each test separately, the pooled permutation in multiple testing estimates an overall null distribution for all tests (Efron et al., 2001). However, it has been found that pooling permutation null distributions across hypotheses can produce invalid p-values, since even true null hypotheses can have different permutation distributions (Kerr, 2009). Bootstrap is another popular method for calculating p-values, but it is not applicable to cases where the number of tests is much larger than the sample size of a test (Fan et al., 2007; Efron, 2012; Liu and Shao, 2014).
To overcome the shortcomings of the theoretical and permutation null distributions, new methods were proposed to estimate an empirical null distribution from a large number of tests (Efron et al., 2001; Efron and Tibshirani, 2002; Efron, 2008; Scott and Berger, 2010). For example, the empirical Bayes method estimates the empirical null distribution by decomposing the mixture of null and alternative distributions. However, decomposing the mixture distribution is an intrinsically difficult problem. For example, if the empirical distribution has a strong peak, the decomposition may fail (Strimmer, 2008).
As far as we know, almost all methods with a proof of FDR control depend exclusively on an accurate null distribution, regardless of whether the required inputs are proper p-values (Storey, 2002; Benjamini et al., 2006; Gavrilov et al., 2009; Sarkar et al., 2013) or z-values (Sun and Cai, 2007; Tansey et al., 2017). Some methods do not need the accurate null distribution, but additional assumptions about the null distribution are still necessary, such as that the null distribution is normal (Efron, 2007; Strimmer, 2008) or that the null distributions of all statistics are the same and symmetric about 0 (Arias-Castro and Chen, 2017). Meanwhile, to enhance the power of FDR control, most popular methods estimate the proportion of true null hypotheses (Storey, 2002; Benjamini et al., 2006; Efron, 2008; Strimmer, 2008). However, these estimates often have considerable biases (Hansen and Kerr, 2012).
In this paper, we propose a direct approach to FDR control, named the target-decoy procedure, which is free of the null distribution and of the proportion of true null hypotheses. In this approach, a target score and a number of decoy scores are calculated for each hypothesis. The target score is calculated from the original samples. The decoy scores are calculated from randomly permuted samples. Based on the target score and the decoy scores, a label and a final score are calculated as follows. If the target score is more significant than half of the decoy scores, the hypothesis is labeled as target and the final score is set as the target score. Otherwise, the hypothesis is labeled as decoy and the final score is set as the decoy score with a specific rank mapped from the rank of the target score. Then the hypotheses are sorted by their final scores and the ratio of decoy to target hypotheses beyond a threshold is directly used for FDR control. Our approach to FDRs is exclusively based on these scores and their labels. Unlike p-values or other test statistics with clear null distributions, the score used in our approach can be any measure of the (dis)similarity of two groups of samples. Therefore, our approach is very flexible and can be more powerful than traditional approaches that are limited by the precision of p-values or the sample size of each test. Importantly, we establish the theoretical foundation of the target-decoy procedure by proving that it can rigorously control the FDR when the scores are independent between different tests.
Monte Carlo simulations demonstrate that our approach is more stable and powerful than two popular methods, i.e., the Bayes method (Storey, 2002, 2003; Storey et al., 2004) and the empirical Bayes method (Efron et al., 2001; Efron and Tibshirani, 2002; Efron, 2008). The performance of the three methods was also compared on a real data set. Because our procedure is more straightforward and can be used with arbitrary scores, we believe that it will have many practical applications.
Our approach was inspired by the target-decoy protein database search strategy widely used to estimate the FDR of peptide identifications by tandem mass spectrometry (Elias and Gygi, 2007). A modified version of this strategy (adding one to the number of decoys) was proved by us to control the FDR (He, 2013; He et al., 2015). A similar approach has been used for FDR control in the setting of variable selection for the classical linear regression model (Barber and Candès, 2015). To our knowledge, our approach presents a first attempt to establish a distribution-free framework for the search for random variables that differ between cases and controls in general case-control studies. We will discuss the differences between our approach and previous related work at the end of the paper.
The rest of the paper is organized as follows. Section 2 discusses a general model for case-control studies. Our target-decoy procedure is presented in Section 3. Section 4 establishes the theoretical foundation of our procedure. Numerical results are given in Section 5. In Section 6, we show an application of the target-decoy procedure to an Arabidopsis microarray data set. Section 7 concludes the paper with a discussion of related and future works. Proofs of theoretical results are provided in the Supplementary Materials.
2 Problem Formulation
Consider a case-control study involving $m$ random variables, $X_1,\dots,X_m$. For each random variable $X_i$, where $1 \le i \le m$, there are $n$ random samples $x_{i1},\dots,x_{in}$, where $x_{i1},\dots,x_{in_1}$ are from the cases and $x_{i,n_1+1},\dots,x_{in}$ are from the controls.
The goal is to search for random variables that differ between cases and controls. The null hypothesis for random variable $X_i$ used here is the ‘symmetrically distributed’ hypothesis: the joint distribution of $x_{i1},\dots,x_{in}$ is symmetric. In other words, the joint probability density function $f_i$ of $x_{i1},\dots,x_{in}$ (or the joint probability mass function if the samples are discrete) satisfies $f_i(t_1,\dots,t_n) = f_i(t_{\sigma(1)},\dots,t_{\sigma(n)})$ for any possible $(t_1,\dots,t_n)$ and any permutation $\sigma$ of $(1,\dots,n)$. If $x_{i1},\dots,x_{in}$ are independent, this hypothesis is equivalent to saying that $x_{i1},\dots,x_{in}$ are identically distributed. Here we use the ‘symmetrically distributed’ hypothesis to deal with the case where $x_{i1},\dots,x_{in}$ are related but still form an exchangeable sequence of random variables (Chow and Teicher, 2012).
Let $T_i$ be some score measuring the difference between the case samples and the control samples of test $i$. Without loss of generality, we assume that large scores indicate significance and that the score is invariant under any permutation of the case samples and any permutation of the control samples. Note that neither the null distributions of the scores nor the distributions of the random variables are required to be known.
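For illustration, one admissible score under these requirements is the absolute difference of group means; it is invariant to reordering samples within each group, as required (a hypothetical choice for illustration, not the paper's prescribed score):

```python
def mean_diff_score(case, control):
    """|mean(case) - mean(control)|: larger values indicate a stronger
    difference between the two groups, and the value is unchanged by
    permuting samples within the cases or within the controls."""
    return abs(sum(case) / len(case) - sum(control) / len(control))

print(mean_diff_score([5.0, 7.0], [1.0, 3.0]))  # 4.0
```

Any other within-group-permutation-invariant statistic (e.g., a t-statistic or a rank-sum statistic) would serve equally well.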
3 The Target-Decoy Procedure
Our procedure for FDR control is as follows.
Algorithm: the target-decoy procedure

Step 1. For each test $i$, calculate the target score $T_i$. Permute the case and control status randomly $s$ times and calculate the scores $D_{i1},\dots,D_{is}$ of the $s$ permutations. Sort these $s+1$ scores in descending order. For equal scores, sort them randomly with equal probability.

Step 2. For each test $i$, calculate a final score $F_i$ and assign it a label $L_i \in \{\mathrm{T}, \mathrm{D}\}$, where $\mathrm{T}$ and $\mathrm{D}$ stand for target and decoy, respectively. Assume that the rank of $T_i$ among the $s+1$ scores is $r$. If $r < (s+2)/2$, let $L_i = \mathrm{T}$ and set $F_i$ as $T_i$. If $r > (s+2)/2$, let $L_i = \mathrm{D}$ and set $F_i$ as the score ranking $s+2-r$. Otherwise, $r = (s+2)/2$, let $L_i$ be $\mathrm{T}$ or $\mathrm{D}$ randomly with equal probability and set $F_i$ as $T_i$.

Step 3. Sort the tests in descending order of their final scores. For tests with equal scores, sort them randomly with equal probability. Let $L_{(1)},\dots,L_{(m)}$ denote the sorted labels and $F_{(1)},\dots,F_{(m)}$ denote the sorted scores.

Step 4. If the specified FDR control level is $\alpha$, let
$$K = \max\left\{ k : \frac{1 + \#\{i \le k : L_{(i)} = \mathrm{D}\}}{\max\left(\#\{i \le k : L_{(i)} = \mathrm{T}\},\, 1\right)} \le \alpha \right\} \quad (1)$$
and reject the hypothesis with rank $i$ if $i \le K$ and $L_{(i)} = \mathrm{T}$.
Section 4 will show that the target-decoy procedure can control the FDR. The random permutations used in our procedure can be generated by simple random sampling either with or without replacement, just as in the permutation tests. Similarly, with a larger sampling number $s$, the power of our procedure will be slightly stronger, as shown in Section 5. In practice, $s$ can be set as large as the maximum number of permutations we are willing to perform.
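The steps above can be sketched in code as follows. This is a simplified illustration: the function names are ours, and the tie-breaking and mirrored-rank details follow our reading of Step 2 rather than the paper's formal statement.

```python
import random

def decoy_count_threshold(labels, alpha):
    """Step 4: the largest k such that (1 + #decoys) / max(#targets, 1)
    among the top k sorted labels does not exceed alpha."""
    best, d, t = 0, 0, 0
    for k, lab in enumerate(labels, start=1):
        if lab == 'D':
            d += 1
        else:
            t += 1
        if (1 + d) / max(t, 1) <= alpha:
            best = k
    return best

def target_decoy(samples, n_case, score, s=1, alpha=0.05, rng=random):
    """samples: one sample vector per test, cases first, then controls.
    s: number of random case/control permutations per test."""
    entries = []
    for x in samples:
        t_score = score(x[:n_case], x[n_case:])
        y = list(x)
        decoys = []
        for _ in range(s):
            rng.shuffle(y)  # permute the case and control status
            decoys.append(score(y[:n_case], y[n_case:]))
        # rank of the target score among all s + 1 scores (random ties)
        r = 1 + sum(1 for dsc in decoys
                    if dsc > t_score or (dsc == t_score and rng.random() < 0.5))
        if r < (s + 2) / 2:            # target in the top half
            entries.append((t_score, 'T'))
        elif r > (s + 2) / 2:          # decoy label, mirrored-rank final score
            final = sorted(decoys + [t_score], reverse=True)[(s + 2 - r) - 1]
            entries.append((final, 'D'))
        else:                          # exact middle rank: fair coin
            entries.append((t_score, rng.choice('TD')))
    order = sorted(range(len(entries)),
                   key=lambda i: (-entries[i][0], rng.random()))
    labels = [entries[i][1] for i in order]
    k = decoy_count_threshold(labels, alpha)
    return [i for i in order[:k] if entries[i][1] == 'T']  # rejected tests
```

The deterministic part, `decoy_count_threshold`, implements equation (1) directly; everything else only prepares the sorted labels it consumes.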
Unlike other FDR control methods, our target-decoy procedure does not depend on the null distribution. The sampling number of permutations, $s$, used in our procedure can be much smaller than that used in permutation tests. In our simulations, $s$ was set as 49 or 1, while in the real data experiments all distinct permutations were generated. Simulations demonstrate that our target-decoy procedure can still control the FDR even when $s$ was set as 1, in which case little information about the null distributions is revealed.
4 Theoretical Analysis
In this section, we will show that equation (1) can control the FDR. Let $H_i = 0$ and $H_i = 1$ denote that the null hypothesis for random variable $X_i$ is true and false, respectively. Note that $H_1,\dots,H_m$ are constants in the setting of hypothesis testing. Define $V_i$, where $1 \le i \le m$, as follows. If $H_i = 0$, let $V_i$ be $1$ if $L_i = \mathrm{T}$ and be $-1$ if $L_i = \mathrm{D}$. Otherwise, $H_i = 1$, let $V_i$ be $2$ if $L_i = \mathrm{T}$ and be $-2$ if $L_i = \mathrm{D}$. Table 1 summarizes the possible situations. Let $V_{(1)},\dots,V_{(m)}$ denote the sequence of $V_1,\dots,V_m$ sorted as in Step 3.
Let $\boldsymbol{F}$ and $\boldsymbol{L}$ denote $(F_1,\dots,F_m)$ and $(L_1,\dots,L_m)$, respectively. Let $\boldsymbol{V}$ and $\boldsymbol{H}$ denote $(V_1,\dots,V_m)$ and $(H_1,\dots,H_m)$, respectively. We define the sorted vectors $\boldsymbol{F}_{()}$, $\boldsymbol{L}_{()}$ and $\boldsymbol{V}_{()}$ similarly. For example, we will use $\boldsymbol{f}$ to denote a sequence of constants, $(f_1,\dots,f_m)$, which is one of the observed values of $\boldsymbol{F}$; we define $\boldsymbol{l}$, $\boldsymbol{v}$, etc., analogously. Then we have the following theorem, the proof of which is given in the Supplementary Materials.
Theorem 1.
In the target-decoy procedure, if the random variables $X_1,\dots,X_m$ are independent, then for any fixed $i$ with $H_i = 0$, and conditionally on any possible values of the final scores and of the labels of the other tests, we have
$$P(L_i = \mathrm{T} \mid \cdot) = P(L_i = \mathrm{D} \mid \cdot) = \frac{1}{2}. \quad (2)$$
From Table 1 we have $L_{(i)} = \mathrm{T}$ if $V_{(i)} > 0$ and $L_{(i)} = \mathrm{D}$ if $V_{(i)} < 0$. Thus, equation (1) is equivalent to $K = \max\{k : (1 + \#\{i \le k : V_{(i)} < 0\})/\max(\#\{i \le k : V_{(i)} > 0\},\, 1) \le \alpha\}$. Meanwhile, the total number of rejected hypotheses is $\#\{i \le K : V_{(i)} > 0\}$, and the number of falsely rejected hypotheses is $\#\{i \le K : V_{(i)} = 1\}$. Therefore, the false discovery proportion (FDP) is $\#\{i \le K : V_{(i)} = 1\}/\max(\#\{i \le K : V_{(i)} > 0\},\, 1)$ and the real FDR is $E[\mathrm{FDP}]$. To prove that equation (1) can control the FDR, we only need to prove the following theorem; the proof is given in the Supplementary Materials.
Theorem 2.
Let the random variables $F_i$, $L_i$, $1 \le i \le m$, be defined as in Section 3. Then for any $\alpha \in (0,1)$, we have
$$\mathrm{FDR} = E\left[\frac{\#\{i \le K : V_{(i)} = 1\}}{\max\left(\#\{i \le K : V_{(i)} > 0\},\, 1\right)}\right] \le \alpha. \quad (3)$$
Remark. The most intuitive way for FDR control is to set $K$ as
$$K = \max\left\{ k : \frac{\#\{i \le k : L_{(i)} = \mathrm{D}\}}{\max\left(\#\{i \le k : L_{(i)} = \mathrm{T}\},\, 1\right)} \le \alpha \right\}. \quad (4)$$
However, this way cannot control the FDR. Consider a case-control study where the specified FDR control level is $\alpha < 1$ and all hypotheses are true null. From Theorem 1 we have $P(L_{(1)} = \mathrm{T}) = 1/2$. If $L_{(1)} = \mathrm{T}$, we have $\#\{i \le 1 : L_{(i)} = \mathrm{D}\}/\max(\#\{i \le 1 : L_{(i)} = \mathrm{T}\},\, 1) = 0 \le \alpha$. Therefore, $K \ge 1$ and at least one hypothesis will be rejected. Then the FDP when $L_{(1)} = \mathrm{T}$ is $1$. Thus, the FDR is no less than $1/2$, which is not controlled.
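This failure can be checked with a small simulation. Under the all-null setting, Theorem 1 says each sorted label is target or decoy with probability 1/2, so we can flip fair coins directly (an illustration of the remark under that simplification, not a full run of the procedure):

```python
import random

def max_threshold(labels, alpha, plus_one):
    """Largest k with (#decoys + plus_one) / max(#targets, 1) <= alpha
    among the top k labels; plus_one = 1 gives equation (1), 0 gives (4)."""
    best, d, t = 0, 0, 0
    for k, lab in enumerate(labels, start=1):
        d, t = d + (lab == 'D'), t + (lab == 'T')
        if (d + plus_one) / max(t, 1) <= alpha:
            best = k
    return best

rng = random.Random(0)
m, alpha, trials = 100, 0.05, 4000
fdr = {0: 0.0, 1: 0.0}
for _ in range(trials):
    labels = [rng.choice('TD') for _ in range(m)]  # all hypotheses true null
    for plus_one in (0, 1):
        k = max_threshold(labels, alpha, plus_one)
        # every rejected (target-labeled) hypothesis is a false discovery
        fdr[plus_one] += (labels[:k].count('T') > 0) / trials
print(fdr[0], fdr[1])  # naive estimate near 1/2, corrected version near 0
```

The naive rule rejects whenever the very first label is a target, so its FDR sits near 1/2, while the "+1" rule of equation (1) stays far below the nominal level.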
5 Simulation Studies
We used Monte Carlo simulations to study the performance of our method. We considered case-control studies in which the random variables follow the normal distribution, the gamma distribution and the Cauchy distribution, respectively. In addition to the normal distribution, we did simulation experiments for the gamma distribution because many random variables in the real world are gamma-distributed. Moreover, to test the multiple testing methods with a heavy-tailed distribution, we did simulation experiments for the Cauchy distribution.
Recall that the case-control study consists of $m$ random variables. For each random variable, there are $n$ random samples, of which $n_1$ are from the cases and the other $n_2 = n - n_1$ are from the controls. Let $x_{i1},\dots,x_{in}$ be the random samples for random variable $X_i$.
The observed values from the normal distribution were generated in a way similar to Benjamini, Krieger, and Yekutieli (2006). First, let $z_0, z_1, \dots, z_m$ be independent and identically distributed random variables following the $N(0,1)$ distribution. Next, let $x_i = \sqrt{1-\rho}\, z_i + \sqrt{\rho}\, z_0 + \mu_i$ for $1 \le i \le m$. We used $\rho = 0$, corresponding to independence, together with moderate and high positive correlation values, including $\rho = 0.8$, estimated from real microarray data (Almudevar et al., 2006). The values of $\mu_i$ are zero for the controls. For the cases, the values of $\mu_i$ are also zero for the hypotheses that are true null, while for the false null hypotheses the values of $\mu_i$ were set by cycling through a fixed set of nonzero effect sizes.
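This equicorrelated construction can be sketched as follows (the $\rho$ value and the zero means below are placeholders; the paper's exact effect sizes are not reproduced here):

```python
import random

def equicorrelated_normals(rho, mu, rng):
    """One draw of x_1, ..., x_m with common pairwise correlation rho:
    x_i = sqrt(rho) * z_0 + sqrt(1 - rho) * z_i + mu_i, z's iid N(0, 1).
    Then Var(x_i) = rho + (1 - rho) = 1 and Cov(x_i, x_j) = rho, i != j."""
    z0 = rng.gauss(0.0, 1.0)
    return [rho ** 0.5 * z0 + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0) + mu_i
            for mu_i in mu]

# e.g., one draw of two equicorrelated null variables (placeholder rho = 0.8)
x = equicorrelated_normals(0.8, [0.0, 0.0], random.Random(1))
```

Sharing the single $z_0$ across all $m$ variables is what induces the common correlation; $\rho = 0$ reduces to independent draws.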
The observed values from the gamma distribution, which is characterized by shape and scale parameters, were generated in the following way. First, independent gamma random variables were drawn, one for each test, plus a shared gamma variable for the dependent setting. In the simulation study for independent random variables, each observation was the per-test gamma variable alone; for dependent random variables, the shared variable was added. To obtain reasonable correlation values, the shape parameter was set as 4 and the scale parameter as 1 for the controls. For the cases, the same setting was used for the hypotheses that are true null, while the parameters for the false null hypotheses were set by cycling through a fixed set of values. The observed values from the Cauchy distribution were generated in a similar way and the details are given in the Supplementary Materials.
The specified FDR control levels were 5% and 10%, respectively. The total number of tests, $m$, was set as 10000. The proportion of false null hypotheses was either 1% or 10%. The total sample size, $n$, was set as 20, 60 or 100, and the number of cases and that of controls were always set to be equal. In the main text we only present the results of the experiments for the normal and gamma distributions where $n$ was 20. Other results are given in the Supplementary Materials.
Three different approaches to FDRs were compared in our simulations, i.e., the Bayes method (Storey, 2002, 2003; Storey et al., 2004), the empirical Bayes method (Efron et al., 2001; Efron and Tibshirani, 2002; Efron, 2008) and our target-decoy procedure. The Bayes method and the empirical Bayes method are among the most remarkable multiple testing methods. To compare the power of these methods, we rejected hypotheses against the specified FDR control levels, $\alpha$. The rejection threshold for the Bayes method was set as the largest p-value threshold whose q-value is no more than $\alpha$ (Storey, 2002, 2003). The rejection threshold for the empirical Bayes method was set as the minimum z-value threshold $z$ such that Efdr($z$) is no more than $\alpha$, where Efdr($z$) is the expected fdr of hypotheses with z-values no smaller than $z$ (Efron, 2007, 2004). Specifically, the R packages "locfdr" version 1.18 (Efron, 2004) and "qvalue" version 2.4.2 (Storey and Tibshirani, 2003) were used. Each simulation experiment was repeated 1000 times. We calculated the mean number of rejected hypotheses to evaluate the power of each method. The FDRs of the rejected hypotheses were calculated as the means of the FDPs. Note that the variance of the mean of the FDPs over 1000 repetitions is one thousandth of the variance of the FDPs. We can also estimate the standard deviation of the mean of the FDPs from the sample standard deviation of the FDPs.
The p-values for the Bayes method and the z-values for the empirical Bayes method were calculated with Student's t-test, the Wilcoxon rank-sum test, Student's t-test with permutation, or Student's t-test with bootstrap. For the permutation and bootstrap methods, we resampled the cases and the controls for each test, calculated the t-values for the resampled data, and calculated the p-values from the null distribution of the pooled t-values (Xie et al., 2005; Liu and Shao, 2014). For the bootstrap method, the resampling was within the groups. The sampling number of permutations was set as 10 (Efron, 2012) and that of bootstrap was set as 200.
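The pooled-permutation scheme described here can be sketched as follows: the permuted statistics from all tests are pooled into a single empirical null, and each observed statistic is converted to a p-value against that pool (a minimal sketch with a hypothetical function name):

```python
import bisect

def pooled_permutation_pvalues(observed, permuted_pool):
    """observed: one statistic per test (large values = significant).
    permuted_pool: statistics computed on the permuted data of ALL tests,
    pooled into one empirical null distribution."""
    pool = sorted(permuted_pool)
    n = len(pool)
    # p-value = fraction of pooled null statistics >= the observed one
    return [(n - bisect.bisect_left(pool, s)) / n for s in observed]

print(pooled_permutation_pvalues([2.0, 5.0], [1.0, 2.0, 3.0, 4.0]))  # [0.75, 0.0]
```

As noted in the Introduction, pooling across hypotheses is exactly what can invalidate these p-values when true nulls have different permutation distributions (Kerr, 2009).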
For our target-decoy procedure, the cases and the controls of each test were permuted 49 times or only once, and the t-values and the test statistics of the Wilcoxon rank-sum test were used as scores. We did the one-permutation experiments, in which little information about the null distributions is revealed, to demonstrate that our procedure does not rely on the null distribution. Because permutation is performed inherently in our target-decoy procedure, extra permutation and bootstrap are unnecessary.
We will use abbreviations to represent the experiments. For example, Bayes,permutation,Normal,10%,0.8 represents the simulation experiment where the Bayes method combined with the pooled permutation is used, the random variables follow the normal distribution, the proportion of false null hypotheses is as high as 10% and the correlation value is 0.8. For our target-decoy procedure, t-value,49,Gamma,1% represents the simulation experiment where the t-value is used as the score, 49 permutations are performed for each test, the random variables follow the gamma distribution and the proportion of false null hypotheses is as low as 1%.
5.1 Independent random variables
Table 2: Real FDRs of the different methods with independent random variables.

                   Normal,1%      Normal,10%     Gamma,1%       Gamma,10%
α                  0.05   0.10    0.05   0.10    0.05   0.10    0.05   0.10
Bayes
  t-test           0.050  0.100   0.050  0.100   0.023  0.048   0.031  0.072
  permutation      0.048  0.099   0.048  0.098   0.027  0.068   0.047  0.103
  rank-sum         0.039  0.088   0.039  0.087   0.045  0.087   0.042  0.083
  bootstrap        0.012  0.035   0.034  0.080   0.000  0.001   0.005  0.028
Empirical Bayes
  t-test           0.044  0.092   0.040  0.084   0.006  0.013   0.008  0.023
  permutation      0.039  0.078   0.039  0.086   0.048  0.124   0.055  0.119
  rank-sum         0.046  0.092   0.037  0.078   0.046  0.091   0.037  0.077
  bootstrap        0.005  0.015   0.022  0.060   0.000  0.000   0.000  0.001
Target-decoy
  t-value,49       0.041  0.094   0.049  0.099   0.043  0.094   0.050  0.100
  t-value,1        0.044  0.093   0.048  0.097   0.042  0.092   0.047  0.096
  rank-sum,49      0.042  0.096   0.049  0.099   0.042  0.096   0.050  0.100
  rank-sum,1       0.042  0.093   0.048  0.097   0.042  0.096   0.048  0.097
Table 3: Mean numbers of rejected hypotheses with independent random variables.

                   Normal,1%      Normal,10%     Gamma,1%       Gamma,10%
α                  0.05   0.10    0.05   0.10    0.05   0.10    0.05   0.10
Bayes
  t-test           71     80      845    937     40     50      687    798
  permutation      71     80      842    933     41     55      737    861
  rank-sum         67     76      813    906     48     59      734    836
  bootstrap        60     68      808    902     0      1       494    673
Empirical Bayes
  t-test           70     78      823    909     23     32      534    650
  permutation      69     76      821    913     49     66      755    891
  rank-sum         69     77      806    889     47     59      715    823
  bootstrap        53     61      772    863     0      0       10     221
Target-decoy
  t-value,49       69     79      843    935     45     60      743    853
  t-value,1        69     79      841    931     45     60      736    845
  rank-sum,49      67     77      834    926     42     60      755    872
  rank-sum,1       66     77      831    922     42     60      751    865
FDR control. Table 2 shows the real FDRs of the different methods with independent random variables when the specified FDR control level $\alpha$ was 0.05 or 0.10. The results show that the t-test with Bayes or empirical Bayes overestimated the FDRs for the gamma distribution. The real FDRs of the pooled permutation can significantly exceed $\alpha$ when the random variables follow the gamma distribution. The Wilcoxon rank-sum test with Bayes or empirical Bayes overestimated the FDRs. The real FDRs of the bootstrap were much smaller than $\alpha$. The target-decoy procedure always controlled the FDR.
Statistical power. Table 3 shows the statistical power of the different methods with independent random variables. The bootstrap is less powerful than all the other methods, especially when the random variables follow the gamma distribution.
When the random variables follow the normal distribution, the Bayes method is slightly more powerful than the target-decoy procedure when the t-test is used. However, it is much less powerful than the target-decoy procedure when the Wilcoxon rank-sum test is used. The empirical Bayes method is less powerful than the Bayes method and our target-decoy procedure, especially in the Normal,10% experiments.
When the random variables follow the gamma distribution, the target-decoy procedure is much more powerful than the Bayes and empirical Bayes methods, even when only one permutation was performed. Though the pooled permutation seems powerful, its FDRs were not controlled.
In our experiments, the target-decoy procedure always controlled the FDR. Meanwhile, our procedure is very powerful and does not rely on the null distribution. Even when only one permutation was performed, many hypotheses were still rejected while the FDRs were controlled.
5.2 Dependent random variables
Table 4: Real FDRs of the different methods with dependent random variables (two correlation settings for the normal distribution).

                   Normal (moderate ρ)           Normal (high ρ)               Gamma
                   1%            10%             1%            10%             1%            10%
α                  0.05   0.10   0.05   0.10     0.05   0.10   0.05   0.10     0.05   0.10   0.05   0.10
Bayes
  t-test           0.052  0.102  0.050  0.100    0.050  0.101  0.050  0.100    0.023  0.048  0.031  0.072
  permutation      0.050  0.100  0.048  0.098    0.049  0.099  0.047  0.098    0.026  0.067  0.047  0.103
  rank-sum         0.046  0.088  0.044  0.085    0.038  0.092  0.039  0.083    0.043  0.085  0.042  0.082
  bootstrap        0.014  0.039  0.034  0.081    0.015  0.041  0.035  0.083    0.000  0.000  0.005  0.027
Empirical Bayes
  t-test           0.047  0.097  0.044  0.090    0.048  0.100  0.047  0.097    0.006  0.013  0.008  0.023
  permutation      0.042  0.083  0.043  0.093    0.041  0.084  0.045  0.099    0.048  0.123  0.055  0.121
  rank-sum         0.049  0.095  0.042  0.086    0.048  0.094  0.046  0.095    0.045  0.090  0.037  0.077
  bootstrap        0.006  0.020  0.026  0.067    0.008  0.024  0.029  0.074    0.000  0.000  0.000  0.001
Target-decoy
  t-value,49       0.047  0.097  0.050  0.100    0.047  0.095  0.049  0.100    0.043  0.094  0.048  0.099
  t-value,1        0.046  0.096  0.048  0.098    0.045  0.096  0.049  0.100    0.042  0.092  0.047  0.096
  rank-sum,49      0.049  0.099  0.049  0.099    0.045  0.096  0.050  0.100    0.042  0.090  0.050  0.100
  rank-sum,1       0.048  0.100  0.049  0.099    0.048  0.097  0.050  0.100    0.040  0.089  0.047  0.096
Table 5: Mean numbers of rejected hypotheses with dependent random variables (two correlation settings for the normal distribution).

                   Normal (moderate ρ)           Normal (high ρ)               Gamma
                   1%            10%             1%            10%             1%            10%
α                  0.05   0.10   0.05   0.10     0.05   0.10   0.05   0.10     0.05   0.10   0.05   0.10
Bayes
  t-test           82     90     927    1016     101    108    1047   1109     40     50     687    797
  permutation      82     90     922    1012     101    108    1043   1106     42     55     737    861
  rank-sum         80     87     907    983      98     106    1031   1086     47     59     735    836
  bootstrap        74     80     893    984      93     99     1028   1087     0      1      493    673
Empirical Bayes
  t-test           81     90     914    999      100    108    1043   1105     23     32     536    652
  permutation      81     87     912    1003     99     106    1041   1108     49     66     757    893
  rank-sum         80     88     900    984      99     107    1040   1102     47     59     716    823
  bootstrap        69     75     871    958      90     96     1020   1077     0      0      10     221
Target-decoy
  t-value,49       81     90     926    1015     100    108    1046   1109     44     60     741    852
  t-value,1        81     89     923    1013     100    108    1046   1109     45     60     735    845
  rank-sum,49      80     89     917    1007     99     107    1045   1108     42     59     756    870
  rank-sum,1       80     89     916    1005     99     107    1044   1108     41     59     749    863
Similar results are found for dependent random variables.
FDR control. Table 4 shows the real FDRs of the different methods with dependent random variables when the specified FDR control level, $\alpha$, was 0.05 or 0.10. The results show that with the Bayes method, the real FDRs of the t-test may slightly exceed $\alpha$ in the Normal,1% experiments. Meanwhile, the t-test with Bayes or empirical Bayes overestimated the FDRs for the gamma distribution. The real FDRs of the pooled permutation can significantly exceed $\alpha$ when the random variables follow the gamma distribution. The Wilcoxon rank-sum test with Bayes or empirical Bayes overestimated the FDRs. The real FDRs of the bootstrap were much smaller than $\alpha$. The target-decoy procedure always controlled the FDR.
Statistical power. Table 5 shows the statistical power of the different methods with dependent random variables. The bootstrap is less powerful than all the other methods, especially when the random variables follow the gamma distribution.
When the random variables follow the normal distribution, the Bayes method is less powerful than the target-decoy procedure when the Wilcoxon rank-sum test is used. Though the Bayes method seems slightly more powerful than the target-decoy procedure when the t-test is used, the real FDR of this method may exceed the specified FDR control level. The empirical Bayes method is less powerful than the Bayes method and our target-decoy procedure in the Normal,10% experiments.
When the random variables follow the gamma distribution, the target-decoy procedure is much more powerful than the Bayes and empirical Bayes methods, even when only one permutation was performed. Though the pooled permutation seems powerful, its FDRs were not controlled.
In our experiments, the targetdecoy procedure controlled the FDRs with dependent random variables in all cases. Meanwhile, the targetdecoy procedure is very powerful.
6 An Application
In this section, we apply the target-decoy procedure to an Arabidopsis microarray data set. To determine whether Arabidopsis genes respond to oncogenes encoded by the transfer DNA (T-DNA) or to bacterial effector proteins co-delivered by Agrobacteria into the plant cells, Lee et al. (2009) conducted microarray experiments at several hours and days after inoculating wounded young Arabidopsis plants with two different Agrobacterium strains, C58 and GV3101. Strain GV3101 is a cognate of strain C58 that lacks the T-DNA but possesses proteinaceous virulence (Vir) factors such as VirD2, VirE2, VirE3 and VirF (Vergunst et al., 2003). Wounded, but uninfected, stalks served as controls. Here we use only the 6 d post-inoculation data as an example (downloaded from http://www.ncbi.nlm.nih.gov/geo/, GEO accession: GSE14106). The data, consisting of 22810 genes, were obtained from the C58-infected and control stalks. Both infected and control stalks have three replicates.
As in the simulation experiments, the Bayes method, the empirical Bayes method and our target-decoy procedure are compared here. The p-values in the Bayes method and the z-values in the empirical Bayes method were calculated with Student's t-test, the Wilcoxon rank-sum test, and Student's t-test with permutation, respectively. The bootstrap method is not compared because the number of tests, 22810, is much larger than the sample size of a test, i.e., 6. For the Bayes method, two-tailed tests were used. For the empirical Bayes method, we first transformed the FDR control level into a threshold of the local fdr and then identified differentially expressed genes according to that threshold. For the target-decoy procedure, the absolute t-values and the test statistics of the Wilcoxon rank-sum test were used as scores.
[Table 6: numbers of differentially expressed genes identified at various FDR control levels by the Bayes method (t-test, pooled permutation, rank-sum test), the empirical Bayes method (t-test, pooled permutation, rank-sum test) and the target-decoy procedure (t-value, rank-sum test).]
The R package ‘locfdr’ crashed when the Wilcoxon rank-sum test was used.
Because it is unknown which genes are really differentially expressed, the real FDRs cannot be computed here. Instead, the power of these methods is compared. For fairness, the sampling number was set so that all the permuted null scores were generated for each gene, in all the experiments including the pooled permutation and the target-decoy procedure.
As shown in Table 6, no differentially expressed genes were found by the empirical Bayes method or with the Wilcoxon rank-sum test. For the Bayes method, the t-test is more powerful than the pooled permutation for small $\alpha$, while the pooled permutation is more powerful for large $\alpha$. The target-decoy procedure with the t-value score is the most powerful. The additional genes identified by the target-decoy procedure are reliable, because similar numbers, i.e., 785 genes for FDR 0.034, 1427 genes for FDR 0.050 and 2071 genes for FDR 0.065, were reported by a more specific analysis (Tan and Xu, 2014).
7 Discussion
In this paper, we proposed the target-decoy procedure, a direct approach to FDRs that is free of the null distribution. We theoretically prove that this approach can control the FDR for independent statistics, and experimentally demonstrate that it is more stable and powerful than two of the most popular methods. Our procedure can also be extended to pair-matched case-control studies by adjusting the permutation subprocedure: instead of permuting the case and control status in Step 1, we randomly exchange the paired observed values, just as in the permutation tests for pair-matched studies. The other steps and analyses are the same. In the target-decoy procedure, the scores are only used to determine the labels and ranks of the tests, and the statistical meaning of the scores is not required. Similar to permutation tests, the target-decoy procedure can be used with any test statistic, regardless of whether or not its null distribution is known.
The decoy idea has also been used in the field of variable selection, under the name "knockoff", for FDR control without the null distribution (Barber and Candès, 2015). Given a response and covariates, variable selection aims at removing redundant or irrelevant covariates without incurring much loss of information in model construction for the response, which is different from the multiple testing problem we discuss here. In the original knockoff paper, Barber and Candès (2015) considered a Gaussian linear regression model where the number of covariates is no more than the number of observations. Since then, this method has been extended to a wide range of variable selection problems, such as the high-dimensional setting where the number of covariates is more than the number of observations (Barber and Candès, 2016) and a nonparametric setting with random covariates (Candès et al., 2018).
The knockoff filter and our target-decoy procedure both aim at FDR control without the null distribution, but they address different multiple testing problems. The knockoff filter is for variable selection, which removes redundant covariates. Thus, permutation is not suitable for constructing knockoff variables, because it cannot maintain the correlation between the original variables (Barber and Candès, 2015). Meanwhile, FDR control by the knockoff filter relies on the normal stochastic error, a fundamental assumption of the classical linear regression model. For the search for random variables that differ between cases and controls, however, the impact of correlation on FDR control is greatly reduced, and constructing the decoy scores by permutation gives good performance, as shown in our simulations. Moreover, the target-decoy procedure makes use of the natural symmetry of the case-control study, i.e., the exchangeability between observations of each random variable, and is free of additional assumptions.
The two methods also differ in several details. Firstly, in the knockoff filter only one knockoff counterpart is generated for each original variable, and the "antisymmetry property" of the statistics is necessary for FDR control; in the target-decoy procedure, many decoy scores can be generated for each hypothesis, and the antisymmetry property is not required. As shown in our simulations, the target-decoy procedure is slightly more powerful when more than one decoy score is generated per hypothesis. Secondly, in the knockoff filter the expectation of the FDP is taken over the stochastic error, which is the only source of randomness, and variables with identical original and knockoff statistics are never selected. In our target-decoy procedure, the expectation of the FDP is taken over at least four sources of randomness. The first two are the randomness of and the randomness of the permutations. The last two arise in two special situations: the randomness of labeling hypotheses whose target scores rank in the middle, i.e., in Step 2, and the randomness of sorting identical scores. Hypotheses with middle-ranked target scores can still be rejected. Thirdly, constructing knockoff variables with the properties necessary for FDR control is usually difficult, and the construction methods can be complex, as in Candès et al. (2018); our procedure constructs the decoy scores simply by permutation.
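The multiple-decoy point above can be sketched as follows. This is an illustrative construction under assumed names (`decoy_scores`, `fdp_estimate` are hypothetical helpers, and the pooled-permutation score is an assumption, not the paper's exact sub-procedure): with d decoys per hypothesis, the decoy count above a threshold is scaled by 1/d before it is compared with the target count.

```python
import numpy as np

rng = np.random.default_rng(1)

def decoy_scores(x, y, d):
    """Generate d decoy scores for one hypothesis by repeatedly permuting
    the pooled observations of the two groups (illustrative construction)."""
    pooled = np.concatenate([x, y])
    scores = []
    for _ in range(d):
        p = rng.permutation(pooled)
        scores.append(abs(p[:len(x)].mean() - p[len(x):].mean()))
    return scores

def fdp_estimate(n_decoy_above, n_target_above, d):
    """With d decoys per hypothesis, the decoy count above the threshold
    is divided by d to estimate the number of false targets above it."""
    return (n_decoy_above / d) / max(n_target_above, 1)
```

For example, 6 decoys above the threshold out of d = 3 decoys per hypothesis estimate 2 false targets, so 40 targets above the threshold give an estimated FDP of 0.05.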
The model-X knockoffs also relax the constraint of having valid p-values and can control the FDR of high-dimensional variable selection in a nonparametric setting in which the conditional distribution of the response is arbitrary (Candès et al., 2018). This approach requires the covariates to be random with a distribution that is known or can be estimated. However, estimating this distribution is impractical in some studies when the number of observations for each covariate is limited. Our approach is free of this constraint.
There have been many studies on FDR control in multiple testing under dependency (Benjamini and Yekutieli, 2001; Efron, 2007; Ghosal and Roy, 2011; Kang, 2016). In this paper, we only experimentally verified the performance of our target-decoy procedure on dependent test statistics; the theoretical analysis under dependency needs further research. Moreover, our theoretical analysis is based on the 'symmetrically distributed' null hypothesis, which is stronger than the hypothesis that the two groups have the same means. The performance of our procedure for the 'equality of means' hypothesis needs further study.
References
Almudevar, A., L. B. Klebanov, X. Qiu, P. Salzman, and A. Y. Yakovlev (2006). Utility of correlation measures in analysis of gene expression. NeuroRx 3(3), 384–395.
Arias-Castro, E. and S. Chen (2017). Distribution-free multiple testing. Electronic Journal of Statistics 11(1), 1983–2001.
Barber, R. F. and E. J. Candès (2015). Controlling the false discovery rate via knockoffs. The Annals of Statistics 43(5), 2055–2085.
Barber, R. F. and E. J. Candès (2016). A knockoff filter for high-dimensional selective inference. arXiv preprint arXiv:1602.03574.
Basu, P., T. T. Cai, K. Das, and W. Sun (2017). Weighted false discovery rate control in large-scale multiple testing. Journal of the American Statistical Association 0(ja), 0–0.
Benjamini, Y. and Y. Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289–300.
Benjamini, Y., A. M. Krieger, and D. Yekutieli (2006). Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93(3), 491–507.
Benjamini, Y. and D. Yekutieli (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 1165–1188.
Candès, E., Y. Fan, L. Janson, and J. Lv (2018). Panning for gold: model-X knockoffs for high dimensional controlled variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Chow, Y. S. and H. Teicher (2012). Probability Theory: Independence, Interchangeability, Martingales. Springer Science & Business Media.
Efron, B. (2004). Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association 99(465), 96–104.
Efron, B. (2007). Size, power and false discovery rates. The Annals of Statistics, 1351–1377.
Efron, B. (2008). Microarrays, empirical Bayes and the two-groups model. Statistical Science, 1–22.
Efron, B. (2012). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction, Volume 1. Cambridge University Press.
Efron, B. and R. Tibshirani (2002). Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology 23(1), 70–86.
Efron, B., R. Tibshirani, J. D. Storey, and V. Tusher (2001). Empirical Bayes analysis of a microarray experiment. Journal of the American Statistical Association 96(456), 1151–1160.
Elias, J. E. and S. P. Gygi (2007). Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature Methods 4(3), 207–214.
Fan, J., P. Hall, and Q. Yao (2007). To how many simultaneous hypothesis tests can normal, Student's t or bootstrap calibration be applied? Journal of the American Statistical Association 102(480), 1282–1288.
Gavrilov, Y., Y. Benjamini, and S. K. Sarkar (2009). An adaptive step-down procedure with proven FDR control under independence. The Annals of Statistics, 619–629.
Ghosal, S. and A. Roy (2011). Predicting false discovery proportion under dependence. Journal of the American Statistical Association 106(495), 1208–1218.
Hansen, E. and K. F. Kerr (2012). A comparison of two classes of methods for estimating false discovery rates in microarray studies. Scientifica 2012.
He, K. (2013). Multiple hypothesis testing methods for large-scale peptide identification in computational proteomics. Master's thesis, University of Chinese Academy of Sciences. http://dpaper.las.ac.cn/Dpaper/detail/detailNew?paperID=20015738. [In Chinese with English abstract].
He, K., Y. Fu, W.-F. Zeng, L. Luo, H. Chi, C. Liu, L.-Y. Qing, R.-X. Sun, and S.-M. He (2015). A theoretical foundation of the target-decoy search strategy for false discovery rate control in proteomics. arXiv preprint arXiv:1501.00537.

Kang, M. (2016). Optimal false discovery rate control with kernel density estimation in a microarray experiment. Communications in Statistics - Simulation and Computation 45(3), 771–780.
Kerr, K. F. (2009). Comments on the analysis of unbalanced microarray data. Bioinformatics 25(16), 2035–2041.
Lee, C.-W., M. Efetova, J. C. Engelmann, R. Kramell, C. Wasternack, J. Ludwig-Müller, R. Hedrich, and R. Deeken (2009). Agrobacterium tumefaciens promotes tumor induction by modulating pathogen defense in Arabidopsis thaliana. The Plant Cell 21(9), 2948–2962.
Liu, W. and Q.-M. Shao (2014). Phase transition and regularized bootstrap in large-scale tests with false discovery rate control. The Annals of Statistics 42(5), 2003–2025.
Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Annals of Statistics, 239–257.
Sarkar, S. K., J. Chen, and W. Guo (2013). Multiple testing in a two-stage adaptive design with combination tests controlling FDR. Journal of the American Statistical Association 108(504), 1385–1401.
Scott, J. G. and J. O. Berger (2010). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. The Annals of Statistics 38(5), 2587–2619.
Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64(3), 479–498.
Storey, J. D. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value. The Annals of Statistics 31(6), 2013–2035.
Storey, J. D., J. E. Taylor, and D. Siegmund (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66(1), 187–205.
Storey, J. D. and R. Tibshirani (2003). Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences 100(16), 9440–9445.
Strimmer, K. (2008). A unified approach to false discovery rate estimation. BMC Bioinformatics 9(1), 303.
Sun, W. and T. T. Cai (2007). Oracle and adaptive compound decision rules for false discovery rate control. Journal of the American Statistical Association 102(479), 901–912.
Tan, Y.-D. and H. Xu (2014). A general method for accurate estimation of false discovery rates in identification of differentially expressed genes. Bioinformatics 30(14), 2018–2025.
Tansey, W., O. Koyejo, R. A. Poldrack, and J. G. Scott (2017). False discovery rate smoothing. Journal of the American Statistical Association 0(ja), 0–0.
Tusher, V. G., R. Tibshirani, and G. Chu (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences 98(9), 5116–5121.
Vergunst, A. C., M. C. van Lier, A. den Dulk-Ras, and P. J. Hooykaas (2003). Recognition of the Agrobacterium tumefaciens VirE2 translocation signal by the VirB/D4 transport system does not require VirE1. Plant Physiology 133(3), 978–988.
Xie, Y., W. Pan, and A. B. Khodursky (2005). A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data. Bioinformatics 21(23), 4280–4288.