Group testing is an effective testing method that reduces the number of tests required for identifying defective items by performing tests on pools of items. As the number of pools is set to be less than that of the items, the number of tests required in group testing is smaller than the number of items. A mathematical procedure is required to estimate the items’ states based on the test results, which is equivalent to solving an underdetermined problem. It is expected that when the prevalence (the fraction of defective items) is sufficiently small, the defective items can be accurately identified using an appropriate inference method, as with sparse estimation [2, 3], wherein an underdetermined problem is solved under the assumption that the number of nonzero (defective in the context of group testing) components in the variables to be estimated is sufficiently small. The reduction in the number of tests reduces the testing costs; hence, the application of group testing to various diseases such as HIV and hepatitis virus [5, 6] has been discussed. In addition, the need to detect elements in heterogeneous states from a large population is common not only in medical tests but also in various other fields. Group testing matches such demands and is applied to the detection of rare mutations in genetic populations and the monitoring of exposure to chemical substances.
The accuracy of group testing depends on the pooling and estimation methods. Representative pooling methods are random pooling under a constraint related to the number of pools each item belongs to, and the binary splitting method, where the defective pools are sequentially divided into subpools and the subpools are repeatedly tested [10, 11]. From an experimental point of view, a method called shifted transversal for high-throughput screening has been proposed. In addition, a pooling method using a paper-like device for multiple malaria infections has been developed for effective testing. Recently, a pooling method using active learning has also been proposed [14, 15].
The estimation problem in group testing has been studied based on the mathematical correspondence between group testing and coding theory [16, 17]. By considering errors that are inevitable in realistic testing, Bayesian inference has been introduced to the estimation process in group testing for modeling the probabilistic error [18, 19]. The estimation of the items’ states using Bayesian inference is superior to that using binary splitting-based methods in the case of finite error probability. However, the applicability of the theoretical bounds for group testing studied so far is restricted to the parameter region where asymptotic limits are applicable; hence, the theoretical bounds are not necessarily practical for general settings [20, 21].
In contrast to such approaches, we quantify the performance of the Bayesian group testing and understand its applicability as a diagnostic classifier. Considering the practical situation, we examine the no-gold-standard case and introduce a statistical model for group testing. The usage of the statistical model is one of the approaches to understand the test property without a gold standard. The statistical model considered here describes an idealized group testing, and there are no practical tests that completely match this setting. However, as explained in the main text, the Bayesian optimal setting considered here, in which the generative process of the test result is known, can provide a practical guide for group testing.
We consider two quantification methods used in medical tests that output continuous values. The first is the cutoff-dependent property. The basis for applying Bayesian inference to estimate the items’ states, which are discrete, is the posterior distribution, which is a continuous function; hence, a mapping from continuous to discrete variables is needed. In prior studies on Bayesian group testing, the maximum posterior marginal (MPM) estimator, which is equivalent to the cutoff of 0.5, was used for the mapping to determine the items’ states from the posterior distribution [18, 19]. However, there is no mathematical background behind the use of the MPM estimator. We appropriately determine the cutoff in Bayesian group testing using a risk function from the viewpoint of decision theory or utility theory, and understand the MPM estimator and the maximization of the Youden index in the unified framework of Bayesian decision theory.
The second characterization is the cutoff-independent property using the receiver operating characteristic (ROC) curve [25, 26]. More quantitatively, the area under the curve (AUC) of the ROC curve is used as an indicator of the usefulness of a test. For evaluating the AUC in the no-gold-standard case, we apply a method based on statistical physics to the group testing model. The analytical result accurately describes the actual performance of the belief propagation (BP) algorithm defined for the given data.
The main contributions of our study are as follows.
We show that the expected AUC is maximized under the Bayesian optimal setting when the marginal posterior probability is used as the diagnostic variable.
We show that in the Bayesian optimal setting, the Bayes risk function defined by the false positive rate and false negative rate is minimized using the marginal posterior probability and an appropriate cutoff.
We derive the distribution of the marginal posterior probability for defective and non-defective items under the Bayesian optimal setting without knowing which items are defective. Using this distribution, we obtain the ROC curve and quantify the AUC. Then, we identify the parameter region in which group testing with a smaller number of tests yields a better identification performance than that of the original test performed on all items.
We demonstrate that the analytical results accurately describe the behavior of the BP algorithm employed for a single sample when the number of items is sufficiently large.
The remainder of this paper is organized as follows. In Sec. II, we describe our model for group testing and the Bayesian optimal setting. In Sec. III, we introduce several theorems that hold in the Bayesian optimal setting, and derive a general expression for the cutoff corresponding to the risk function. In Sec. IV, the performance evaluation method based on the replica method for group testing is summarized and the results are presented. In Sec. V, the correspondence between the replica method and the BP algorithm is explained. Sec. VI summarizes this study and explains the considerations regarding the assumptions used in this paper.
II Model and settings
Let us denote the number of items as . We consider randomly generated pools under the constraint that the number of items in each pool is , and the number of pools each item belongs to is . Here, we refer to and as the pool size and overlap, respectively, and set them to be sufficiently smaller than . There are pools that satisfy the condition. We label them as and prepare the corresponding variable , which represents pooling method: indicates that the -th pool is tested, whereas indicates that it is not tested. We consider that each pool is not tested more than once, and -tests are performed in total; hence, and hold. From the definition of the group testing, is smaller than , and we set . The set of labels of the items in the -th pool is , and the number of labels in is without dependence on . The true state of all items is denoted by , and that of the items in the -th pool is denoted by . For instance, when the 1st pool contains the 1st, 2nd, and 3rd items, . We introduce the following assumptions in our model of the group testing.
The pools that contain at least one defective item are regarded as positive.
Under this assumption, the true state of the -th pool, denoted by , is given by , where is the logical sum.
Each test result independently obeys the identical distribution.
In addition, we consider that the test property is characterized by the true positive probability and false positive probability . Hence, the true generative process of the test result performed on the -th pool is given by
and the joint distribution of the-test results is given by .
The true positive probability and false positive probability are known in advance.
Following this assumption, the assumed model for the inference is set as .
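The generative model described by these assumptions can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the function name `run_pooled_tests` and all argument names are hypothetical, and the pooling is simplified (each pool is a random subset of fixed size, but the per-item overlap constraint and the distinctness of pools are not enforced).

```python
import numpy as np

def run_pooled_tests(x_true, n_pools, pool_size, p_tp, p_fp, seed=0):
    """Simulate noisy pooled tests for given true item states (hypothetical helper).

    x_true    : 0/1 array, 1 = defective
    n_pools   : number of pools, i.e. number of tests
    pool_size : number of items in each pool
    p_tp,p_fp : true positive / false positive probability of a single test
    """
    rng = np.random.default_rng(seed)
    x_true = np.asarray(x_true, dtype=int)
    n_items = len(x_true)

    # Simplified random pooling: fixed pool size only (overlap is NOT constrained).
    pools = [rng.choice(n_items, size=pool_size, replace=False)
             for _ in range(n_pools)]

    # A pool is truly positive if it contains at least one defective item
    # (logical sum over the items in the pool).
    pool_state = np.array([int(x_true[p].any()) for p in pools])

    # Each test result is drawn independently: positive with probability
    # p_tp for a truly positive pool and p_fp for a truly negative pool.
    p_positive = np.where(pool_state == 1, p_tp, p_fp)
    y = (rng.random(n_pools) < p_positive).astype(int)
    return pools, pool_state, y
```

With `p_tp=1.0` and `p_fp=0.0` the test results coincide with the true pool states, which is a convenient sanity check before adding noise.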
As the prior knowledge, we introduce the following assumptions.
The prevalence is known.
The pretest probability of all items is set to the prevalence .
Hence, we set the prior distribution as
Following the Bayes’ theorem, the posterior distribution is given by
where is the normalization constant given by
We note that corresponds to the true generative process of the test results in the Bayesian optimal setting. In the problem setting considered here, the assumed model used in the inference matches the true generative process of the test results. We refer to such a setting as Bayes optimal.
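For very small instances, the posterior and its marginals can be computed directly from Bayes' theorem by enumerating all item configurations. The sketch below assumes the model described above (independent Bernoulli prior with the known prevalence, and tests characterized by known true positive and false positive probabilities); the function name `posterior_marginals` is hypothetical, and the enumeration costs 2^n evaluations, so it is only meant as a reference check and not as a substitute for the BP algorithm.

```python
import itertools
import numpy as np

def posterior_marginals(pools, y, n_items, prevalence, p_tp, p_fp):
    """Exact marginal posterior P(item i is defective | test results).

    Brute-force enumeration over all 2**n_items configurations
    (hypothetical reference implementation for tiny problems only).
    """
    weighted_sum = np.zeros(n_items)
    normalization = 0.0
    for config in itertools.product([0, 1], repeat=n_items):
        x = np.array(config)
        # Prior weight: independent Bernoulli(prevalence) for every item.
        weight = np.prod(np.where(x == 1, prevalence, 1.0 - prevalence))
        # Likelihood: one factor per tested pool.
        for pool, y_k in zip(pools, y):
            p_pos = p_tp if x[list(pool)].any() else p_fp
            weight *= p_pos if y_k == 1 else 1.0 - p_pos
        normalization += weight
        weighted_sum += weight * x
    return weighted_sum / normalization
```

For example, with a single pool containing only item 0, a positive result, prevalence 0.1, `p_tp=0.9`, and `p_fp=0.05`, the marginal of item 0 is 0.09 / (0.09 + 0.045) = 2/3, while an untested item keeps its prior value 0.1.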
III Appropriate diagnostic variable and cutoff
In this section, we present a discussion based on Bayesian decision theory for the setting of the diagnostic variable and cutoff.
III-A Diagnostic variable for decision
First, we consider the statistic that should be used as a diagnostic variable to determine the items’ states. In this study, we adopt the statistic that is expected to maximize the AUC. We denote the arbitrary statistic for the -th item, which characterizes the estimated item’s state under the pooling method and the test result . The statistic does not need to be evaluated in the framework of Bayesian estimation but can be defined based on other methods. We use the following expression for the AUC, which is equivalent to the Wilcoxon–Mann–Whitney test statistic:
Furthermore, we define the expected AUC as , where and denote the expectation according to the prior and the likelihood , respectively, and denotes a functional of . In the Bayesian optimal setting, the expected AUC is given by
where is the posterior AUC defined by
We denote the marginal posterior probability under the Bayesian optimal setting as . For simplicity, we consider that the case holds; hence,
As a diagnostic variable, we adopt the statistic that yields the largest . The following theorem suggests the statistic appropriate for the purpose.
Theorem 1. The maximum of the posterior AUC (7) is achieved at for any and .
Proof. We introduce the order statistic of as , where denotes the index of the component in whose value is the -th smallest. Using the order statistic, (7) for is given by
The difference between and under an arbitrary statistic is given by
where the inequality trivially holds because . Equation (9) indicates that the posterior marginal probability yields the largest value of . ∎
The equality holds when is satisfied for all . In other words, when the sorted as corresponds to the order statistic of , is the maximum, as in the case of the evaluation using the posterior marginal probability under the Bayesian optimal setting. In principle, the statistic not under the Bayesian optimal setting can achieve the maximum of when it satisfies the abovementioned condition. Furthermore, as an example, for also yields the largest value of . In the following, we evaluate the AUC using for simplicity.
The Bayesian optimal setting is the ideal case and is impractical, but indicates the best possible performance of the group testing in the sense that it yields the largest .
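The Wilcoxon–Mann–Whitney form of the AUC used in this subsection can be evaluated directly when the true states are available, for example in a simulation. The following is a minimal sketch with a hypothetical function name `auc_wmw`; counting ties as one half is an assumption of this sketch and may differ from the convention of the paper's expression.

```python
import numpy as np

def auc_wmw(scores, labels):
    """AUC as the Wilcoxon-Mann-Whitney statistic.

    Fraction of (defective, non-defective) pairs in which the defective
    item receives the larger score; ties count as 1/2 (assumption).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one defective and one non-defective item")
    # Pairwise differences between every defective and non-defective score.
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())
```

Because the statistic depends only on the ordering of the scores, any strictly increasing transformation of the diagnostic variable leaves the AUC unchanged, which is in line with the remark above that statistics sharing the ordering of the marginal posterior probability attain the same maximum.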
III-B Determination of cutoff
The adequacy of the marginal posterior probability as a diagnostic variable can be confirmed in the interpretation of the cutoff based on a utility function. We define the utility function for the use of an arbitrary estimator as follows:
where , and , , , and are the true positive rate, false negative rate, false positive rate, and true negative rate, respectively. Following our notations, and are given by
where is the set of parameters given by and , respectively. The risk function evaluates the detrimental effect of the incorrectly estimated results. The maximization of the utility function is equivalent to the minimization of the risk function; hence, we consider the risk minimization. We define an expected risk as . Under the Bayesian optimal setting, the expected risk known as Bayes risk is given by
where is the posterior risk defined as
and and are the posterior FN and posterior FP defined as
respectively. Here, the following relationship holds:
We define the optimal estimator as that minimizes the posterior risk for any and at a given . The optimal estimator yields the minimum expected risk, and the estimator corresponds to the Bayes estimator in the decision theory . The following theorem represents the basis for the cutoff determination.
Theorem 2. The optimal estimator is given by a cutoff-based function using the marginal posterior probability in the Bayesian optimal setting as
In general, any function that maps -continuous values to -discrete values can be an estimator for the items’ states, but (20) indicates that the optimal estimator is defined by the cutoff given by the prevalence and the parameters of the risk function and . Furthermore, the form of (20) indicates that the marginal posterior probability is appropriate for the evaluation of the test performance using AUC.
Let us consider the maximization of the Youden index as an example of risk minimization. The Youden index is expressed as follows:
Hence, . Thus, the maximization of the Youden index corresponds to equal reductions in the false negative and false positive. By following Theorem 2, we immediately obtain Corollary 1 by substituting into (20).
Corollary 1. The cutoff that equals the prevalence maximizes the posterior Youden index given by (15) for .
Next, let us consider the MPM estimator corresponding to the cutoff of 0.5. Following (20), 0.5 is the optimal cutoff at and , where the risk is equivalent to the mean squared error . This fact is consistent with previous studies in which the optimality of the MPM estimator is supported in terms of the minimization of the expected mean squared error [28, 29, 30]. The risk at and implies that the priority of the decision is determined by the prevalence ; when , the priority is to reduce false positives, and when , the priority is to reduce false negatives. In other words, the use of the MPM estimator indicates that the priority of the decision is to avoid identification errors in the larger of the non-defective and defective populations. Group testing is effective when the prevalence is sufficiently small; hence, the use of the MPM estimator in group testing decreases the false positives rather than the false negatives. If we need to reduce the false negative rate in group testing rather than the false positive rate, such as a test where a low false positive probability is considered, the MPM estimator may not achieve the purpose; hence, setting an appropriate cutoff under the risk is important.
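The dependence of the cutoff on the prevalence and the risk parameters can be made concrete with a short sketch. It assumes, as a labeled assumption rather than a quotation of (20), that the risk is a weighted sum `lam_fn * FNR + lam_fp * FPR` of the false negative and false positive rates; minimizing the corresponding posterior risk item by item then gives the cutoff `lam_fp * prevalence / (lam_fn * (1 - prevalence) + lam_fp * prevalence)`. This form reproduces the two cases discussed above: equal weights give a cutoff equal to the prevalence (Youden index), and weights `lam_fn = prevalence`, `lam_fp = 1 - prevalence` give 0.5 (MPM estimator). The names `optimal_cutoff`, `classify`, `lam_fn`, and `lam_fp` are hypothetical.

```python
import numpy as np

def optimal_cutoff(prevalence, lam_fn, lam_fp):
    """Cutoff on the marginal posterior probability (assumed risk form).

    Assumes risk = lam_fn * FNR + lam_fp * FPR, where the rates are
    normalized by the expected numbers of defective and non-defective
    items, respectively.
    """
    return lam_fp * prevalence / (lam_fn * (1.0 - prevalence) + lam_fp * prevalence)

def classify(marginals, cutoff):
    """Declare an item defective when its marginal posterior exceeds the cutoff."""
    return (np.asarray(marginals, dtype=float) > cutoff).astype(int)

# Usage: at prevalence 0.05, the Youden-type cutoff is far more permissive
# than the MPM cutoff of 0.5.
marginals = [0.02, 0.10, 0.30, 0.70]
youden = classify(marginals, optimal_cutoff(0.05, 1.0, 1.0))
mpm = classify(marginals, optimal_cutoff(0.05, 0.05, 0.95))
```

In this example the Youden-type rule flags every item whose marginal exceeds 0.05, whereas the MPM rule flags only the item with marginal 0.70, which illustrates why the MPM estimator favors the reduction of false positives when the prevalence is small.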
III-C Optimal cutoff and Bayes factor
The expression of the estimator with the appropriate cutoff (20) corresponds to the Bayes factor. The Bayes factor is defined as the ratio of the marginal likelihoods of two competing models. Here, we focus on the -th item and consider and as the competing ‘models’. We denote the Bayes factor for the -th item and define it as [32, 33]
where is the marginalized likelihood under the constraint on . The Bayes factor can be expressed using the posterior odds and the prior odds as
In particular, in the maximization of the expected Youden index, which corresponds to , the -th item is considered defective when , and considered non-defective when . Following the conventional interpretation of the Bayes factor, indicates that the evidence against is ‘not worth more than a bare mention’ [32, 34]. Hence, the maximization of the posterior Youden index provides a loose criterion for deciding . Meanwhile, the MPM estimator, which corresponds to and , provides a strict criterion for deciding when the prevalence is small. For instance, at , the -th item is regarded as defective when . In the conventional interpretation, indicates that the evidence against is ‘Strong’ or ‘Very Strong’. Hence, the use of the MPM estimator at small values of indicates that strong evidence is required to identify the defective items.
As explained in Sec. V, in the BP algorithm, the Bayes factor can be expressed by using the probabilities appearing in the algorithm.
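The relation between the Bayes factor, the posterior odds, and the prior odds can be checked numerically. The sketch below takes the Bayes factor of an item as the posterior odds of being defective divided by the prior odds; the function name `bayes_factor` is hypothetical, and the orientation of the ratio (defective over non-defective) is an assumption of this sketch.

```python
def bayes_factor(marginal, prevalence):
    """Bayes factor of 'defective' versus 'non-defective' for one item.

    marginal   : marginal posterior probability that the item is defective
    prevalence : prior probability that the item is defective
    """
    posterior_odds = marginal / (1.0 - marginal)
    prior_odds = prevalence / (1.0 - prevalence)
    return posterior_odds / prior_odds

# A cutoff equal to the prevalence corresponds to a Bayes factor of 1,
# whereas the MPM cutoff of 0.5 corresponds to (1 - prevalence) / prevalence.
bf_at_prevalence = bayes_factor(0.01, 0.01)
bf_at_half = bayes_factor(0.5, 0.01)
```

For a prevalence of 0.01, which is used here only as an illustrative value, the MPM cutoff requires a Bayes factor of about 99, far above the threshold of 1 implied by the prevalence cutoff.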
IV ROC Analysis by replica method
As discussed in the previous section, the Bayesian optimal setting maximizes the expected AUC and minimizes the expected risk. In this section, we evaluate the expected AUC under the Bayesian optimal setting. The procedure explained herein has been introduced for the analysis of error-correcting codes such as low-density parity-check codes and compressed sensing, which have mathematical similarities with group testing. For deriving the ROC curve and the associated AUC, we need to obtain the distributions of the marginal posterior probability of the non-defective items and the defective items , which are defined as
Assuming that the distributions under the fixed test results , pooling methods , and items’ states converge to the typical distribution at sufficiently large values of , we consider the averaged distribution functions
where denotes the expectation with respect to the randomness (), whose joint distribution is given by
Here, is the set of pool indices where -th item is contained, and is the normalization constant.
We define the conditional expectation of the -th power of the posterior marginal probability as follows:
Using (35), the distribution of the marginal posterior probability is given by
The strategy for obtaining the distributions is based on the reconstruction of the distribution by the moments, for , , as shown in (36)–(37). The calculation methods for and are the same; hence, we mainly explain the calculation of .
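Once the two distributions of the marginal posterior probability are available, whether from the analysis or from histograms of BP output, the ROC curve follows by sweeping the cutoff and taking the tail masses of each distribution. The following is a minimal sketch under the assumption that both distributions are given as histograms on a common grid of bins over [0, 1]; the function name `roc_from_histograms` is hypothetical, and the trapezoidal rule used for the AUC is a choice of this sketch.

```python
import numpy as np

def roc_from_histograms(p_neg, p_pos):
    """ROC curve and AUC from binned distributions of the diagnostic variable.

    p_neg, p_pos : histogram weights for non-defective and defective items
                   on the same bins, ordered from small to large values.
    Returns (fpr, tpr, auc); the cutoff is lowered one bin at a time.
    """
    p_neg = np.asarray(p_neg, dtype=float)
    p_pos = np.asarray(p_pos, dtype=float)
    p_neg = p_neg / p_neg.sum()
    p_pos = p_pos / p_pos.sum()
    # Tail mass above the cutoff, starting from a cutoff above all bins.
    fpr = np.concatenate([[0.0], np.cumsum(p_neg[::-1])])
    tpr = np.concatenate([[0.0], np.cumsum(p_pos[::-1])])
    # Trapezoidal area under the (fpr, tpr) curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc
```

Two perfectly separated histograms give an AUC of 1, and two identical histograms give 0.5, which are the two limiting cases of a perfect and an uninformative test.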
For the expectation with respect to the randomness, we introduce the following identity that holds for :
where we express the -th power by introducing -replicated systems . Using the identity (38), we obtain
We introduce a calculation method known as the replica method for the evaluation of (40). First, assume that and ; hence, . We obtain the following expressions:
where is expressed using the -replicas in (41). We combine with other replica variables; hence, (42) is represented by the -replica variables . The analytical expression of (42) for the integer is analytically continued to the real values of to take the limit in (39). The detailed calculation is presented in the Appendix; here, we briefly explain the basic approach for the calculations.
We introduce an (unnormalized) probability mass function , where is the vector consisting of the replica variables. Here, it is noted that both and represent the -replica variables. In the replica method, the latter expression is used in the analysis. As shown in the Appendix, (42) depends on the replica variables only through the function , whose value needs to be determined to be consistent with the weight in (42) for each configuration of . Furthermore, for analytic continuation, we introduce an assumption known as the replica symmetric (RS) assumption, in which the function is invariant against the permutation of the indices of the replica variables except . In this case, the probability mass function can be described using the Bernoulli parameters because is a binary variable. For the invariance of the replica indices, the Bernoulli parameter needs to be equivalent for all replicas. Here, we set the Bernoulli parameter as . It is natural to consider that the Bernoulli parameter depends on the true state . This consideration and de Finetti’s theorem indicate that can be expressed in the form of
Here, is satisfied for , and
where . In the RS assumption, the distributions , , need to be determined to be consistent with the weight . This RS assumption and the associated analytic continuation may cause instability of the solution; hence, we need to check the adequacy of our analysis. In Sec. IV-B, the result of the replica method is compared with that of the BP algorithm, and we consider the analysis shown here as adequate for the performance evaluation of group testing.
Following the calculation shown in the Appendix, the distributions of the marginal posterior probability for the defective and non-defective items are given by
where , and
The function is the conjugate of the function that satisfies for . As shown in the Appendix, for and are derived as