1 Introduction
Screening populations for infectious diseases such as human papillomavirus infection (HPV) is important for the effective identification of cervical cancer and its precursors (Nanda et al., 2000). In under-resourced countries individual testing may be expensive, and therefore not feasible. The use of group testing where samples are combined in a single test can lead to cost savings. This paper proposes an optimal design for this type of screening.
Group testing for identification when the probability of disease varies across subjects is a challenging problem in applied statistics and was introduced by Sobel (1960). Recently in this journal, Black et al. (2015) introduced an algorithm for the generalized group testing problem (GGTP) in a hierarchical class. A procedure is in the hierarchical class (HC) if two units are tested together in a group only if they have an identical test history, i.e., each previous group test contains either both of them or neither of them (Sobel and Groll, 1959; Hwang et al., 1981). This note presents a dynamic programming (DP) algorithm that achieves an optimal solution with respect to the expected number of tests. Although the development of the algorithm is not simple (see Appendix A), the implementation is very simple. Matlab code is provided to compute the optimal configuration and the expected number of tests. We show that the proposed approach is substantially more efficient than the one proposed by Black et al. (2015).
2 An optimal Hierarchical DP Algorithm
We present an optimal hierarchical dynamic programming (HDP) algorithm for GGTP. We assume without loss of generality that $p_1 \le p_2 \le \dots \le p_N$ with the corresponding labels $1, 2, \dots, N$, where $N$ is the size of the population and $p_i$ is the probability of an infection for person $i$ in the population. The development of the procedure will be with respect to the ordered values of $p_i$, as in Black et al. (2015). This assumption is necessary in order to make the problem tractable, since the alternative is to optimize with respect to all possible permutations of the $p_i$ (which is impossible even for small $N$). The DP algorithm uses a backward induction process that we fully develop in Appendix A. In the remainder of this section we provide heuristic arguments to explain the algorithm along with some definitions.
By we denote the expected total number of tests under an optimal HDP algorithm applied to the set of units with labels , which we abbreviate as (Binomial set) with probabilities . The DP algorithm is a backward induction process (Bellman, 1957) where is determined recursively for , with . As a product of the recursive process, we obtain an optimal design. For this recursive computation, we need to introduce additional notation for the expected total number of tests under an optimal HDP algorithm conditional on the information that there is at least one defective element in the group. By we denote this conditional expectation applied to the units (Defective set) with probabilities respectively, where .
Let . The HDP algorithm can be implemented as follows.
// denotes a comment.
Black et al. (2015) presented their algorithm in the context of misclassification among tests. This situation can be easily incorporated in the proposed HDP algorithm 1 by substituting the probability of observing a positive outcome for individual instead of . A short discussion about the issue of misclassification is provided in Section 5.
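One common way to write the substituted probability is sketched below. It assumes a test with sensitivity $S_e$ and specificity $S_p$ that do not depend on the individual; these symbols and the constancy assumption are introduced here for illustration and are not taken from the text. Writing $p_i$ for the infection probability of individual $i$, the law of total probability gives the probability $\tilde{p}_i$ of observing a positive outcome:

```latex
\[
\tilde{p}_i \;=\; S_e\, p_i \;+\; (1 - S_p)\,(1 - p_i).
\]
```

With a perfect test ($S_e = S_p = 1$) this reduces to $\tilde{p}_i = p_i$, so the setting without misclassification is recovered as a special case.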
3 Example
We present numerical examples to demonstrate how an optimal HDP algorithm can be used in practice. Since the minimal group size is equal to 1 or 2, and the case of 1 is clear, we demonstrate first the implementation of the algorithm for .
3.1
We assume that and . Denote by the number of tests. The left branch of the tree represents the negative test result, and the right branch represents the positive test result.
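Algorithm: test units a and b together.
  negative (left): stop after 1 test, with prob. q_a q_b
  positive (right): test unit a
    negative (left): stop after 2 tests, with prob. q_a(1-q_b)
    positive (right): test unit b
      negative (left): stop after 3 tests, with prob. (1-q_a)q_b
      positive (right): stop after 3 tests, with prob. (1-q_a)(1-q_b)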
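The expected number of tests for this two-unit procedure follows directly from the branch probabilities of the tree. As a worked check, write $T$ for the number of tests (a symbol assumed here for illustration) and read $q_a$ and $q_b$ as the probabilities that units a and b are disease free, which is what the branch probabilities suggest:

```latex
\begin{align*}
E(T) &= 1\cdot q_a q_b \;+\; 2\cdot q_a(1-q_b) \;+\; 3\cdot(1-q_a)q_b \;+\; 3\cdot(1-q_a)(1-q_b) \\
     &= q_a q_b + 2q_a - 2q_a q_b + 3(1-q_a) \\
     &= 3 - q_a - q_a q_b .
\end{align*}
```

Under this reading, testing the two units together needs fewer expected tests than testing each unit individually (2 tests) exactly when $q_a(1+q_b) > 1$.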
3.2
To demonstrate the implementation of the algorithm in a more realistic situation, we consider with the probability vector with the corresponding label set . Initially, an optimal configuration is , i.e., each one of these 3 groups is tested. Subsequent testing is done separately in each of the three groups based on the HDP algorithm. Recall that the left branch of the tree represents a negative test result and the right branch represents a positive test result.
[Figure: testing trees for each of the three groups; one tree covers units 1 to 6 (units 4 to 6 in a subtree labeled A) and another covers units 7 to 9.]
4 Numerical comparisons with Black et al. (2015)
Under the assumption of no misclassification, we compare the performance of the optimum HGT algorithm with the CRC method proposed by Black et al. (2015). We obtained the expected number of tests using the R functions provided in Black et al. (2015). As done in Black et al. (2015), we generate the vector from a Beta distribution with parameters such that , i.e., expectation equal to , and set to . Table 1 presents a comparison between the CRC and HDP algorithms for , where the CRC design was determined based on the algorithm proposed by Black et al. (2015). The results show that the performance of HDP is much better than CRC with an efficiency gain of approximately .

0.01 | |||
---|---|---|---|
0.05 | |||
0.10 | |||
0.15 | |||
In practice, can be substantially larger than 20. For example with and , the CRC algorithm results in an expected number of tests of as compared with an optimal HDP algorithm, that results in tests (a gain of ).
5 Discussion
This article presents a simple algorithm for the optimal group testing design under a hierarchical class. The goal of this work is to develop group testing screening algorithms that are optimal and can be used in the situation where is large, such as in HPV screening. In this situation, the CRC algorithm is not feasible due to its computational complexity. The HDP algorithm, however, is computationally feasible even in very large populations. For and an optimal design based on the HDP algorithm results in an expected number of 288 tests.
As described in Section 2, the algorithm can easily be extended to account for test misclassification. However, group testing for screening under misclassification may need to consider a more complex objective function than the expected number of tests (Graff and Roeloffs, 1972; Malinovsky et al., 2016). Intuitively, depending on the emphasis placed on misclassification versus test-efficiency (error in diagnosing vs expected number of tests), the practitioner needs to minimize a function of these two factors. In this case, the optimal design for HGT is an open problem.
Appendix
Appendix A Development of an optimal HDP algorithm for GGTP
We show here the development of an optimal HDP for GGTP. As we already discussed in Section 2, we impose the order restriction $p_1 \le p_2 \le \dots \le p_N$. In the homogeneous case, i.e., $p_1 = p_2 = \dots = p_N$, an optimal hierarchical DP algorithm was obtained by Sobel and Groll (1959).
Evaluation of
Recall that we are dealing with the binomial set .
We begin with the case . In this case .
For subsequent evaluation, when , we have to find the size of the subset from to test. If the test outcome of is negative, then we test the remaining units that form a binomial set . Otherwise, if the test outcome of is positive, then we test the defective set of size , which we abbreviate as , and the binomial set .
Recall that the left branch of the tree represents a negative test result, and the right branch represents a positive test result.
We summarize these situations in the following binary testing tree.
[.test [ with prob. ] [ with prob. ] ]
Denote by the expected total number of tests. Then,
(1) |
where
Since the optimal value is obtained by choosing the best among , we have
(2) |
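The recursion described above can be read off the Matlab code in Appendix B. In the sketch below the symbols are assumptions chosen to mirror the arrays in that code and are not necessarily the notation of the displayed equations: $q_i$ is the probability that unit $i$ is disease free, $H(n)$ is the optimal expected number of tests for the binomial set of units $n,\dots,N$, $h(i,j)$ is the optimal conditional expected number of tests for the defective set of units $i,\dots,j$, and $E_k(n)$ is the expected number of tests when the first group tested has size $k$.

```latex
\begin{align*}
E_k(n) &= 1 + \Big(\prod_{i=n}^{n+k-1} q_i\Big) H(n+k)
            + \Big(1-\prod_{i=n}^{n+k-1} q_i\Big)\big[\,h(n,\,n+k-1) + H(n+k)\,\big] \\
       &= 1 + H(n+k) + \Big(1-\prod_{i=n}^{n+k-1} q_i\Big)\, h(n,\,n+k-1), \\[4pt]
H(n)   &= \min_{1\le k\le N-n+1} E_k(n), \qquad n = N-1,\dots,1, \\
H(N)   &= 1, \qquad H(N+1) = 0 .
\end{align*}
```

The first line mirrors the binary tree: one test is always spent on the group, a negative outcome leaves only the remaining binomial set, and a positive outcome adds the cost of resolving a defective set of size $k$.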
Then is calculated in a recursive manner for . This calculation (A) requires the conditional expectation , which is developed as follows.
Evaluation of
Recall that we are dealing with the defective set .
If , then . If , we have to find a proper subset of size from to test. If the binary test outcome of is negative, then we conclude that the remaining units form a defective set , which does not need to be tested as a whole set. If the test outcome of is positive, then the conditional posterior distribution of units is the same as it was before any testing and they form a binomial set (similar arguments as in Sobel and Groll (1959)). Therefore, we divide the defective set into two subsets and and test them separately from left to right. We have three possible states of these subsets, i.e., , where, for example, represents the situation where both subsets are positive. Denote by the expected total number of tests corresponding to the situation .
The following diagram represents all these possible outcomes with corresponding conditional (on the event that there is at least one defective element) probabilities.
[ . [.-+ [. pr. ]] [.+- [. pr. ]] [.++ [. pr. ]] ]
Denote by the expected total number of tests in this case. Then,
Since an optimal value is obtained by choosing the best , among we have,
(3) |
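The conditional recursion can likewise be reconstructed from the Matlab code in Appendix B. The notation below is an assumption made for this sketch: $q_m$ is the probability that unit $m$ is disease free, $\Pi_{i,j}=\prod_{m=i}^{j} q_m$, $h(i,j)$ is the optimal conditional expected number of tests for the defective set of units $i,\dots,j$, and $x$ is the size of the subset tested first.

```latex
\begin{align*}
P(-+) &= \frac{\Pi_{i,i+x-1}\,\bigl(1-\Pi_{i+x,j}\bigr)}{1-\Pi_{i,j}}, \quad
P(+-) = \frac{\bigl(1-\Pi_{i,i+x-1}\bigr)\,\Pi_{i+x,j}}{1-\Pi_{i,j}}, \quad
P(++) = \frac{\bigl(1-\Pi_{i,i+x-1}\bigr)\bigl(1-\Pi_{i+x,j}\bigr)}{1-\Pi_{i,j}}, \\[4pt]
h(i,j) &= \min_{1\le x\le j-i}\Bigl\{ P(-+)\,\bigl[1 + h(i+x,j)\bigr]
          + P(+-)\,\bigl[2 + h(i,i+x-1)\bigr]
          + P(++)\,\bigl[2 + h(i,i+x-1) + h(i+x,j)\bigr] \Bigr\} \\
       &= 2 + \min_{1\le x\le j-i}\Bigl\{ -P(-+)
          + \frac{1-\Pi_{i,i+x-1}}{1-\Pi_{i,j}}\, h(i,i+x-1)
          + \frac{1-\Pi_{i+x,j}}{1-\Pi_{i,j}}\, h(i+x,j) \Bigr\}, \\[4pt]
h(i,i) &= 0 .
\end{align*}
```

The three probabilities sum to one because $\Pi_{i,i+x-1}\,\Pi_{i+x,j}=\Pi_{i,j}$. The second form of $h(i,j)$ uses $P(+-)+P(++)=(1-\Pi_{i,i+x-1})/(1-\Pi_{i,j})$ and $P(-+)+P(++)=(1-\Pi_{i+x,j})/(1-\Pi_{i,j})$, and it matches the expression evaluated inside the innermost loop of the Matlab function in Appendix B.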
Appendix B Matlab Code
function and an optimal design
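function [H, D] = HHgDesign(q)
% Expected number of tests H and optimal design D for the binomial sets.
% q(i) is the probability that unit i is disease free.
q = sort(q, 'descend');
h = hgDesign(q);              % conditional expectations for defective sets
N = length(q);
H = zeros(N+1, 1);            % H(N+1) = 0: no units left to test
H(N, 1) = 1;                  % a single remaining unit needs one test
D = zeros(N, 1);
D(N, 1) = 1;
I = zeros(N, 1);
I(N, 1) = 1;
for nn = 1:N-1
    n = N - nn;               % backward induction over the first remaining unit
    v = zeros(N-(n-1), 1);
    for k = 1:N-n+1           % k = size of the group tested first
        qq = q(n:n+k-1);
        PI = prod(qq);        % probability that the group tests negative
        v(k, 1) = H(n+k) + (1-PI)*h(n, n+k-1);
    end
    m = min(v);
    f = find(v == m, 1, 'first');   % smallest optimal group size
    H(n) = 1 + m;
    D(n) = f;
    I(n) = N - (n-1);         % number of units still to be classified
end
D = [I D];                    % columns: remaining units, optimal group size
end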
function and an optimal design
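function [h, d] = hgDesign(q)
% Conditional expected number of tests h(i,j) and optimal split d(i,j)
% for the defective set of units i,...,j.
q = sort(q, 'descend');
N = length(q);
h = zeros(N, N);              % h(i,i) = 0: a one-unit defective set needs no test
d = zeros(N, N);
for nn = 1:N-1
    n = N - nn;
    for l = 2:(N-n+1)         % l = size of the defective set n,...,n+l-1
        qq = q(n:(n+l-1));
        PI = prod(qq);
        v = zeros(l-1, 1);
        for x = 1:(l-1)       % x = size of the subset tested first
            qqq = q(n:(n+x-1));
            PII = prod(qqq);
            qqqq = q((n+x):(n+l-1));
            PIII = prod(qqqq);
            v(x, 1) = -PII*(1-PIII)/(1-PI) + (1-PII)/(1-PI)*h(n, n+x-1) ...
                      + (1-PIII)/(1-PI)*h(n+x, n+l-1);
        end
        m = min(v);
        f = find(v == m, 1, 'last');    % largest optimal split
        h(n, n+l-1) = 2 + m;
        d(n, n+l-1) = f;
    end
end
end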
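A short usage sketch may help in reading the output of the code above. It assumes that HHgDesign and its helper function are available on the Matlab path, and the probability values are purely illustrative (they are not taken from the paper). The interpretation of the second column of D as the size of the first group to test when units n,...,N remain is read off the code, with units labeled after sorting q in descending order.

```matlab
% Illustrative infection probabilities (hypothetical values).
p = [0.01 0.02 0.05 0.10];
q = 1 - p;                     % probabilities of being disease free

[H, D] = HHgDesign(q);         % function from this appendix
expectedTests = H(1);          % expected number of tests for all units

% Read off the initial partition into groups from the design matrix.
N = length(q);
n = 1;
groups = {};
while n <= N
    k = D(n, 2);                     % size of the group starting at unit n
    groups{end+1} = n:(n + k - 1);   % labels refer to the sorted units
    n = n + k;
end

disp(expectedTests);
celldisp(groups);
```

Each cell of groups lists the labels of one initial group; a group of size 1 corresponds to individual testing of that unit.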
References
- Bellman (1957) Bellman, R. (1957). Dynamic Programming. Princeton University Press.
- Black et al. (2015) Black, M. S., Bilder, C. R., and Tebbs, J. M. (2015). Optimal retesting configurations for hierarchical group testing. Appl. Statist. 64, 693–710.
- Graff and Roeloffs (1972) Graff, L. E., and Roeloffs, R. (1972). Group testing in the presence of test error: an extension of the Dorfman procedure. Technometrics 14, 113–122.
- Hwang et al. (1981) Hwang, F. K., Pfeifer, C. J., and Enis, P. (1981). An Optimal Hierarchical Procedure for a Modified Binomial Group-Testing Problem. J. Amer. Statist. Assoc. 76, 947–949.
- Malinovsky et al. (2016) Malinovsky, Y., Albert, P. S., and Roy, A. (2016). Reader Reaction: A Note on the Evaluation of Group Testing Algorithms in the Presence of Misclassification. Biometrics 72, 299–304.
- Nanda et al. (2000) Nanda, K., McCrory, D. C., Myers, E. R., Bastian, L. A., Hasselblad, V., Hickey, J. D., and Matchar, D. B. (2000). Accuracy of the Papanicolaou test in screening for and follow-up of cervical cytologic abnormalities: a systematic review. Ann. Intern. Med. 132, 810–819.
- Sobel and Groll (1959) Sobel, M., and Groll, P. A. (1959). Group testing to eliminate efficiently all defectives in a binomial sample. Bell System Tech. J. 38, 1179–1252.
- Sobel (1960) Sobel, M. (1960). Group testing to classify efficiently all defectives in a binomial sample. Information and Decision Processes (R. E. Machol, ed.; McGraw-Hill, New York), pp. 127–161.