A note on optimal design for hierarchical generalized group testing

08/09/2018
by   Yaakov Malinovsky, et al.

Choosing an optimal strategy for hierarchical group testing is an important problem for practitioners who are interested in disease screening under limited resources. For example, when screening for infectious diseases in large populations, it is important to use algorithms that minimize the cost of potentially expensive assays. Black et al. (2015) described this as an intractable problem unless the number of individuals to screen is small. They proposed an approximation to an optimal strategy that is difficult to implement for large population sizes. In this note, we develop an optimal design that can be obtained using a novel dynamic programming algorithm. We show that this algorithm is substantially more efficient than the approach proposed by Black et al. (2015). The resulting algorithm is simple to implement, and Matlab code is presented for applied statisticians.

1 Introduction

Screening populations for infectious diseases such as human papillomavirus infection (HPV) is important for the effective identification of cervical cancer and its precursors (Nanda et al., 2000). In under-resourced countries individual testing may be expensive, and therefore not feasible. The use of group testing where samples are combined in a single test can lead to cost savings. This paper proposes an optimal design for this type of screening.

Group testing for identification when the probability of disease varies across subjects is a challenging problem in applied statistics and was introduced by Sobel (1960). Recently in this journal, Black et al. (2015) introduced an algorithm for the generalized group testing problem (GGTP) in a hierarchical class. A procedure is in the hierarchical class (HC) if two units are tested together in a group only if they have an identical test history, i.e., each previous group test either contains both of them or none of them (Sobel and Groll, 1959; Hwang et al., 1981).

This note presents a dynamic programming (DP) algorithm that achieves an optimal solution with respect to the expected number of tests. Although the development of the algorithm is not simple (see Appendix A), the implementation is very simple. Matlab code is provided to compute the optimal configuration and the expected number of tests. We show that the proposed approach is substantially more efficient than the one proposed by Black et al. (2015).

2 An optimal Hierarchical DP Algorithm

We present an optimal hierarchical dynamic programming (HDP) algorithm for GGTP. We assume without loss of generality that $p_1 \le p_2 \le \dots \le p_N$ with the corresponding labels $1, 2, \dots, N$, where $N$ is the size of the population and $p_i$ is the probability of infection for person $i$ in the population. The development of the procedure will be with respect to the ordered values of $p_i$, as in Black et al. (2015). This assumption is necessary in order to make the problem tractable, since the alternative is to optimize with respect to all possible permutations of $p_1, \dots, p_N$ (which is impossible even for small $N$). The DP algorithm uses a backward induction process, which we fully develop in Appendix A. In the remainder of this section we provide heuristic arguments to explain the algorithm along with some definitions.

By $H(n:N)$ we denote the expected total number of tests under an optimal HDP algorithm applied to the set of units with labels $n, \dots, N$, which we abbreviate as $B(n:N)$ (Binomial set), with probabilities $p_n, \dots, p_N$. The DP algorithm is a backward induction process (Bellman, 1957) where $H(n:N)$ is determined recursively for $n = N-1, \dots, 1$, with $H(N:N) = 1$. As a product of the recursive process, we obtain an optimal design. For this recursive computation, we need to introduce additional notation for the expected total number of tests under an optimal HDP algorithm conditional on the information that there is at least one defective element in the group. By $G(n:n+k-1)$ we denote this conditional expectation applied to the units $n, \dots, n+k-1$ (Defective set $D(n:n+k-1)$) with probabilities $p_n, \dots, p_{n+k-1}$ respectively, where $k = 1, \dots, N-n+1$.

Let $q_i = 1 - p_i$ and $\Pi_{a:b} = \prod_{i=a}^{b} q_i$ for $1 \le a \le b \le N$. The HDP algorithm can be implemented as follows.

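Input : $N$; $q_i = 1 - p_i$, $i = 1, \dots, N$, with $p_1 \le p_2 \le \dots \le p_N$; $\Pi_{a:b} = \prod_{i=a}^{b} q_i$
Initial Values : $H(N+1:N) = 0$, $H(N:N) = 1$, $G(n:n) = 0$ for $n = 1, \dots, N$
for n := N-1 to 1 step -1 do
      for k := 2 to N-(n-1) step 1 do
            for x := 1 to k-1 step 1 do
                  $G_x(n:n+k-1) = 2 - \frac{\Pi_{n:n+x-1}\,(1-\Pi_{n+x:n+k-1})}{1-\Pi_{n:n+k-1}} + \frac{1-\Pi_{n:n+x-1}}{1-\Pi_{n:n+k-1}}\, G(n:n+x-1) + \frac{1-\Pi_{n+x:n+k-1}}{1-\Pi_{n:n+k-1}}\, G(n+x:n+k-1)$
            end for
            $G(n:n+k-1) = \min_{1 \le x \le k-1} G_x(n:n+k-1)$   // optimal value
            $x^* = \arg\min_{1 \le x \le k-1} G_x(n:n+k-1)$   // From the Defective set $D(n:n+k-1)$ it is optimal to test the first $x^*$ items
      end for
      $H(n:N) = 1 + \min_{1 \le k \le N-n+1} \{ H(n+k:N) + (1-\Pi_{n:n+k-1})\, G(n:n+k-1) \}$   // optimal value
      $k^* = \arg\min_{1 \le k \le N-n+1} \{ H(n+k:N) + (1-\Pi_{n:n+k-1})\, G(n:n+k-1) \}$   // From the Binomial set $B(n:N)$ it is optimal to test the first $k^*$ items
end for
Algorithm 1 An optimal HDP algorithm: design and the value of $H(1:N)$

// denotes a comment.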

Black et al. (2015) presented their algorithm in the context of misclassification among tests. This situation can be easily incorporated in the proposed Algorithm 1 by substituting the probability of observing a positive outcome for individual $i$ in place of $p_i$. A short discussion of the issue of misclassification is provided in Section 5.
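As an illustration of this substitution, suppose every test has sensitivity $S_e$ and specificity $S_p$. Both symbols and the formula below are assumptions made for this sketch; they are not taken from Black et al. (2015) or from Algorithm 1. For an individual test on person $i$, the probability of observing a positive outcome would then be:

```latex
\[
  \tilde p_i \;=\; S_e\, p_i \;+\; (1 - S_p)\,(1 - p_i), \qquad i = 1, \dots, N .
\]
% With a perfect assay (S_e = S_p = 1) this reduces to \tilde p_i = p_i.
```

Under this reading, $\tilde p_i$ would be supplied to the algorithm wherever $p_i$ appears.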

3 Example

We present numerical examples to demonstrate how an optimal HDP algorithm can be used in practice. Since the minimal group size is equal to 1 or 2, and the case of 1 is clear, we demonstrate first the implementation of the algorithm for $N = 2$.

3.1 $N = 2$

We assume that $p_a \le p_b$ and write $q_a = 1 - p_a$, $q_b = 1 - p_b$. Denote by $T$ the number of tests. In the tree below, the first (left) branch at each node represents the negative test result, and the second (right) branch represents the positive test result.

Algorithm: test units a and b together
    negative: stop, $T = 1$ with prob. $q_a q_b$
    positive: test unit a
        negative: stop (unit b is positive), $T = 2$ with prob. $q_a (1 - q_b)$
        positive: test unit b
            negative: $T = 3$ with prob. $(1 - q_a) q_b$
            positive: $T = 3$ with prob. $(1 - q_a)(1 - q_b)$
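The four outcomes of the two-unit tree can be enumerated directly. The short Matlab sketch below does so for illustrative values of $q_a$ and $q_b$ (the numbers are assumptions chosen for the example, not values from the paper) and compares the result with testing the two units individually.

```matlab
% Sketch: expected number of tests for N = 2 under the pooled strategy.
% The values of q_a and q_b are illustrative assumptions.
q_a = 0.95;   % P(unit a is negative); a is the lower-risk unit (p_a <= p_b)
q_b = 0.90;   % P(unit b is negative)

% Each row: [number of tests T, probability of that outcome]
outcomes = [1, q_a*q_b;            % pooled test negative
            2, q_a*(1-q_b);        % pool positive, a negative, so b is positive
            3, (1-q_a)*q_b;        % pool positive, a positive, b negative
            3, (1-q_a)*(1-q_b)];   % pool positive, a positive, b positive

total_prob    = sum(outcomes(:,2));                  % should equal 1
ET_group      = sum(outcomes(:,1) .* outcomes(:,2)); % expected tests when pooling
ET_closed     = 3 - q_a - q_a*q_b;                   % same expectation, simplified
ET_individual = 2;                                   % one test per unit

fprintf('E[T] pooled = %.4f, closed form = %.4f, individual = %d\n', ...
        ET_group, ET_closed, ET_individual);
```

With these values the pooled strategy needs 1.195 tests on average, compared with 2 for individual testing. In general, pooling two units pays off in this sketch whenever $q_a + q_a q_b > 1$.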

3.2

To demonstrate the implementation of the algorithm in a more realistic situation, we consider

with probabilities vector

with the corresponding labels set . Initially, an optimal configuration is , i.e., each one of these 3 groups is tested. Subsequent testing is done separately in each of the three groups based on the HDP algorithm. Recall that the left branch of the tree represents the negative test result and the right branch represents the positive test result.

[Figure: binary testing trees for each of the three initial groups, showing which units are tested next after a negative (left branch) or positive (right branch) result.]

4 Numerical comparisons with Black et al. (2015)

Under the assumption of no misclassification, we compare the performance of the optimum HGT algorithm with the CRC method proposed by Black et al. (2015). We obtained the expected number of tests, using R functions provided in Black et al. (2015). As done in Black et al. (2015), we generate the vector

from a Beta distribution with parameters

such that , i.e., expectation equal to , and set to . Table 1 presents a comparison between the CRC and HDP algorithms for , where the CRC design was determined based on the algorithm proposed by Black et al. (2015). The results show that the performance of HDP is much better than CRC with an efficiency gain of approximately .

0.01
0.05
0.10
0.15
Table 1: Comparison of the expected number of tests between CRC and HDP.

In practice, can be substantially larger than 20. For example with and , the CRC algorithm results in an expected number of tests of as compared with an optimal HDP algorithm, that results in tests (a gain of ).

5 Discussion

This article presents a simple algorithm for the optimal group testing design under a hierarchical class. The goal of this work is to develop group testing screening algorithms that are optimal and can be used in the situation where $N$ is large, such as in HPV screening. In this situation, the CRC algorithm is not feasible due to its computational complexity. The HDP algorithm, however, is computationally feasible even in very large populations. For and an optimal design based on the HDP algorithm results in an expected number of 288 tests.
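A rough count of the work involved helps to make the feasibility claim concrete. The sketch below assumes the triple loop over $n$, $k$ and $x$ of Algorithm 1, and it assumes that each product of $q_i$ values is available in constant time (for instance from precomputed cumulative products, which is an assumption and not part of the original code). The number of candidate splits $x$ that are evaluated is then:

```latex
\[
  \sum_{n=1}^{N-1} \; \sum_{k=1}^{N-n+1} (k-1)
  \;=\; \sum_{m=1}^{N-1} \frac{m(m+1)}{2}
  \;=\; \frac{(N-1)\,N\,(N+1)}{6} .
\]
```

Under these assumptions the number of evaluations grows on the order of $N^3$, that is, polynomially in the population size.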

As described in Section 2, the algorithm can easily be extended to account for test misclassification. However, group testing for screening under misclassification may need to consider a more complex objective function than the expected number of tests (Graff and Roeloffs, 1972; Malinovsky et al., 2016). Intuitively, depending on the emphasis placed on misclassification versus test-efficiency (error in diagnosing vs expected number of tests), the practitioner needs to minimize a function of these two factors. In this case, the optimal design for HGT is an open problem.

Appendix

Appendix A Development of an optimal HDP algorithm for GGTP

We show here the development of an optimal HDP for GGTP. As we already discussed in Section 2, we impose an order restriction $p_1 \le p_2 \le \dots \le p_N$. In the homogeneous case, i.e., $p_1 = p_2 = \dots = p_N$, an optimal hierarchical DP algorithm was obtained by Sobel and Groll (1959).

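Evaluation of $H(n:N)$
Recall that we are dealing with the binomial set $B(n:N)$. We begin with the case $n = N$. In this case $H(N:N) = 1$. For subsequent evaluation, when $n < N$, we have to find the size $k$ of the subset $\{n, \dots, n+k-1\}$ of $B(n:N)$ to test. If the test outcome of this subset is negative, then we test the remaining units, which form a binomial set $B(n+k:N)$. Otherwise, if the test outcome is positive, then we test the defective set of size $k$, which we abbreviate as $D(n:n+k-1)$, and the binomial set $B(n+k:N)$. Recall that the first (left) branch of the tree represents a negative test result, and the second (right) branch represents a positive test result. We summarize these situations in the following binary testing tree.

test $\{n, \dots, n+k-1\}$
    negative, with prob. $\Pi_{n:n+k-1}$: continue with $B(n+k:N)$, expected further tests $H(n+k:N)$
    positive, with prob. $1 - \Pi_{n:n+k-1}$: continue with $D(n:n+k-1)$ and $B(n+k:N)$, expected further tests $G(n:n+k-1) + H(n+k:N)$

Denote by $H_k(n:N)$ the expected total number of tests when the first $k$ units are tested together. Then,

$$H_k(n:N) = 1 + \Pi_{n:n+k-1}\, H(n+k:N) + (1 - \Pi_{n:n+k-1}) \left[ G(n:n+k-1) + H(n+k:N) \right] = 1 + H(n+k:N) + (1 - \Pi_{n:n+k-1})\, G(n:n+k-1) \qquad (1)$$

where $q_i = 1 - p_i$, $\Pi_{n:n+k-1} = \prod_{i=n}^{n+k-1} q_i$ and $H(N+1:N) = 0$.
Since the optimal value is obtained by choosing the best among $k = 1, \dots, N-n+1$, we have

$$H(n:N) = \min_{1 \le k \le N-n+1} H_k(n:N) = 1 + \min_{1 \le k \le N-n+1} \left\{ H(n+k:N) + (1 - \Pi_{n:n+k-1})\, G(n:n+k-1) \right\} \qquad (2)$$

Then $H(n:N)$ is calculated in a recursive manner for $n = N-1, \dots, 1$. This calculation (2) requires the conditional expectation $G(n:n+k-1)$, which is developed as follows.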

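Evaluation of $G(n:n+k-1)$
Recall that we are dealing with the defective set $D(n:n+k-1)$. If $k = 1$, then $G(n:n) = 0$. If $k \ge 2$, we have to find a proper subset of size $x$, $1 \le x \le k-1$, from $D(n:n+k-1)$ to test. If the binary test outcome of $\{n, \dots, n+x-1\}$ is negative, then we conclude that the remaining units form a defective set $D(n+x:n+k-1)$, which does not need to be tested as a whole set. If the test outcome of $\{n, \dots, n+x-1\}$ is positive, then the conditional posterior distribution of units $n+x, \dots, n+k-1$ is the same as it was before any testing and they form a binomial set $B(n+x:n+k-1)$ (similar arguments as in Sobel and Groll (1959)). Therefore, we divide the defective set into two subsets $\{n, \dots, n+x-1\}$ and $\{n+x, \dots, n+k-1\}$ and test them separately from left to right. We have three possible states of these subsets, i.e., $\{-+,\ +-,\ ++\}$, where, for example, $++$ represents the situation where both subsets are positive. Denote by $T_{-+}$, $T_{+-}$, $T_{++}$ the expected total number of tests corresponding to each situation. The following diagram represents all these possible outcomes with the corresponding conditional (on the event that there is at least one defective element) probabilities, where $\Pi_1 = \Pi_{n:n+x-1}$, $\Pi_2 = \Pi_{n+x:n+k-1}$ and $\Pi = \Pi_1 \Pi_2 = \Pi_{n:n+k-1}$.

    $-+$ : $T_{-+} = 1 + G(n+x:n+k-1)$, pr. $\Pi_1 (1 - \Pi_2) / (1 - \Pi)$
    $+-$ : $T_{+-} = 2 + G(n:n+x-1)$, pr. $(1 - \Pi_1) \Pi_2 / (1 - \Pi)$
    $++$ : $T_{++} = 2 + G(n:n+x-1) + G(n+x:n+k-1)$, pr. $(1 - \Pi_1)(1 - \Pi_2) / (1 - \Pi)$

Denote by $G_x(n:n+k-1)$ the expected total number of tests in this case. Then,

$$G_x(n:n+k-1) = 2 - \frac{\Pi_1 (1 - \Pi_2)}{1 - \Pi} + \frac{1 - \Pi_1}{1 - \Pi}\, G(n:n+x-1) + \frac{1 - \Pi_2}{1 - \Pi}\, G(n+x:n+k-1).$$

Since an optimal value is obtained by choosing the best $x$ among $1, \dots, k-1$, we have

$$G(n:n+k-1) = \min_{1 \le x \le k-1} G_x(n:n+k-1) \qquad (3)$$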
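The weighted sum over the three states collapses to a compact expression, and the step is easy to verify by hand. The worked step below uses shorthand introduced only for this sketch: $\Pi_1$ and $\Pi_2$ are the probabilities that the first and second subsets contain no defectives, $\Pi = \Pi_1 \Pi_2$, and $G_1$, $G_2$ are the conditional expectations on the two subsets. The test counts per state ($1 + G_2$, $2 + G_1$, $2 + G_1 + G_2$) are read off the description above.

```latex
\begin{align*}
(1-\Pi)\, G_x
  &= \Pi_1 (1-\Pi_2)\,(1 + G_2)
   + (1-\Pi_1)\,\Pi_2\,(2 + G_1)
   + (1-\Pi_1)(1-\Pi_2)\,(2 + G_1 + G_2) \\
  &= \Pi_1 (1-\Pi_2) + 2\,(1-\Pi_1) + (1-\Pi_1)\, G_1 + (1-\Pi_2)\, G_2 \\
  &= 2\,(1-\Pi) - \Pi_1 (1-\Pi_2) + (1-\Pi_1)\, G_1 + (1-\Pi_2)\, G_2 .
\end{align*}
% Dividing by (1 - \Pi):
\[
  G_x \;=\; 2 \;-\; \frac{\Pi_1 (1-\Pi_2)}{1-\Pi}
        \;+\; \frac{1-\Pi_1}{1-\Pi}\, G_1
        \;+\; \frac{1-\Pi_2}{1-\Pi}\, G_2 .
\]
```

This is the same form as the assignment to v(x,1) in the Matlab code of Appendix B, where the constant 2 is added after the minimum over x has been taken.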

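Combining (2) and (3), we obtain an optimal ordered HDP algorithm:

$$H(n:N) = 1 + \min_{1 \le k \le N-n+1} \left\{ H(n+k:N) + (1 - \Pi_{n:n+k-1})\, G(n:n+k-1) \right\},$$
$$G(n:n+k-1) = 2 + \min_{1 \le x \le k-1} \left\{ -\frac{\Pi_{n:n+x-1}\,(1 - \Pi_{n+x:n+k-1})}{1 - \Pi_{n:n+k-1}} + \frac{1 - \Pi_{n:n+x-1}}{1 - \Pi_{n:n+k-1}}\, G(n:n+x-1) + \frac{1 - \Pi_{n+x:n+k-1}}{1 - \Pi_{n:n+k-1}}\, G(n+x:n+k-1) \right\} \qquad (4)$$

where $\Pi_{a:b} = \prod_{i=a}^{b} (1 - p_i)$, $H(N+1:N) = 0$, $H(N:N) = 1$ and $G(n:n) = 0$.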
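As a small worked check of the recursion, consider $N = 2$ with $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$. The sketch assumes the boundary values $H(3:2) = 0$, $H(2:2) = 1$ and $G(1:1) = G(2:2) = 0$, which mirror the initialisation in the Matlab code of Appendix B.

```latex
\begin{align*}
G(1:2) &= 2 - \frac{q_1 (1 - q_2)}{1 - q_1 q_2}, \\
H(1:2) &= 1 + \min\bigl\{\, H(2:2) + (1 - q_1)\, G(1:1),\;\;
                           H(3:2) + (1 - q_1 q_2)\, G(1:2) \,\bigr\} \\
       &= 1 + \min\bigl\{\, 1,\;\; 2 - q_1 - q_1 q_2 \,\bigr\}.
\end{align*}
```

The first term in the minimum corresponds to testing the two units individually (2 tests in total). The second corresponds to pooling them, with expected total $3 - q_1 - q_1 q_2$, which is the value obtained by enumerating the four outcomes of the two-unit tree in Section 3.1. Pooling is therefore preferred when $q_1 + q_1 q_2 > 1$.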

Appendix B Matlab Code


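% Function H(1:N) and an optimal design
function [H, D] = HHgDesign(q)
% q : vector of q_i = 1 - p_i
q = sort(q, 'descend');              % so that p_1 <= ... <= p_N
h = hgDesign(q);                     % h(a,b) = G(a:b)
N = length(q);
H = zeros(N+1, 1);                   % H(n) = H(n:N), with H(N+1) = 0
H(N, 1) = 1;
D = zeros(N, 1);
D(N, 1) = 1;
I = zeros(N, 1);
I(N, 1) = 1;

for nn = 1:N-1
    n = N - nn;                      % backward induction: n = N-1, ..., 1
    v = zeros(N-(n-1), 1);
    for k = 1:N-n+1
        qq = q(n:n+k-1);
        PI = prod(qq);               % P(all of units n..n+k-1 are negative)
        v(k, 1) = H(n+k) + (1-PI)*h(n, n+k-1);
    end
    [m, f] = min(v);                 % smallest minimiser in case of ties
    H(n) = 1 + m;
    D(n) = f;                        % optimal size of the first group from B(n:N)
    I(n) = N - (n-1);                % size of the binomial set B(n:N)
end

D = [I D];
end

% Function G and an optimal design
function [h, d] = hgDesign(q)
q = sort(q, 'descend');
N = length(q);
h = zeros(N, N);                     % h(n,n) = G(n:n) = 0
d = zeros(N, N);

for nn = 1:N-1
    n = N - nn;
    for l = 2:(N-n+1)
        PI = prod(q(n:(n+l-1)));
        v = zeros(l-1, 1);
        for x = 1:(l-1)
            PII = prod(q(n:(n+x-1)));          % first subset all negative
            PIII = prod(q((n+x):(n+l-1)));     % second subset all negative
            v(x, 1) = -PII*(1-PIII)/(1-PI) + (1-PII)/(1-PI)*h(n, n+x-1) + (1-PIII)/(1-PI)*h(n+x, n+l-1);
        end
        m = min(v);
        f = find(v == m, 1, 'last');  % largest minimiser in case of ties
        h(n, n+l-1) = 2 + m;
        d(n, n+l-1) = f;              % optimal size of the first subset from D(n:n+l-1)
    end
end
end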

References

  • Bellman (1957) Bellman, R. (1957). Dynamic Programming. Princeton University Press.
  • Black et al. (2015) Black, M. S., Bilder, C. R., and Tebbs, J. M. (2015). Optimal retesting configurations for hierarchical group testing. Appl. Statist. 64, 693–710.
  • Graff and Roeloffs (1972) Graff, L. E., and Roeloffs, R. (1972). Group testing in the presence of test error: an extension of the Dorfman procedure. Technometrics 14, 113–122.
  • Hwang et al. (1981) Hwang, F. K., Pfeifer, C. J., and Enis, P. (1981). An Optimal Hierarchical Procedure for a Modified Binomial Group-Testing Problem. J. Amer. Statist. Assoc. 76, 947–949.
  • Malinovsky et al. (2016) Malinovsky, Y., Albert, P. S., and Roy, A. (2016). Reader Reaction: A Note on the Evaluation of Group Testing Algorithms in the Presence of Misclassification. Biometrics 72, 299–304.
  • Nanda et al. (2000) Nanda, K., McCrory, D. C., Myers, E. R., Bastian, L. A., Hasselblad, V., Hickey, J. D., and Matchar, D. B. (2000). Accuracy of the Papanicolaou test in screening for and follow-up of cervical cytologic abnormalities: a systematic review. Ann. Intern. Med. 132, 810–819.
  • Sobel and Groll (1959) Sobel, M., and Groll, P. A. (1959). Group testing to eliminate efficiently all defectives in a binomial sample. Bell System Tech. J. 38, 1179–1252.
  • Sobel (1960) Sobel, M. (1960). Group testing to classify efficiently all defectives in a binomial sample. Information and Decision Processes (R. E. Machol, ed.; McGraw-Hill, New York), pp. 127–161.