Optimal Multistage Group Testing Algorithm for 3 Defectives

Group testing is a well-known search problem that consists in detecting $s$ defective members of a set of $t$ samples by carrying out tests on properly chosen subsets of samples. In classical group testing the goal is to find all defective elements using the minimal possible number of tests in the worst case. In this work, a multistage group testing problem is considered. Our goal is to construct a multistage search procedure that uses asymptotically the same number of tests as an adaptive one. We propose a new approach to designing multistage algorithms, which allows us to construct a 5-stage algorithm for finding 3 defectives with the optimal number $3\log_2 t\,(1+o(1))$ of tests.


I Introduction

The group testing problem was introduced by Dorfman in [2]. Suppose that we have a large set of samples, some of which are defective. Our task is to find all such elements by performing special tests. Each test is carried out on a properly chosen subset of samples. The result of the test is positive if there is at least one defective element in the tested subset; otherwise, the result is negative. In this work we consider the noiseless case, i.e., the outcomes are always correct. We aim to design an algorithm that finds all defective elements using as few tests as possible.
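The noiseless test model described above can be sketched in a few lines of Python. The function name `group_test` and the sample numbering are assumptions made for this illustration only; they do not come from the paper.

```python
from typing import Iterable, Set


def group_test(pool: Iterable[int], defectives: Set[int]) -> bool:
    """Noiseless test: positive iff the pool contains at least one defective."""
    return any(sample in defectives for sample in pool)


# Hypothetical instance: samples are numbered 0..99, three of them defective.
hidden_defectives = {3, 17, 42}
print(group_test(range(0, 10), hidden_defectives))   # pool contains sample 3
print(group_test(range(20, 30), hidden_defectives))  # pool contains no defective
```

An algorithm only sees the Boolean outcomes; the hidden set itself is never revealed directly.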

Two types of algorithms are usually considered in group testing. Adaptive algorithms can use the results of the previous tests to determine which subset of samples to test at the next step. In non-adaptive algorithms all tests are predetermined and can be carried out in parallel.

In this paper, we consider multistage algorithms, which can be seen as a compromise solution to the group testing problem. An algorithm is divided into stages. The tests of each stage may depend on the outcomes of the tests from the previous stages.

We consider the problem in which the total number of defective elements is equal to $s$. Let $N_p(t,s)$ be the minimal worst-case total number of tests needed to find all defective members of a set of $t$ samples using at most $p$ stages; the corresponding minimal number of tests for adaptive algorithms is defined analogously.

In many applications, it is much cheaper and faster to perform tests in parallel. Unfortunately, non-adaptive algorithms require many more tests than adaptive ones. It is known [3, 4, 5] that for fixed $s$ a non-adaptive algorithm needs at least $\Omega\left(\frac{s^2}{\log s}\log t\right)$ tests, whereas for an adaptive algorithm it is sufficient to use only $s\log_2 t\,(1+o(1))$, $t \to \infty$, tests. Rather surprisingly, for 2-stage algorithms it was proved that $O(s\log t)$ tests are already sufficient [6, 7, 8]. This fact emphasizes the importance of multistage algorithms.

In this paper, we are interested in the constant

$$C_p(s) = \varlimsup_{t\to\infty} \frac{N_p(t,s)}{\log_2 t}$$

for $p$-stage algorithms. For adaptive algorithms this constant is equal to $s$. In general, our aim is to design a $p$-stage algorithm that uses asymptotically the same number of tests as an adaptive one.

I-A Related work

We refer the reader to the monographs [9, 10] for a survey on group testing and its applications. In this paper, only the number of tests needed in the worst-case scenario is considered. For the problem of finding the average number of tests in non-adaptive algorithms we refer the reader to [11] and to [12, 13], which treat different regimes of the number of defectives. Also, in paper [14] the average number of tests for 2-stage algorithms was found in the model where each element is defective with a given probability.

Non-adaptive algorithms for the search of at most $s$ defectives can be constructed from $s$-disjunctive (or superimposed) codes [15, 16]. These codes were also investigated under the name of cover-free families [17]. The best known asymptotic ($s \to \infty$) lower [8] and upper [18] bounds on $C_1(s)$ are as follows:

$$\frac{s^2}{4\log_2 s}(1+o(1)) \le C_1(s) \le \frac{s^2}{2\ln 2}(1+o(1)).$$

Numerical values for small $s$ can be found, for example, in Table 2 in [8]. From these bounds, it follows that $C_1(s) > s$ for sufficiently large $s$, i.e., it is impossible to construct a non-adaptive algorithm with asymptotically the same number of tests as an adaptive one. It is also impossible for $s = 2$; more precisely, in [19] the best lower and upper bounds on $C_1(2)$ were established:

$$2.0008 \le C_1(2) \le 3.1898.$$

It is natural to expect that $C_1(s) > s$ for the remaining values of $s$, too, but this hasn’t been proved yet.

For the case of $p$-stage algorithms, $p \ge 2$, the only known lower bound is the information-theoretic one

$$C_p(s) \ge s. \qquad (1)$$
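The information-theoretic bound (1) comes from a counting argument: $N$ binary outcomes can distinguish at most $2^N$ candidate sets, and there are $\binom{t}{s}$ of them. The following sketch (the helper name `counting_bound` is ours, not the paper's) computes this bound and shows how its ratio to $\log_2 t$ approaches $s$.

```python
import math


def counting_bound(t: int, s: int) -> int:
    """Smallest N with 2**N >= C(t, s): fewer tests cannot separate all s-subsets."""
    return (math.comb(t, s) - 1).bit_length()


for t in (2**10, 2**20, 2**40):
    n = counting_bound(t, 3)
    print(t, n, n / math.log2(t))  # the ratio approaches s = 3 as t grows
```

The integer `bit_length` trick avoids floating-point rounding when taking the ceiling of the logarithm.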

Group testing algorithms with 2 stages can be obtained from disjunctive list-decoding codes [16] and selectors [6]. Both approaches provide the bound $C_2(s) = O(s)$, but the best results for disjunctive list-decoding codes give a better constant [18]

$$C_2(s) \le e \ln 2 \cdot s\,(1+o(1)), \quad s \to \infty. \qquad (2)$$

In the recent work [20] a new two-stage algorithm was constructed with the help of another approach. It outperforms disjunctive list-decoding codes for fixed $s$, but has the same asymptotic behaviour for $s \to \infty$. However, for the case of 2 defectives the 2-stage algorithm from [20] uses $2\log_2 t\,(1+o(1))$ tests, i.e., $C_2(2) = 2$ and the algorithm achieves the information-theoretic lower bound on the number of tests.

This work continues the research started in papers [21, 20]. We prove $C_5(3) = 3$ by providing a new 5-stage algorithm, which finds 3 defectives using the optimal number $3\log_2 t\,(1+o(1))$ of tests.

Our approach

To construct a new algorithm we use a hypergraph framework. Informally, we introduce an $s$-uniform hypergraph, each vertex of which represents one sample. Suppose that we have already carried out some tests. We draw a hyperedge for every $s$-element set of samples that could be equal to the unknown set of defectives, i.e., that agrees with the outcomes of all tests. Such a hypergraph represents all the information we have obtained from the tests so far. In most of the previous works [16, 6, 20], the first stages of algorithms were constructed in such a way that the hypergraph would have only a constant number of edges. This condition seems excessively strong: it requires too many tests at the first stage. In this paper we use a set of tests for the first stage such that the resulting hypergraph is sparse, i.e., the number of edges is linear in the number of non-isolated vertices. Employing the sparsity of the hypergraph, we explicitly construct subsequent stages that find the defectives with few additional tests. This approach gives us optimal algorithms achieving the information-theoretic lower bound on the number of tests for $s = 2$ and $s = 3$.

I-B Outline

In Section II, we introduce the notation and formally describe the hypergraph approach to the group testing problem in general. As a warm-up, in Section III we apply the new idea to the simplest case $s = 2$ to construct a 3-stage algorithm that uses $2\log_2 t\,(1+o(1))$ tests. The main result of the paper is presented in Section IV, in which the new 5-stage algorithm for finding 3 defectives with an optimal number of tests is described. Section V concludes the paper.

II Preliminaries

Throughout the paper we use $t$ and $s$ for the number of elements and defectives, respectively. By $[t]$ we denote the set $\{1, 2, \ldots, t\}$. The binary entropy function is defined as usual

$$h(x) = -x\log_2(x) - (1-x)\log_2(1-x).$$
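As a quick numerical companion to this definition, the sketch below evaluates $h$ and compares $N h(p)$ with $\log_2\binom{N}{pN}$. The comparison reflects the standard estimate $\binom{N}{pN} = 2^{N h(p)(1+o(1))}$, which is a general fact and not a statement taken from this paper; the function name is ours.

```python
import math


def binary_entropy(x: float) -> float:
    """h(x) = -x log2 x - (1 - x) log2 (1 - x), with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


N, p = 1000, 0.25
exact_rate = math.log2(math.comb(N, int(p * N))) / N
print(binary_entropy(p), exact_rate)  # the two values are close for large N
```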

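A binary $(N \times t)$-matrix with $N$ rows and $t$ columns

$$X = \|x_i(j)\|, \quad x_i(j) \in \{0, 1\}, \quad i \in [N], \; j \in [t],$$

is called a binary code of length $N$ and size $t$. The number of $1$'s in the codeword $x(j)$, i.e., $|x(j)| = \sum_{i=1}^{N} x_i(j)$, is called the weight of $x(j)$, and the ratio $|x(j)|/N$ is the relative weight.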

We represent non-adaptive tests with a binary matrix $X$ in the following way. An entry $x_i(j)$ equals $1$ if and only if the $j$-th element is included in the $i$-th test. Let $\bigvee$ denote the disjunctive sum (coordinate-wise OR) of binary columns. For any subset $S \subset [t]$ define the binary vector

$$r(X, S) = \bigvee_{j \in S} x(j),$$

which later will be called the outcome vector. The unknown set of defectives is a subset of $[t]$ of cardinality $s$.
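The outcome vector is simply the coordinate-wise OR of the columns indexed by the set. A minimal sketch follows, assuming the matrix is stored as a list of rows and columns are indexed from 0 (the paper indexes from 1); the function name is ours.

```python
from typing import List, Set


def outcome_vector(X: List[List[int]], S: Set[int]) -> List[int]:
    """r(X, S): coordinate-wise OR of the columns of X indexed by S."""
    return [int(any(row[j] for j in S)) for row in X]


# Toy matrix with N = 3 tests (rows) and t = 4 samples (columns).
X = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(outcome_vector(X, {0, 1}))  # tests 0 and 1 are positive
print(outcome_vector(X, {2}))     # only test 2 is positive
```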

II-A Hypergraph framework

Let us describe the hypergraph approach to the group testing problem. Suppose that we use a binary matrix $X$ at the first stage. As a result of the performed tests we get the outcome vector $y$, i.e., the disjunctive sum of the columns of $X$ indexed by the unknown set of defectives. Construct a hypergraph in the following way. The set of vertices coincides with the set of samples $[t]$. The set of edges consists of all sets $S \subset [t]$, $|S| = s$, such that $r(X, S) = y$. In other words, the set of edges of the hypergraph represents all possible defective sets of size $s$. We want to design a matrix $X$ for the first stage of an algorithm such that the hypergraph has some good properties, which will allow us to quickly find all defectives at the next few stages.

Previously known algorithms can be described using this terminology. Disjunctive list-decoding codes, selectors and methods from [20] give a binary matrix $X$ such that the hypergraph has only a constant number of edges for all possible outcome vectors $y$. Then we can test all non-isolated vertices individually at the second stage. In the algorithm from [21] the graph has a small chromatic number, which also allows finding the defectives quickly.
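For small instances the hypergraph of candidate defective sets can be built by brute force. The sketch below enumerates all $s$-subsets whose outcome vector matches the observed one; it is meant only to make the definition concrete (the enumeration is exponential in $s$), and the function name and 0-based column indexing are our assumptions.

```python
from itertools import combinations
from typing import List, Tuple


def consistent_sets(X: List[List[int]], s: int, y: List[int]) -> List[Tuple[int, ...]]:
    """All s-subsets S of columns with r(X, S) == y, i.e. the hyperedges."""
    t = len(X[0])
    edges = []
    for S in combinations(range(t), s):
        outcome = [int(any(row[j] for j in S)) for row in X]
        if outcome == y:
            edges.append(S)
    return edges


X = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(consistent_sets(X, 2, [1, 1, 0]))  # every pair that explains the outcomes
```

In this toy example three different pairs explain the same outcome vector, so further stages would be needed to tell them apart.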

III Algorithm for 2 defectives

For the simplest case $s = 2$ we propose a 3-stage algorithm with the optimal number of tests $2\log_2 t\,(1+o(1))$.

Theorem 1.
$$C_3(2) = 2.$$
Remark 1.

It is known [20] that 2 defectives can be found with the optimal number of tests using only two stages, so, this result is weaker than the result from [20]. We present it here only to demonstrate a new approach in the simplest setup.

Proof of Theorem 1.

Recall that a matching in a graph is a set of non-intersecting edges.

Definition 1.

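Call a matrix $X$ a 2-good matrix if it satisfies the following properties.

1. For any outcome vector $y$ the maximal vertex degree in the corresponding graph is less than $d$.

2. For any $y$, $|y| = w$, the maximal size of a matching in the corresponding graph is less than $M = 10\max(N, t^2 q)$, where $q$ is the probability that the disjunctive sum of two random columns equals a fixed vector of weight $w$.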

Lemma 1.

Let and

$$N = \frac{d+4}{d} \cdot \frac{\log_2 t}{h(p) - p}. \qquad (3)$$

Let $X$ be a random matrix, each column of which is taken independently and uniformly from the set of all columns with $\lfloor pN \rfloor$ ones. Then the probability that the matrix $X$ is 2-good tends to 1 as $t \to \infty$.
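The random ensemble used in the lemma, with independent columns of a fixed weight, is easy to sample. The sketch below is one possible way to do it; the function name, the row-major storage, and the optional seed are our choices for illustration.

```python
import random
from typing import List, Optional


def random_constant_weight_matrix(
    N: int, t: int, p: float, seed: Optional[int] = None
) -> List[List[int]]:
    """N x t binary matrix; each column has exactly floor(p * N) ones,
    chosen uniformly and independently of the other columns."""
    rng = random.Random(seed)
    weight = int(p * N)  # floor(pN) for non-negative p * N
    columns = []
    for _ in range(t):
        support = set(rng.sample(range(N), weight))
        columns.append([1 if i in support else 0 for i in range(N)])
    # Transpose so that rows correspond to tests and columns to samples.
    return [[columns[j][i] for j in range(t)] for i in range(N)]


X = random_constant_weight_matrix(N=20, t=50, p=0.25, seed=1)
print(sum(row[0] for row in X))  # weight of the first column
```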

Proof.

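Estimate the probability that for some $y$, $|y| = w$, there exists a vertex with degree at least $d$ in the corresponding graph. This probability can be upper bounded by the mathematical expectation of the number of sets of $d$ edges sharing a common vertex, which is less than

$$\sum_{w=0}^{N} \binom{N}{w} t^{d+1} \left( \frac{\binom{\lfloor pN \rfloor}{w - \lfloor pN \rfloor}}{\binom{N}{\lfloor pN \rfloor}} \right)^{d} \le t^{d+3} \max_{w} \left( \frac{\binom{\lfloor pN \rfloor}{w - \lfloor pN \rfloor}}{\binom{N}{\lfloor pN \rfloor}} \right)^{d}.$$

In the inequality we used the fact that $\sum_{w=0}^{N} \binom{N}{w} = 2^N \le t^2$ for $t$ big enough.

In a similar way we estimate the probability that for some $y$, $|y| = w$, there exists a matching of size $M$ in the corresponding graph. The mathematical expectation of the number of such matchings is upper bounded by

$$\sum_{w=0}^{N} \binom{N}{w} \frac{t^{2M}}{M!}\, q^{M}.$$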

Use a 2-good matrix $X$ as the testing matrix at the first stage. Consider the obtained graph and denote its set of edges by $E$. We want to find a partition of all edges into disjoint sets such that

1. There are no intersecting edges in the same set, i.e., any two distinct edges from the same set have no common vertex.

2. There are no edges $e$ and $e'$ in the same set such that there exists an edge $e''$ of the graph that intersects both $e$ and $e'$.

The degree of every vertex is less than $d$; therefore, the number of edges that cannot share a set with a given edge is bounded by a quantity depending only on $d$. Hence, we can construct such a partition greedily, provided that the number of sets exceeds this bound.
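The greedy construction of such a partition can be sketched as follows. This is a first-fit procedure written for clarity rather than speed; the function names are ours, and the number of classes it produces is whatever first-fit yields on the given input, not the bound from the paper.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def conflicts(e: Edge, f: Edge, edges: Sequence[Edge]) -> bool:
    """True if e and f intersect, or some edge of the graph intersects both."""
    if set(e) & set(f):
        return True
    return any(set(g) & set(e) and set(g) & set(f) for g in edges)


def greedy_partition(edges: Sequence[Edge]) -> List[List[Edge]]:
    """First-fit: put each edge into the first class where it conflicts with nobody."""
    classes: List[List[Edge]] = []
    for e in edges:
        for cls in classes:
            if not any(conflicts(e, f, edges) for f in cls):
                cls.append(e)
                break
        else:
            classes.append([e])
    return classes


edges = [(0, 1), (1, 2), (3, 4), (5, 6), (4, 5)]
print(greedy_partition(edges))
```

Here the edges (3, 4) and (5, 6) are disjoint but cannot share a class, because the edge (4, 5) intersects both of them.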

At the second stage, we carry out two tests for each set of the partition. The first tested set consists of all vertices incident to edges from the given set of the partition; the second tested set consists of all remaining vertices. We claim that the responses to these two tests are equal to 1 and 0, respectively, if and only if the set of defectives coincides with an edge from the given set of the partition. Indeed, if the set of defectives is such an edge, then the outcomes are equal to 1 and 0. Otherwise, the set of defectives can intersect at most one edge from the given set of the partition; therefore, at least one defective lies among the remaining vertices and the result of the second test is positive.

So, after the second stage, we will find a set of the partition that contains the defective edge. We can treat each edge from this set as a separate sample, only one of which is defective. Therefore, the defective edge can be found at the third stage using a number of tests that is at most logarithmic in the number of edges of this set. This step finishes the algorithm.
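The second and third stages can be simulated on a toy instance. In this sketch the third stage is realized by testing, for every bit position, the union of the edges whose index has that bit set; this is one way to run a binary search within a single stage and is our assumption, as are all function names.

```python
from typing import List, Optional, Set, Tuple

Edge = Tuple[int, int]


def second_stage(classes: List[List[Edge]], vertices: Set[int],
                 defectives: Set[int]) -> Optional[int]:
    """Index of the class whose two tests give outcomes (1, 0), if any."""
    for i, cls in enumerate(classes):
        covered = {v for e in cls for v in e}
        first = bool(covered & defectives)                # test on the covered vertices
        second = bool((vertices - covered) & defectives)  # test on all other vertices
        if first and not second:
            return i
    return None


def third_stage(cls: List[Edge], defectives: Set[int]) -> Edge:
    """Find the defective edge among disjoint candidate edges with bit tests."""
    index = 0
    for b in range((len(cls) - 1).bit_length()):
        pool = {v for k, e in enumerate(cls) if (k >> b) & 1 for v in e}
        if pool & defectives:  # positive iff the defective edge has bit b set
            index |= 1 << b
    return cls[index]


classes = [[(0, 1), (3, 4)], [(1, 2), (5, 6)], [(4, 5)]]
hidden = {5, 6}  # hypothetical defective pair
i = second_stage(classes, set(range(7)), hidden)
print(i, third_stage(classes[i], hidden))
```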

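The total number of tests is upper bounded by the sum of $N$, the number of second-stage tests, and $\lceil \log_2 |E| \rceil$. Let us estimate the cardinality of $E$.

Consider some maximal matching in the graph. Every edge is incident to at least one vertex from this matching, and the degree of each vertex is less than $d$; therefore,

$$|E| \le 20\, d \max(N, t^2 q).$$

Hence, up to lower-order terms, it is sufficient to show that

$$N + \log_2(t^2 q) \le 2\log_2 t\,(1+o(1)).$$

Indeed,

$$N + \log_2(t^2 q) = 2\log_2 t + N + \log_2 q \le 2\log_2 t + N + N \max_{\omega}\left(\omega\, h(p/\omega) + p\, h((\omega - p)/p) - 2h(p) + o(1)\right).$$

The expression under the maximum attains its maximum value $-1$, so the last two terms cancel up to $o(N)$. Therefore, the total number of tests is at most

$$2\log_2 t + o(\log_2 t).$$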

Theorem 1 is proved.
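The claim that the exponent $\omega h(p/\omega) + p\,h((\omega-p)/p) - 2h(p)$ has maximum $-1$ can be checked numerically. The value of $p$ is not visible in the text above; the sketch below assumes $p = 1 - 1/\sqrt{2}$, which is our own guess chosen so that the typical weight of the union of two columns is $N/2$. All function names are ours.

```python
import math


def h(x: float) -> float:
    """Binary entropy with the convention h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def exponent(p: float, omega: float) -> float:
    """omega*h(p/omega) + p*h((omega-p)/p) - 2*h(p), for p <= omega <= 2p."""
    return omega * h(p / omega) + p * h((omega - p) / p) - 2 * h(p)


def maximize_exponent(p: float, steps: int = 20000):
    """Grid search over omega in [p, 2p]; returns (max value, argmax)."""
    best_value, best_omega = -math.inf, p
    for k in range(steps + 1):
        omega = p * (1 + k / steps)
        value = exponent(p, omega)
        if value > best_value:
            best_value, best_omega = value, omega
    return best_value, best_omega


p_assumed = 1 - 1 / math.sqrt(2)  # assumption made for this illustration only
print(maximize_exponent(p_assumed))
```

Under this assumption the grid search returns a value close to $-1$ near $\omega = 1/2$, which is consistent with the cancellation used in the proof.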

IV Algorithm for 3 defectives

Theorem 2.
$$C_5(3) = 3.$$

We prove Theorem 2 by presenting a new algorithm for finding 3 defectives, which uses $3\log_2 t\,(1+o(1))$ tests. It is the first multistage algorithm for $s = 3$ with the optimal number of tests; the best previously known algorithm [20] required asymptotically more tests. Omitted proofs of lemmas can be found in the full version of this paper [1].

Proof of Theorem 2.

To construct a matrix for the first stage of our algorithm we introduce some useful terminology. Fix an integer $\ell$ and consider a uniform hypergraph. Call a set of $M$ edges an $\ell$-bad configuration of size $M$ if the intersection of every two edges from this set is the same set of $\ell$ vertices.
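The verbal definition of a bad configuration, where every two edges intersect in the same set of a given size, translates directly into a check. In the sketch below the size of the common intersection is passed as `ell`; the function name is ours, and configurations with fewer than two edges are rejected purely as a convention of this example.

```python
from itertools import combinations
from typing import Iterable, Sequence


def is_bad_configuration(edges: Sequence[Iterable[int]], ell: int) -> bool:
    """True if every two edges intersect in one common set of exactly ell vertices."""
    sets = [frozenset(e) for e in edges]
    if len(sets) < 2:
        return False  # convention of this sketch: need at least two edges
    core = sets[0] & sets[1]
    if len(core) != ell:
        return False
    return all(a & b == core for a, b in combinations(sets, 2))


print(is_bad_configuration([(0, 1, 2), (0, 3, 4), (0, 5, 6)], 1))  # common vertex 0
print(is_bad_configuration([(0, 1, 2), (3, 4, 5)], 0))             # disjoint edges
print(is_bad_configuration([(0, 1, 2), (0, 1, 3), (0, 2, 3)], 2))  # cores differ
```

With `ell = 0` the condition describes a set of pairwise disjoint edges, i.e., a matching.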

We construct a matrix $X$ for the first stage of our algorithm randomly. More precisely, we take a binary matrix of size $N \times t$, in which each column is taken independently and uniformly from the set of all columns with $\lfloor pN \rfloor$ ones. Let $\Pr_1(s, w)$ be the probability that the union of $s$ columns from such an ensemble equals a fixed vector with $w$ ones. Let $\Pr_2(s, w_1, w)$ be the probability that the union of $s$ columns with a fixed column of weight $w_1$ equals a fixed vector of weight $w$.

Definition 2.

Call a matrix a 3-good matrix if it satisfies the following list of properties.

1. Hypergraph doesn’t have -bad configurations of size for any vector .

2. Hypergraph doesn’t have -bad configurations of size for any vector , .

3. Hypergraph doesn’t have -bad configurations of size for any vector , .

4. Let and be two binary vectors of length , , , . For any such vectors and the number of columns in such that is less than , where is defined as follows

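$$B(N,t) = \begin{cases} t\Pr_2(1, w_1, w), & \text{if } t\Pr_2(1, w_1, w) > N; \\ N, & \text{if } t^{-1/\sqrt{L_1}} \le t\Pr_2(1, w_1, w) \le N; \\ L_1/10, & \text{if } t\Pr_2(1, w_1, w) < t^{-1/\sqrt{L_1}}. \end{cases}$$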
5. Let be a binary vector of length , , is some integer, . Then the number of non-intersecting pairs of columns , from matrix such that , , is less than .

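Define $A_1$ and $A_2$ as follows

$$A_1(s, \omega) = \lim_{N\to\infty} \frac{-\log_2 \Pr_1(s, \lfloor \omega N \rfloor)}{N}; \qquad (4)$$
$$A_2(s, \omega_1, \omega) = \lim_{N\to\infty} \frac{-\log_2 \Pr_2(s, \lfloor \omega_1 N \rfloor, \lfloor \omega N \rfloor)}{N}. \qquad (5)$$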
Lemma 2.

Let and

$$N = \frac{2L_1 + 10}{L_1} \log_2 t \max_{p \le \omega \le 3p} \frac{1}{A_2(2, p, \omega)}. \qquad (6)$$

Let $X$ be a random matrix, each column of which is taken independently and uniformly from the set of all columns with $\lfloor pN \rfloor$ ones. Then the probability that the matrix $X$ is 3-good tends to 1 as $t \to \infty$.

We use a 3-good matrix $X$ as the testing matrix at the first stage of our algorithm. Consider the obtained hypergraph. Introduce a new graph on the same set of vertices, i.e., the set of samples. Two vertices are connected with an edge in this graph if the number of hyperedges containing both of them is at least a prescribed threshold.

Lemma 3.

The degree of every vertex in the graph is less than .

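Divide all edges of the hypergraph into two groups $E_1$ and $E_2$. We put an edge into $E_1$ if it contains an edge of the auxiliary graph introduced above as a subset; otherwise, we put the edge into $E_2$. Note that, by construction, the hypergraph formed by $E_2$ can’t contain a large configuration of edges sharing the same pair of vertices.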
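The construction of the auxiliary graph of frequently co-occurring pairs and the resulting split of the hyperedges can be sketched as follows. The threshold is passed as a parameter because its value is not visible in the text; the names `split_edges`, `with_pair` and `without_pair` are ours.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Triple = Tuple[int, int, int]
Pair = Tuple[int, int]


def split_edges(hyperedges: List[Triple], threshold: int):
    """Build the graph of pairs contained in at least `threshold` hyperedges,
    then split hyperedges by whether they contain such a pair."""
    pair_count: Counter = Counter()
    for e in hyperedges:
        for pair in combinations(sorted(e), 2):
            pair_count[pair] += 1
    graph_edges: Set[Pair] = {pr for pr, c in pair_count.items() if c >= threshold}

    with_pair, without_pair = [], []
    for e in hyperedges:
        if any(pair in graph_edges for pair in combinations(sorted(e), 2)):
            with_pair.append(e)
        else:
            without_pair.append(e)
    return graph_edges, with_pair, without_pair


hyperedges = [(0, 1, 2), (0, 1, 3), (0, 1, 4), (5, 6, 7)]
print(split_edges(hyperedges, threshold=3))
```

In the example the pair (0, 1) lies in three hyperedges, so those three go to the first group and the isolated triple goes to the second.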

Lemma 4.

The following two claims hold.

1. The degree of each vertex in , i.e. the number of edges containing one vertex, is at most .

2. The number of edges in is less than the size of the biggest -bad configuration multiplied by .

For every edge we choose one vertex such that . Call that vertex an additional vertex of the edge . If there are multiple ways to choose such vertex, we do it arbitrarily.

Introduce a new directed graph . For every edge , with additional vertex we add to 4 arcs , , , . If an arc has already been in , we don’t add it second time, i.e. there is no multi-edges in .

Lemma 5.

The out-degree in the graph is less than .

At the second stage we want to check whether the set of defectives lies in or .

Lemma 6.

There exists a partition of $E_2$ into disjoint sets such that

1. There are no intersecting edges in the same set, i.e., any two distinct edges from the same set have no common vertex.

2. There are no edges $e$ and $e'$ in the same set such that there exists an edge $e''$ that intersects both $e$ and $e'$.

Remark 2.

We emphasize that in the second condition the edge $e''$ is not necessarily from $E_2$; it can be from $E_1$ as well.

The second stage consists of two tests for each set of the partition. In the first tested set we include all vertices that belong to some edge of the given set of the partition. In the second test all other vertices are included.

If the unknown set of defectives coincides with some edge of the given set of the partition, then the outcomes of the two tests are 1 and 0, respectively. Otherwise, the outcomes are different. The first claim is obvious. To prove the second claim, note that the set of defectives can’t intersect two edges from the same set of the partition by Lemma 6; therefore, it can’t be a subset of the first tested set, which means that the outcomes can’t be equal to 1 and 0, respectively.

So, we have 2 cases.

1. There is a set of the partition for which the outcomes of the two tests are equal to 1 and 0, respectively.

In that case the set of defectives is an edge of this set of the partition. Then we can think of each edge from this set as a separate sample. This set of samples contains exactly one defective element, which can be found by using a binary search algorithm.

To sum up, in this case we have used 3 stages and tests.

2. There is no set of the partition for which the outcomes of the two tests are equal to 1 and 0, respectively.

It means that the set of defectives coincides with some edge in $E_1$. Recall the directed graph introduced above and set aside all of its isolated vertices. By Lemma 5 the out-degree of every vertex in this graph is bounded; therefore, it is possible to partition the set of all non-isolated vertices into disjoint sets such that there is no arc inside any one set. There is an arc in at least one direction between any two vertices of the defective edge; hence, its 3 vertices will be placed in 3 different sets.

At the third stage, we test each set separately. We will obtain exactly 3 positive outcomes at this stage. Without loss of generality assume that the tested set has given a positive result. This set contains exactly 1 defective element. At the fourth stage find this vertex using tests. Denote this vertex as .

Vertex is an additional vertex for edges . By Lemma 5 . Also, vertex belongs to edges , , is an additional vertex for the edge . Define sets of vertices , , . At the fifth stage we perform tests. Each element of and is tested separately; binary search is performed on to find one defective element. If the vertex is additional in the edge , then two others defectives will be found in . Otherwise, at least one defective elements would be found in . If there is exactly one defective in , the last one will be found in . This stage completes the algorithm.

To sum up, in this case we have used 5 stages and at most . Recall that , ; by Lemma 5, by Lemma 3, ; therefore, the total number of tests is upper bounded by .
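The partition of the non-isolated vertices into sets without internal arcs, used in the second case before the third stage, is a proper colouring of a directed graph with bounded out-degree. One standard way to obtain it is greedy colouring along a degeneracy ordering: if every out-degree is at most $k$, every subgraph has a vertex of total degree at most $2k$, so at most $2k + 1$ sets are needed. This is a general graph-theoretic sketch under that assumption, not necessarily the construction or the bound used in the paper; the function name is ours.

```python
from typing import Dict, Iterable, List, Tuple


def partition_without_arcs(arcs: Iterable[Tuple[int, int]],
                           vertices: Iterable[int]) -> Dict[int, int]:
    """Assign each vertex a class index so that no arc joins two vertices
    of the same class (greedy colouring along a degeneracy ordering)."""
    nbrs: Dict[int, set] = {v: set() for v in vertices}
    for u, v in arcs:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)

    # Repeatedly remove a vertex of minimum remaining degree.
    remaining = {v: set(n) for v, n in nbrs.items()}
    order: List[int] = []
    while remaining:
        v = min(remaining, key=lambda u: len(remaining[u]))
        order.append(v)
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]

    # Colour in reverse removal order with the smallest free class index.
    color: Dict[int, int] = {}
    for v in reversed(order):
        used = {color[u] for u in nbrs[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


arcs = [(0, 1), (1, 2), (2, 0), (2, 3)]  # out-degree at most 2
print(partition_without_arcs(arcs, range(4)))
```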

The following Lemma finishes the proof of Theorem 2.

Lemma 7.
$$N + \log_2 |E_2| = 3\log_2 t\,(1+o(1)); \qquad (7)$$
$$N + 2\log_2 |E_1| < 3\log_2 t\,(1+o(1)). \qquad (8)$$

V Conclusion

A new approach to constructing multistage group testing procedures was considered. It allows us to design 3-stage and 5-stage algorithms with the optimal values $C_3(2) = 2$ and $C_5(3) = 3$ for the cases $s = 2$ and $s = 3$, respectively. The algorithm with the optimal number of tests for $s = 3$ was obtained for the first time.

The natural open problem is to generalize this approach to the case $s > 3$ in order to construct algorithms with $C_p(s) = s$. Another possible direction is to prove an upper bound on the rate of multistage algorithms that is stronger than the information-theoretic bound (1).

VI Acknowledgement

I. Vorobyev was supported by RFBR through grant no. 18-31-00361 MOL_A.

References

• [1] I. Vorobyev, “Optimal multistage group testing algorithm for 3 defectives,” arXiv preprint, 2020.
• [2] R. Dorfman, “The detection of defective members of large populations,” The Annals of Mathematical Statistics, vol. 14, no. 4, pp. 436–440, 1943.
• [3] A. G. D’yachkov and V. V. Rykov, “Bounds on the length of disjunctive codes,” Problemy Peredachi Informatsii, vol. 18, no. 3, pp. 7–13, 1982.
• [4] M. Ruszinkó, “On the upper bound of the size of the r-cover-free families,” Journal of Combinatorial Theory, Series A, vol. 66, no. 2, pp. 302–310, 1994.
• [5] Z. Füredi, “On r-cover-free families,” Journal of Combinatorial Theory, Series A, vol. 73, no. 1, pp. 172–173, 1996.
• [6] A. De Bonis, L. Gasieniec, and U. Vaccaro, “Optimal two-stage algorithms for group testing problems,” SIAM Journal on Computing, vol. 34, no. 5, pp. 1253–1270, 2005.
• [7] A. Rashad, “Random coding bounds on the rate for list-decoding superimposed codes,” Problems of Control and Information Theory, vol. 19, no. 2, pp. 141–149, 1990.
• [8] A. G. D’yachkov, “Lectures on designing screening experiments,” arXiv preprint arXiv:1401.7505, 2014.
• [9] D. Du and F. K. Hwang, Combinatorial group testing and its applications. World Scientific, 2000, vol. 12.
• [10] F. Cicalese, Fault-Tolerant Search Algorithms, ser. Monographs in Theoretical Computer Science. An EATCS Series. Springer Berlin Heidelberg, 2013.
• [11] V. L. Freidlina, “On a design problem for screening experiments,” Theory of Probability & Its Applications, vol. 20, no. 1, pp. 102–115, 1975.
• [12] M. Mézard and C. Toninelli, “Group testing with random pools: Optimal two-stage algorithms,” IEEE Transactions on Information Theory, vol. 57, no. 3, pp. 1736–1745, 2011.
• [13] O. Johnson, M. Aldridge, and J. Scarlett, “Performance of group testing algorithms with near-constant tests per item,” IEEE Transactions on Information Theory, vol. 65, no. 2, pp. 707–723, 2019.
• [14] T. Berger and V. I. Levenshtein, “Asymptotic efficiency of two-stage disjunctive testing,” IEEE Transactions on Information Theory, vol. 48, no. 7, pp. 1741–1749, 2002.
• [15] W. Kautz and R. Singleton, “Nonrandom binary superimposed codes,” IEEE Transactions on Information Theory, vol. 10, no. 4, pp. 363–377, 1964.
• [16] A. G. Dyachkov and V. V. Rykov, “A survey of superimposed code theory,” Problems of Control and Information Theory, vol. 12, no. 4, pp. 1–13, 1983.
• [17] P. Erdős, P. Frankl, and Z. Füredi, “Families of finite sets in which no set is covered by the union of r others,” Israel J. Math, vol. 51, no. 1-2, pp. 79–89, 1985.
• [18] A. G. D’yachkov, I. V. Vorob’ev, N. Polyansky, and V. Y. Shchukin, “Bounds on the rate of disjunctive codes,” Problems of Information Transmission, vol. 50, no. 1, pp. 27–56, 2014.
• [19] D. Coppersmith and J. B. Shearer, “New bounds for union-free families of sets,” The Electronic Journal of Combinatorics, vol. 5, no. 1, p. 39, 1998.
• [20] I. Vorobyev, “A new algorithm for two-stage group testing,” in 2019 IEEE International Symposium on Information Theory (ISIT), July 2019, pp. 101–105.
• [21] A. G. D’yachkov, I. V. Vorobyev, N. Polyanskii, and V. Y. Shchukin, “On a hypergraph approach to multistage group testing problems,” in Information Theory (ISIT), 2016 IEEE International Symposium on. IEEE, 2016, pp. 1183–1191.

Appendix A Proofs of Lemmas

Proof of Lemma 2.

Denote by $B_i$ the event that property $i$ from Definition 2 is violated. Let us estimate the probabilities of these events from above.

1. Fix a vector $y$. Let $Y_y$ be a random variable equal to the number of bad configurations of the kind and size forbidden by property 1 in the hypergraph corresponding to $y$. We upper bound the probability that such a configuration exists by the mathematical expectation of $Y_y$. To estimate $\mathbf{E}\,Y_y$ we represent it as a sum of indicators corresponding to all possible configurations of this kind.

 EYy

For , , therefore, ;

$$\Pr(B_1) \le 2^N \max_{y} \mathbf{E}\,Y_y$$
2. Let be a random variable equals to the number of -bad configurations of size ; then .

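$$\mathbf{E}\,Y_y < \binom{t}{3M} \frac{(3M)!}{M!\,(3!)^M} \left(\Pr_1(3, w)\right)^M \le \frac{t^{3M}}{M!\,6^M} \left(\Pr_1(3, w)\right)^M < \left(\frac{t^3 \Pr_1(3, w)\, e}{6M}\right)^M \le \left(\frac{e}{60}\right)^M < \left(\frac{e}{60}\right)^N.$$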
3. The proof of is analogous to the proof of .

4. Fix two binary vectors and of length such that , , . Let be a random variable equals to the number of columns in such that .

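$$\Pr(B_4) \le 4^N \max_{y_1, y} \Pr\left(Y_{y, y_1} > 10B(N,t)\right) \le 4^N \max_{y_1, y} \frac{t^{10B(N,t)}}{(10B(N,t))!} \left(\Pr_2(1, w_1, w)\right)^{10B(N,t)} \le 4^N \max_{y_1, y} \left(\frac{t\Pr_2(1, w_1, w)\, e}{10B(N,t)}\right)^{10B(N,t)}.$$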
1. If , then and , hence

$$4^N \max_{y_1, y} \left(\frac{t\Pr_2(1, w_1, w)\, e}{10B(N,t)}\right)^{10B(N,t)} \le 4^N \left(\frac{e}{10}\right)^{10N} \to 0.$$
2. If , then and , hence

$$4^N \max_{y_1, y} \left(\frac{t\Pr_2(1, w_1, w)\, e}{10B(N,t)}\right)^{10B(N,t)} \le 4^N \max_{y_1, y} \left(t\Pr_2(1, w_1, w)\right)^{10B(N,t)} \le 4^N t^{-L_1/\sqrt{L_1}}$$
5. Let be a random variable equals to the number of sets of cardinality , consisting of pairs of non-intersecting columns , from matrix such that , ; then .

$$N\, 2^N \max_{y, w_1} \mathbf{E}\,Y_{y, w_1}$$

Proof of Lemma 3.

Seeking for a contradiction assume that vertex has degree at least in graph . It means that there exist vertices , and edges , , . We show that in this case there is a -bad configuration of size ; more precise, it is possible to find a set of edges such that any two of these edges have only one common vertex . Indeed, we can construct such set by choosing edges one by one from to . At each step we have candidates, with at most of which are prohibited; the existence of such -bad configuration contradicts the first property from definition 2. ∎

Proof of Lemma 4.

We prove the first claim by contradiction. Let be a vertex with a degree at least . Consider edges . Construct a maximal -bad configuration , consisting of these edges. Its size is less than by the property 1 from Definition 2. Consider pairs of vertices . Every edge contains at least one such pair as a subset by construction of configuration. From the other hand, no pair can be included in edges. Therefore, , which is a contradiction.

The second claim is immediate consequence of the first one. Indeed, consider the biggest -bad configuration in hypergraph . Say it has the cardinality . Every edge of has at least one common vertex with such configuration, therefore, the total number of edges in is at most