Semiquantitative Group Testing in at Most Two Rounds

Semiquantitative group testing (SQGT) is a pooling method in which the test outcomes represent bounded intervals for the number of defectives. Alternatively, it may be viewed as an adder channel with quantized outputs. SQGT represents a natural choice for Covid-19 group testing as it allows for a straightforward interpretation of the cycle threshold values produced by polymerase chain reactions (PCR). Prior work on SQGT did not address the need for adaptive testing with a small number of rounds as required in practice. We propose conceptually simple methods for 2-round and nonadaptive SQGT that significantly improve upon existing schemes by using ideas on nonbinary measurement matrices based on expander graphs and list-disjunct matrices.


I Introduction

Group testing (GT), first introduced by Dorfman [1] and further studied in many other works, including [2, 3, 4], is a scheme designed to efficiently identify a small set of subjects with a particular property (commonly referred to as defectives) within a large population. Group testing entails testing a collection of carefully selected subpopulations and reporting for each subgroup a binary answer: a positive answer indicates the existence of at least one defective in the subgroup, while a negative answer implies the absence of defectives. Given that screening protocols are extensively used in engineering and science, group testing has found widespread applications in communication theory, signal processing, computer science, and computational biology [5, 3].

Many different variants of group testing have been proposed in the literature [3, 1, 6]. These include threshold group testing proposed by Damaschke [7] and quantitative (additive) group testing studied by Lindström and Du and Hwang [8, 9, 6]. In the latter case, the test results report the exact number of defectives in the test subpool. In the former case, if the number of defectives in a test is smaller than a lower threshold, the test outcome is negative; if the number of defectives is larger than an upper threshold, the test outcome is positive; otherwise, the result is arbitrary (positive or negative). To bridge the two paradigms described above, Emad and Milenkovic [10, 11, 12] introduced the notion of semiquantitative group testing (SQGT). SQGT represents a unifying framework for a number of testing protocols, including conventional, quantitative and gapless threshold group testing and the schemes by D’yachkov and Rykov [13, 14]. In SQGT, the result of a test is a nonbinary value that depends on the number of defectives through a fixed set of thresholds. The SQGT model may also be viewed as a quantitative group testing method followed by a quantizer. The original motivation for introducing SQGT models is genotyping; more recently, the model has been used by Gabrys et al. [15] to describe the test outcomes of a Covid-19 testing process known as real-time reverse-transcriptase polymerase chain reaction (PCR).

In nonadaptive SQGT, each subject is assigned a unique binary or nonbinary indicator word of length equal to the total number of tests. These indicators are arranged column-wise in a test matrix. Each coordinate in the codeword assigned to a subject corresponds to a test, and its value reflects the “concentration” of the sample corresponding to the given subject in the test. Note that the concentrations are nonnegative integers that usually correspond to the number of units of the genetic material of an individual subject. Two families of nonadaptive SQGT codes, SQ-disjunct and SQ-separable, were analyzed in [11, 12]. In the same work, a number of constructions for nonadaptive uniform and nonuniform (quantized) SQGT codes were presented but no results were reported for adaptive tests. The more recent work [15] introduced the first combinatorial and probabilistic adaptive SQGT (ASQGT) schemes, the former extending the work of Hwang [16] on generalized binary group testing. The proposed combinatorial ASQGT schemes involve what is referred to as parallel and deep search methods that lead to a relatively large number of testing rounds. This is an undesirable feature for practical implementations of SQGT in Covid-19 testing.

Here, we describe the first known combinatorial two-round adaptive SQGT (ASQGT) scheme for a special selection of (quantization) thresholds studied in [15]. The scheme uses tests for subjects, defectives and SQGT thresholds. It builds upon the ideas of list-disjunct group testing [17] and, like the approach in [15], uses nonbinary test matrices obtained by careful linear combining of the rows of a binary disjunct matrix. The described two-round ASQGT protocol differs from the information-theoretic bound only by about a factor . We then proceed to improve existing nonadaptive protocols by extending the construction of Porat and Rothschild [18].

The paper is organized as follows. Section III describes our main result, the first known two-round ASQGT. Section IV presents new nonadaptive SQGT schemes that significantly improve upon previous constructions [11, 12] and imply new upper bounds for nonadaptive SQGT.

II Terminology, GT Background, and Bounds

We start with some relevant terminology. All parameters are denoted by lowercase letters, while vectors and matrices are denoted by bold-face lowercase and capitalized Latin letters, respectively. Entries of the vectors are indexed by subscripts while matrix entries are indexed by pairs of integers within parentheses. Unless stated otherwise, all logarithms are to base .

Assume that there are test subjects labeled by elements in among which are defective (i.e., infected). In conventional group testing, we summarize the set of tests through a binary matrix in which every column of the matrix uniquely characterizes an individual and each row represents a test. The entry of , equals  if and only if the individual labeled is included in the test. Let denote the binary vector that results from tests using , assuming that the set of infected individuals equals , with . Whenever clear from the context, we omit the subscript . In conventional group testing if and only if the test includes at least one element from . Let be defined analogously for another set . We say that a set is consistent with if entrywise.
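The conventional model above can be illustrated with a minimal Python sketch. The function names and the toy matrix are hypothetical, and the consistency check assumes that "consistent" means the candidate set's outcome vector is entrywise no larger than the observed one, which is one natural reading of the definition.

```python
from typing import List, Set

def gt_outcomes(matrix: List[List[int]], defectives: Set[int]) -> List[int]:
    """Binary outcome per test: 1 iff the pool contains at least one defective."""
    return [int(any(row[j] for j in defectives)) for row in matrix]

def is_consistent(matrix: List[List[int]], candidate: Set[int],
                  observed: List[int]) -> bool:
    """Assumed reading: the candidate set never triggers a test observed negative."""
    predicted = gt_outcomes(matrix, candidate)
    return all(p <= o for p, o in zip(predicted, observed))

# Toy example: 3 tests (rows) over 4 individuals (columns), indexed from 0.
M = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
y = gt_outcomes(M, {1})          # individual 1 is the only defective
print(y)                         # [1, 1, 0]
print(is_consistent(M, {0}, y))  # True: individual 0 only appears in a positive test
print(is_consistent(M, {2}, y))  # False: individual 2 appears in the negative test
```

Note that individual 0 is consistent with the outcomes even though it is not defective, which is why a suitably structured matrix is needed to pin down the defective set.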

The matrix is termed -disjunct if no vector for contains in its support a column of not indexed by . The disjunctness property ensures that the test results obtained from uniquely identify the set of defectives. A matrix is termed -list-disjunct if the tests output a superset of the defectives of size at most ; for such a matrix, the size of any list consistent with is at most . Clearly, a matrix which is -disjunct is equivalent to one which is -list-disjunct. The notion of list-disjunct matrices was explicitly formulated (in an equivalent form) in [19] and is also essentially equivalent to what was defined earlier in [20].
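For small matrices, the disjunctness property can be checked by brute force. The sketch below is only an illustration under the reading that no union of at most d columns may contain the support of another column; the function names are hypothetical and the running time is exponential in d.

```python
from itertools import combinations
from typing import List, Set

def column_support(matrix: List[List[int]], j: int) -> Set[int]:
    """Set of tests (row indices) in which individual j participates."""
    return {i for i, row in enumerate(matrix) if row[j]}

def is_disjunct(matrix: List[List[int]], d: int) -> bool:
    """Return True iff no union of at most d columns covers another column."""
    n = len(matrix[0])
    supports = [column_support(matrix, j) for j in range(n)]
    for size in range(1, d + 1):
        for chosen in combinations(range(n), size):
            union = set().union(*(supports[j] for j in chosen))
            for j in range(n):
                if j not in chosen and supports[j] <= union:
                    return False
    return True

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # individual testing
chain = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]
print(is_disjunct(identity, 2))  # True
print(is_disjunct(chain, 1))     # False: column 0 is contained in column 1
```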

We review the following known results pertaining to the existence of -list-disjunct test matrices with and . First, note that it is straightforward to see that for a maximal one has . Therefore, as noted in [21], the existence of a -list-disjunct test matrix with naturally implies a two-round testing scheme: The first round of tests is governed by the rows of while the second round involves individually testing subjects in . Randomized and explicit constructions of list-disjunct matrices exist, particularly via expander graphs [21, 14, 17, 20, 19]. The best known construction which achieves an optimal number of rows and nearly linear time recovery (in the number of rows) is given by [22].
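The two-round idea can be sketched in a few lines of Python. The sketch assumes a noiseless setting, accepts any binary pooling matrix (it does not verify list-disjunctness), and simulates the second round of individual tests with a set-membership lookup; all names are hypothetical.

```python
from typing import List, Set, Tuple

def two_round_identify(matrix: List[List[int]],
                       defectives: Set[int]) -> Tuple[List[int], Set[int], int]:
    """Simulate round 1 (pooled tests) and round 2 (individual tests)."""
    # Round 1: run all pooled tests given by the rows of the matrix.
    outcomes = [int(any(row[j] for j in defectives)) for row in matrix]
    n = len(matrix[0])
    # Keep every individual that appears in no negative test.
    candidates = [
        j for j in range(n)
        if all(outcomes[i] for i, row in enumerate(matrix) if row[j])
    ]
    # Round 2: test each remaining candidate individually (simulated lookup).
    confirmed = {j for j in candidates if j in defectives}
    total_tests = len(matrix) + len(candidates)
    return candidates, confirmed, total_tests

M = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
candidates, confirmed, total = two_round_identify(M, {1})
print(candidates, confirmed, total)  # [0, 1] {1} 5
```

The total cost is the number of rows plus the size of the candidate list, which is why a small list size bound translates directly into few second-round tests.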

The best lower bound on the number of tests necessary for an ASQGT scheme was established in [15] via a simple counting argument and the bound equals . In the next section, we establish the existence of a two-round scheme that differs from this lower bound by a factor of only. For the single-round setting, using a variation of the argument employed by Füredi [23] in the context of cover-free codes, one can show that the corresponding number of tests scales as whereas the construction from Section IV implies the existence of a scheme that requires at most tests. This lower bound applies not only to general nonadaptive SQGT but also to the particular saturation model, which is the focus of this work. The derivation of the bound is relegated to the full version of the paper.
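A generic counting argument of this kind can be evaluated numerically. The sketch below is not the exact expression from [15]; it assumes exactly `d` defectives among `n` subjects and at most `tau + 1` distinguishable outcomes per test, so that `t` tests can separate at most `(tau + 1) ** t` defective sets. The function name and parameters are hypothetical.

```python
from math import comb

def counting_lower_bound(n: int, d: int, tau: int) -> int:
    """Smallest t with (tau + 1) ** t >= C(n, d), using exact integer arithmetic."""
    num_sets = comb(n, d)       # number of possible defective sets
    outcomes_per_test = tau + 1 # assumed alphabet size of a single test
    t, reachable = 0, 1
    while reachable < num_sets:
        reachable *= outcomes_per_test
        t += 1
    return t

print(counting_lower_bound(100, 2, 1))  # 13 (binary outcomes)
print(counting_lower_bound(100, 2, 3))  # 7  (four outcomes per test)
```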

III Two-Round ASQGT

Let be a bipartite graph with a vertex partition (people) and (tests) such that every vertex in has degree (i.e., neighbors) and , . We say that is an -expander if every of size at most has at least neighbors in . The values of the parameters are dictated by the expansion factors . It is also worth pointing out that explicit constructions of expander graphs with parameters of interest in our derivations may not be known, but their existence is guaranteed via probabilistic arguments. We say that a set of vertices is covered by a set if for every vertex , there exists a vertex which is connected to . We say that a vertex is uniquely covered (or a unique neighbor) of if it is the neighbor of exactly one vertex . Henceforth, for a set of vertices , let denote the neighbors of and let denote the set of unique neighbors of . Furthermore, we say that a vertex is covered times by if it is connected to exactly different vertices in . The next results may be obtained through a straightforward modification of existing results.
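The graph notions used here (neighbors, unique neighbors, and expansion) can be made concrete with a small Python sketch. The adjacency format, the function names, and the parameterization of expansion as "at least `factor` times the set size" are assumptions made for illustration; the brute-force expansion check is only feasible for tiny graphs.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set

Adjacency = Dict[int, Set[int]]  # left vertex (person) -> right vertices (tests)

def neighbors(adj: Adjacency, subset: Iterable[int]) -> Set[int]:
    """All right vertices connected to at least one vertex of the subset."""
    return set().union(*(adj[v] for v in subset))

def unique_neighbors(adj: Adjacency, subset: Iterable[int]) -> Set[int]:
    """Right vertices connected to exactly one vertex of the subset."""
    counts = Counter(u for v in subset for u in adj[v])
    return {u for u, c in counts.items() if c == 1}

def is_expander(adj: Adjacency, max_size: int, factor: float) -> bool:
    """Check that every left subset of size <= max_size has >= factor * size neighbors."""
    for size in range(1, max_size + 1):
        for subset in combinations(adj, size):
            if len(neighbors(adj, subset)) < factor * size:
                return False
    return True

adj = {0: {0, 1}, 1: {1, 2}, 2: {2, 3}}  # every left vertex has degree 2
print(neighbors(adj, [0, 1]))         # {0, 1, 2}
print(unique_neighbors(adj, [0, 1]))  # {0, 2}
print(is_expander(adj, 2, 1.5))       # True
print(is_expander(adj, 2, 2.0))       # False
```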

Lemma 1. [19] Suppose that is an -expander where every vertex in has neighbors and . Let be a subset of size at most . Then for any such that , and , we have:

Thus, given the previous lemma, it follows that there exists at least one test in that is not covered by an element of . Using this observation, we construct the binary matrix as follows. Suppose that is an expander as previously described. We assume that the vertices in and are lexicographically ordered so that we can refer to the vertex in as and the vertex in as . Then, for and ,

(1)

Thus, as a result of the construction for , we see that we can uniquely associate each column of with a vertex in and each row of with a vertex in .

The next two results follow immediately from the previous discussion.

Corollary 2. Suppose we are given two sets such that . If is consistent with under and , then

Lemma 3. Suppose is as defined in (1) and the set of infected individuals satisfies . Then, testing with recovers a set such that and .

The following lemma is known [3] and follows from a standard randomized construction:

Lemma 4. Suppose that and let , where is the base of the natural logarithm. Then, for there exists an -expander graph with bipartition such that and , and .

The previous result implies the following theorem.

Theorem 5. There exists a conventional two-stage GT scheme that requires at most

tests and can identify a set of infected individuals of size at most from a population of size .

We remark that the best known explicit constructions of bipartite expanders are still inferior to the optimal bounds achieved by random expanders in Lemma 4. For example, using [24] one can get tests for any fixed , and [25] would achieve tests, similar to the derivation in [20].

We now discuss how to use the matrix to design a specialized two-round SQGT testing scheme for .

We focus on a special case of uniform SQGT with saturation [15] for which we are given thresholds. The test outcome vector for a set of defectives is such that if the test includes no defectives, if the test includes defective, , if the test includes defectives and if the number of defectives in the test exceeds . To simplify the notation, we assume that for some positive integer .
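The saturation model can be sketched as a quantizer applied to an additive count. The snippet assumes that the outcome of a test equals the number of defective sample units in the pool, capped at the largest threshold `tau`; for nonbinary matrices, entries are interpreted as numbers of sample units. The function name and the toy matrix are hypothetical.

```python
from typing import List, Set

def sqgt_saturation_outcomes(matrix: List[List[int]],
                             defectives: Set[int], tau: int) -> List[int]:
    """Outcome per test: additive count of defective units, saturated at tau."""
    return [min(sum(row[j] for j in defectives), tau) for row in matrix]

S = [
    [1, 2, 0, 1],  # individual 1 contributes two sample units to test 0
    [0, 1, 1, 0],
]
print(sqgt_saturation_outcomes(S, {0, 1}, tau=2))  # [2, 1]: test 0 saturates (3 > 2)
print(sqgt_saturation_outcomes(S, {2}, tau=2))     # [0, 1]
```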

We show the existence of a two-round testing scheme that differs from the information-theoretic lower bound from [15] by only a factor of roughly . As discussed earlier, we only focus on the first round, since the second one is straightforward. The key idea used to construct the test matrix for the first round is to start with a list-disjunct expander-based binary test matrix and then merge the rows via specialized linear combinations to reduce the number of tests and increase the size of the alphabet used for the codebook.

We start by introducing two matrices and that will be subsequently concatenated into the “global” SQGT matrix . Let be as defined in (1) and for simplicity, assume that . Then, for and , we set

(2)
(3)

Note that both and are obtained as linear combinations of rows of B, but the scaling factors are different. The SQGT test matrix S has rows and consequently the same number of tests. The tests involve taking an integer number of sample units dictated by the nonbinary entries in S. The nonbinary (semi-quantitative) test outcome vector will be denoted by .

Let denote the -ary expansion of the natural number in vector form. More precisely, if , then , where . Our decoding procedure operates as follows. Suppose that represents the results of the (quantized) testing using the matrix (2). We apply the map to entrywise. We then use an expander-based decoding procedure on this vector to recover a “noisy” set of test values; the “noise” is due to the fact that the matrix can handle only up to defectives.
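The role of the base expansion can be illustrated with a small Python sketch. It is not the construction in (2) and (3); it only assumes that several binary tests are merged into one by weighting them with powers of a base, so that the digits of the merged count reveal the individual counts as long as each count stays below the base. The function names, the digit order (least significant first), and the weights are assumptions.

```python
from typing import List

def base_expansion(x: int, base: int, length: int) -> List[int]:
    """Digits of x in the given base, least significant first, padded to length."""
    digits = []
    for _ in range(length):
        digits.append(x % base)
        x //= base
    if x:
        raise ValueError("value does not fit in the requested number of digits")
    return digits

def merge_counts(counts: List[int], base: int) -> int:
    """Combine per-test defective counts into one value using weights base**k."""
    return sum(c * base ** k for k, c in enumerate(counts))

# Three merged tests with 1, 0 and 2 defectives, respectively.
merged = merge_counts([1, 0, 2], base=3)
print(merged, base_expansion(merged, 3, 3))  # 19 [1, 0, 2]

# If a count reaches the base, a carry corrupts the recovered digits.
overflow = merge_counts([3, 0, 0], base=3)
print(base_expansion(overflow, 3, 3))        # [0, 1, 0]
```

The second call shows the kind of error that arises when a pool contains too many defectives: the first test looks negative and the second looks positive, although the opposite is true.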

To this end, let and let . Note that if and zero otherwise. For shorthand, we write We have the following claim.

Claim 6. Let denote the test output based on the binary matrix , let be the test output generated via and let be as defined above. Then,

Proof:

Let be the mapping that corresponds to For some , let vertex be covered times. Such a vertex may be in error (due to the use of the -ary expansion). Since the set has at most neighbors in , it follows from an averaging argument that

Let be a vertex in which is covered at least times (if no such vertex exists, we are error-free and do not have to prove anything further). In this case we may have ; in the worst case . This implies that for every there are at most instances where , which gives the desired result. ∎

As a result of the previous claim, it follows that we can recover a binary vector that is within Hamming distance of the binary test result based on . Thus, we have to recover the set of infected individuals given a noisy set of test outcomes. To correct errors, we make use of the test outcome generated by the matrix ; this matrix renders the errors in “asymmetric,” which simplifies the problem. Here, the term “asymmetric” refers to the fact, addressed in Claim 7, that so that in a can change to a but not otherwise. More precisely, we use to identify tests in that contain defectives. Note that if at least infected individuals are present in some test pool , then the entries indexed by of may be in error.

Let be the test outcomes of . Define a vector

(4)

Similarly as before, for we write

The following straightforward claim follows from the previous discussion and the observations in Claim 6.

Claim 7. Let . Then, , and

We next generate a list of potentially infected individuals consistent with the outcome of the tests . The next lemma, which uses the same ideas as Lemma 1, describes an upper bound on the size of .

Lemma 8. Suppose that is the result of the tests in (2) and (3) and . Then the size of any list of defectives from consistent with is at most .

Proof:

Recall that in our setup the graph , which is used to construct and also , is an -expander. Hence, every vertex in has neighbors and . As before, let denote the set of infected individuals such that . Let be the output of the tests dictated by . We show that given a such that and , cannot be consistent with under .

Let . Using the same arguments as in the proof of Lemma 1, we can show that the number of unique neighbors of satisfies

Let . Since and from Claim 7, it follows that

which implies that if , then there exists a unique neighbor of which is not in error and is also not already covered by an element in . This implies that is not consistent with . ∎

The following theorem follows from the previous discussion and from Claim 6 and Lemma 8.

Theorem 9. There exists a nonbinary two-stage GT scheme that, given thresholds, uses

tests and can identify a set of infected individuals of size at most in a population of size .

Proof:

We prove the result by describing a simple method for recovering a set of size which contains the set of defectives . First, we generate the vector from our non-binary test outcomes. We initialize . Then, for every , if is consistent with , we update . Otherwise, we do not change . At the end of this process we have . Furthermore, according to Lemma 8, . The result now follows from Theorem 5. ∎

IV Nonadaptive SQGT

We next describe constructive nonadaptive testing schemes, which in the asymptotic regime require at most tests, with . Our approach builds upon the construction by Porat and Rothschild (PR construction) [18], which makes use of nonbinary error-correcting codes. Our key result is described in Lemma 13.

Let be a -ary linear error-correcting code, where is an odd prime, of minimum distance , , and dimension . The PR construction works by uniquely associating each individual in the population of size with a codeword in . Under this setup, the test matrix is defined as

where is the -th codeword of . In words, the test indexed by contains the codewords (individuals) from whose -th coordinate equals .
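The code-based construction can be sketched in Python as follows. The sketch assumes a toy Reed-Solomon code over a prime field in place of the codes used in [18], and it orders the tests so that row `i * p + a` corresponds to coordinate `i` and symbol `a`; both choices, and all names, are made only for illustration.

```python
from itertools import product
from typing import List

def reed_solomon_codewords(p: int, k: int) -> List[List[int]]:
    """All evaluations over GF(p) of polynomials with degree < k (p prime)."""
    words = []
    for coeffs in product(range(p), repeat=k):
        word = [sum(c * pow(x, e, p) for e, c in enumerate(coeffs)) % p
                for x in range(p)]
        words.append(word)
    return words

def pr_test_matrix(codewords: List[List[int]], p: int) -> List[List[int]]:
    """Row (i, a) contains individual j iff coordinate i of codeword j equals a."""
    length = len(codewords[0])
    matrix = []
    for i in range(length):
        for a in range(p):
            matrix.append([int(word[i] == a) for word in codewords])
    return matrix

codewords = reed_solomon_codewords(3, 2)  # 9 individuals, codeword length 3
tests = pr_test_matrix(codewords, 3)      # 3 * 3 = 9 tests
print(len(tests), len(tests[0]))          # 9 9
print(sum(row[0] for row in tests))       # 3: each individual joins one test per coordinate
```

In this toy instance, two distinct codewords agree in at most one coordinate, so any two individuals share at most one test while each participates in three.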

Our approach for designing a nonadaptive testing scheme is similar to that for the adaptive setting. Each test can be generated by taking a linear combination of rows of . The total number of tests equals , and once again the tests are represented by where is defined as follows:

In words, the test in indexed by contains the codewords (individuals) from whose -th coordinate has a value between and the minimum of . Note that the reason for using the minimum in the previous range of values is a consequence of the fact that we assumed to be an odd prime. The tests in are defined similarly: Suppose that where . Then,

For shorthand, we refer to the codewords in the -th test in as .

Claim 10. Suppose that the number of infected individuals in the test indexed by is at most so that

Then, given the output of the test we can uniquely determine

for .

Let denote the all-ones vector of length . We assume that our code is such that . Henceforth, let

(5)

for all .

Claim 11. Let be such that . Suppose that for an integer we have

Then,

Proof:

This follows since if , then for some . If , then for some . Since , it follows that where . This in turn implies that is the value of component of a vector from the set . ∎

We also need the following result.

Claim 12. Suppose that is such that . If there exists an index satisfying

(6)

and

(7)

then given the output of the tests dictated by , we can determine that .

Proof:

From Claim 11 and if (6) holds, we have that . Then from Claim 10, since the number of infected individuals in is at most , we have using the test outputs of . ∎

Lemma 13. If has minimum distance , the tests uniquely determine the set of defectives.

Proof:

According to Claim 12, we need to show that (6) and (7) hold for any . We start by showing that (7) holds. In particular, we show a stronger claim that there exists a set of size at least where for any , we have

(8)

where . Note that this implies that the number of coordinates of which agree in value with an element of is at most . Since any two elements in can agree in at most coordinates and , it follows that

Next, we show that for at least one coordinate in , (6) holds as well. First, note that

so that for a randomly chosen coordinate ,

Invoking Markov’s inequality we get

Therefore, it follows that there exists a set of coordinates of size at least such that for any

Since and , it follows that . Letting we have and . By Claim 12, we conclude that . ∎

Open Problems. Although only a small gap remains between the lower bound and the actual constructions for the saturation model, many other problems remain open, including:

  • Extending the nonadaptive and two-round constructions for general quantization thresholds under the SQGT model;

  • Deriving bounds and test strategies for consecutive defective models [26, 27], as these capture the order of arrivals into testing queues;

  • Addressing generalized binomial SQGT algorithms [28].

References

  • [1] R. Dorfman, “The detection of defective members of large populations,” Annals of Mathematical Statistics, vol. 14, pp. 436–440, 1943.
  • [2] W. Kautz and R. Singleton, “Nonrandom binary superimposed codes,” IEEE Transactions on Information Theory, vol. 10, pp. 363–377, 1964.
  • [3] D.-Z. Du and F.-K. Hwang, Pooling Designs and Nonadaptive Group Testing.   World Scientific, 2006.
  • [4] H. A. Inan, P. Kairouz, M. Wootters, and A. Özgür, “On the optimality of the Kautz-Singleton construction in probabilistic group testing,” IEEE Transactions on Information Theory, vol. 65, no. 9, pp. 5592–5603, 2019.
  • [5] J. Wolf, “Born again group testing: Multiaccess communications,” IEEE Transactions on Information Theory, vol. 31, no. 2, pp. 185–191, 1985.
  • [6] A. Dyachkov, “Lectures on designing screening experiments,” 2004, Lecture Note Series 10.
  • [7] P. Damaschke, “Threshold group testing,” in General Theory of Information Transfer and Combinatorics, ser. Lecture Notes in Computer Science, vol. 4123, 2006, pp. 707–718.
  • [8] B. Lindström, “Determining subsets by unramified experiments,” A Survey of Statistical Design and Linear Models, 1975.
  • [9] D.-Z. Du and F. Hwang, Combinatorial Group Testing and its Applications, 2nd ed.   World Scientific, 2000.
  • [10] A. Emad, J. Shen, and O. Milenkovic, “Symmetric group testing and superimposed codes,” in 2011 IEEE Information Theory Workshop, 2011, pp. 20–24.
  • [11] A. Emad and O. Milenkovic, “Semiquantitative group testing,” IEEE Transactions on Information Theory, vol. 60, no. 8, pp. 4614–4636, 2014.
  • [12] ——, “Code construction and decoding algorithms for semi-quantitative group testing with nonuniform thresholds,” IEEE Transactions on Information Theory, vol. 62, no. 4, pp. 1674–1687, 2016.
  • [13] A. G. D’yachkov and V. V. Rykov, “A coding model for a multiple-access adder channel,” Probl. Peredachi Inform., pp. 26–32, 1981, in Russian.
  • [14] A. Dyachkov and V. Rykov, “A survey of superimposed code theory,” Problems of Control and Information Theory, vol. 12, no. 4, pp. 229–242, 1983.
  • [15] R. Gabrys, S. Pattabiraman, V. Rana, J. Ribeiro, M. Cheraghchi, V. Guruswami, and O. Milenkovic, “AC-DC: Amplification curve diagnostics for Covid-19 group testing,” 2020, arXiv:2011.05223.
  • [16] F. Hwang, “A generalized binomial group testing problem,” Journal of the American Statistical Association, vol. 70, no. 352, pp. 923–926, 1975.
  • [17] H. Q. Ngo, E. Porat, and A. Rudra, “Efficiently decodable error-correcting list disjunct matrices and applications,” in International Colloquium on Automata, Languages, and Programming.   Springer, 2011, pp. 557–568.
  • [18] E. Porat and A. Rothschild, “Explicit nonadaptive combinatorial group testing schemes,” IEEE Transactions on Information Theory, vol. 57, no. 12, pp. 7982–7989, 2011.
  • [19] P. Indyk, H. Q. Ngo, and A. Rudra, “Efficiently decodable non-adaptive group testing,” in Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms.   SIAM, 2010, pp. 1126–1142.
  • [20] M. Cheraghchi, “Noise-resilient group testing: Limitations and constructions,” Discrete Applied Mathematics, vol. 161, no. 1, pp. 81–95, 2013, preliminary version in Proceedings of the FCT 2009. arXiv manuscript published in 2008.
  • [21] A. De Bonis, L. Gasieniec, and U. Vaccaro, “Optimal two-stage algorithms for group testing problems,” SIAM Journal on Computing, vol. 34, no. 5, pp. 1253–1270, 2005.
  • [22] M. Cheraghchi and V. Nakos, “Combinatorial group testing schemes with near-optimal decoding time,” in Proceedings of the 61st Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2020.
  • [23] Z. Füredi, “On r-cover-free families,” Journal of Combinatorial Theory, Series A, vol. 73, no. 1, pp. 172–173, 1996.
  • [24] V. Guruswami, C. Umans, and S. Vadhan, “Unbalanced expanders and randomness extractors from Parvaresh-Vardy codes,” Journal of the ACM, vol. 56, no. 4, 2009.
  • [25] M. Capalbo, O. Reingold, S. Vadhan, and A. Wigderson, “Randomness conductors and constant-degree expansion beyond the degree/2 barrier,” in Proceedings of the Annual ACM Symposium on Theory of Computing (STOC), 2002, pp. 659–668.
  • [26] T. V. Bui, M. Cheraghchi, and T. D. Nguyen, “Improved algorithms for non-adaptive group testing with consecutive positives,” arXiv preprint arXiv:2101.11294, 2021.
  • [27] C. J. Colbourn, “Group testing for consecutive positives,” Annals of Combinatorics, vol. 3, no. 1, pp. 37–41, 1999.
  • [28] F. Hwang, “A generalized binomial group testing problem,” Journal of the American Statistical Association, vol. 70, no. 352, pp. 923–926, 1975.