Implementing Stepped Pooled Testing for Rapid COVID-19 Detection

by Abhishek Srivastava, et al.
Cornell University

COVID-19, a viral respiratory pandemic, has rapidly spread throughout the globe. Large-scale and rapid testing of the population is required to contain the disease, but such testing is prohibitive in terms of resources, cost and time. Recently, RT-PCR based pooled testing has emerged as a promising way to boost testing efficiency. We introduce a stepped pooled testing strategy, a probability-driven approach which significantly reduces the number of tests required to identify infected individuals in a large population. Our comprehensive methodology incorporates the effect of false negative and false positive rates to accurately determine not only the efficiency of pooling but also its accuracy. Under various plausible scenarios, we show that this approach significantly reduces the cost of testing and also reduces the effective false positive rate of tests when compared to a strategy of testing every individual of a population. We also outline an optimization strategy to obtain the pool size that maximizes the efficiency of pooling given the diagnostic protocol parameters and local infection conditions.




I Introduction

COVID-19, a viral infectious respiratory illness, has recently emerged as a major threat to public health and economic stability in countries around the world. It has spread globally at an alarming pace and the World Health Organization (WHO) has declared it a pandemic. In the absence of a cure or a vaccine, large-scale testing and quarantine are recognized as among the most effective strategies for containing its spread. While there are various known diagnostic methods for COVID-19, including nucleic acid testing, protein testing and computed tomography Udugama et al. (2020), they can be prohibitive in terms of cost and time. Pooled testing is a promising strategy to boost testing efficiency. In pooled testing, the samples collected from each patient are divided and grouped into pools, and each pool is then tested for the disease. If a pool tests negative, every sample in that pool is presumed negative too. This basic idea reduces the overall cost and time of testing large populations.

Pooled testing was first proposed during World War II Dorfman (1943) and has been a part of diagnostic methodology ever since Bilder and Tebbs (2012). It has since been employed several times to test for infections including malaria Taylor et al. (2010), flu Arnold et al. (2013) and HIV Litvak et al. (1994); Nguyen et al. (2019). One of the first implementations of laboratory pooled testing for COVID-19 was demonstrated by Yelin et al. (2020) for pools as large as 32 or 64 samples. Today, physicians and public health officials from India HT Correspondent (2020); Goswami (2020); Verma (2020); Perappadan (2020); Press Trust of India (2020a, b); Kaunain Sheriff M (2020) and many other countries around the globe Technion (2020); Jeffay (2020); Stone (2020); Goethe University Frankfurt (2020); Ghana Web (2020) are using pooled testing to determine the spread of this pandemic in a rapid and cost-efficient manner.

A variety of strategies have been proposed over the past several years to implement pooled testing de Wolff et al. (2020); Theagarajan (2020). They can be broadly classified into two types: adaptive and non-adaptive. Adaptive methods Theagarajan (2020); Shani-Narkiss et al. (2020); Noriega and Samore (2020); Bergel (2020); Zhu et al. (2020); Eberhardt et al. (2020); Narayanan et al. (2020); Ben-Ami et al. (2020); Hanel and Thurner (2020) employ a sequential testing approach, thus requiring fewer tests in total but more time, as each step of testing informs the next. On the other hand, non-adaptive pooled testing methods Ben-Ami et al. (2020); Täufer (2020); Sinnott-Armstrong et al. (2020) usually involve a matrix-type pooling that allows for simultaneous testing of several pools, whose results are then collated to pinpoint infected samples. These methods are faster but can require a greater number of tests in total. While many of these methods might be mathematically efficient, their practical implementation is usually challenging Yelin et al. (2020); Ben-Ami et al. (2020), and this limits the complexity that can be incorporated, no matter the benefits. Hence, it is imperative to modify and verify any proposed method according to clinical constraints.

Here, we present a probability-driven pooled testing approach that can significantly reduce the number of tests required to identify infected patients in large populations. The method divides and tests pools of samples in a hierarchical (stepped) manner. This approach is general enough to not be limited to COVID-19 alone and can be applied to other infectious scenarios with minor modifications. The mathematical model used for implementing and optimizing this strategy is presented along with representative results for various probable real-life scenarios. Under various plausible scenarios, this strategy reduces the cost of testing by roughly 34% to 84% (the range reported in Table 2) compared to a strategy of individually testing everyone in a population, and cuts the false positive rate to as little as one-third of that of an individual test.

The model can be used to rapidly determine the efficiency boost obtained by pooling a desired number of samples together, provided we know the accuracy of the testing method and the rate of infection in the population being tested. It can also suggest the optimal pool size that should be used to minimize the number of tests needed per 1000 people.

| | Parameter | Name |
|---|---|---|
| Inputs | $n$ | Pool (group) size |
| | $p$ | Fraction of the population infected |
| | $f_{+}$ | False positive rate for a test |
| | $f_{-}$ | False negative rate for a test |
| | $s$ | Maximum number of tests possible per patient |
| Outputs | $\eta$ | Efficiency amplification factor compared to testing individually (this is also the average effective number of people that can be tested per test) |
| | $F_{-}$ | False negative rate for the complete stepped pooled testing (different from $f_{-}$) |

Table 1: List of parameters used for Analysis and Results

II Methodology

The stepped pooled testing strategy is applicable to any testing method that involves sample collection, such as the RT-PCR test Udugama et al. (2020), which is being widely used for COVID-19 testing. We begin by assuming that the sample(s) collected from each patient are enough for $s$ tests only (for instance, if we are able to collect a given number of swabs per patient, then $s$ equals that number). This number determines the number of steps of the stepped pooled testing strategy.

Our strategy extends the pooling model described in Hanel and Thurner (2020). The stepped pooled testing strategy goes as follows:

  1. We test a pool of $n$ samples.

  2. If the outcome is negative (not infected), we can surmise that all the samples in the pool are infection free.

  3. If the pooled sample tests positive (infected), we split the samples from these patients into two sub-pools of size $n/2$ each and repeat steps 1 and 2. It should be noted that at every step of this process we need to use a fresh sample from the patient to make new sub-pools, because the sample from the previous step is not reusable.

  4. This process is repeated until $s-1$ of the $s$ samples per patient have been used, after which we are left with a single sample from each of the patients in the sub-pools. If a sub-pool at this stage yields positive for infection, we individually test every patient in this sub-pool.

It can be observed that this strategy is most effective when the pool size is an integer multiple of $2^{s-2}$, so that every halving produces sub-pools of equal size. The initial size of the pool can be optimized to maximize the effective number of people tested per test or, equivalently, to minimize the number of tests needed per 1000 people. A flowchart for this strategy is shown in Fig. 1.
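The four steps above can be sketched as a small simulation. This is a minimal illustration rather than the authors' implementation: the function names, the use of Python's `random` module, and the way an odd-sized pool is split are all assumptions made for the example.

```python
import random


def run_test(truly_infected, f_neg, f_pos, rng):
    """Simulate one imperfect test on a pool or on a single sample."""
    if truly_infected:
        return rng.random() >= f_neg  # missed with probability f_neg
    return rng.random() < f_pos       # false alarm with probability f_pos


def stepped_pool(statuses, steps, f_neg, f_pos, rng):
    """Run the stepped procedure on one pool.

    statuses -- list of booleans, True means the patient is infected
    steps    -- samples (tests) still available per patient
    Returns (tests_used, flagged), where flagged[i] is True if patient i
    is finally declared infected.
    """
    if steps < 2 and len(statuses) > 1:
        raise ValueError("pooling needs at least 2 samples per patient")
    tests = 1  # step 1: test the pool
    if not run_test(any(statuses), f_neg, f_pos, rng):
        # step 2: a negative pool clears everyone in it
        return tests, [False] * len(statuses)
    if len(statuses) == 1:
        return tests, [True]
    if steps == 2:
        # step 4: one sample left per patient, so test individually
        flagged = [run_test(s, f_neg, f_pos, rng) for s in statuses]
        return tests + len(statuses), flagged
    # step 3: split into two sub-pools, each using a fresh sample
    half = len(statuses) // 2
    flagged = []
    for sub in (statuses[:half], statuses[half:]):
        used, sub_flagged = stepped_pool(sub, steps - 1, f_neg, f_pos, rng)
        tests += used
        flagged.extend(sub_flagged)
    return tests, flagged


rng = random.Random(0)
pool = [True, False, False, False]
print(stepped_pool(pool, steps=3, f_neg=0.0, f_pos=0.0, rng=rng))
# (5, [True, False, False, False])
```

With perfect tests, one infected patient in a pool of four is found with 5 tests instead of 4 individual ones, but a fully healthy pool of any size is cleared with a single test, which is where the savings at low infection rates come from.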

Figure 1: Flowchart (decision tree) representing the testing method for a pool of $n$ samples. If a pooled sample tests negative, the procedure is stopped for that pool.

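Probabilistic calculations along this tree enable us to estimate the expected number of tests to be done for a pool of given size as well as the overall chance of false negatives. The probability of a pool of $n$ samples being infected (i.e. at least 1 out of $n$ is positive), when a fraction $p$ of the population is infected, is

$$P_{\mathrm{inf}}(n) = 1 - (1-p)^{n}.$$

The probability of the pooled sample testing positive is

$$P_{+}(n) = (1-f_{-})\,P_{\mathrm{inf}}(n) + f_{+}\,\bigl(1-P_{\mathrm{inf}}(n)\bigr),$$

where $f_{-}$ and $f_{+}$ are the false negative and false positive rates of a single test. A pool that has infected samples may not necessarily test positive, because the test has non-zero false negative and false positive rates. Hence $P_{+}(n)$ is not the same as $P_{\mathrm{inf}}(n)$.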


Note that we have assumed that the false negative and false positive rates for a pool of samples are the same as those for a single sample. This can be justified based on the limits of detection for the commonly used RT-PCR protocols. Please refer to Appendix A for details.
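The two probabilities above are simple to evaluate numerically. The sketch below assumes the single-test error rates apply unchanged to a pool, as stated in the text; the function and argument names are illustrative choices, and the parameter values in the example are hypothetical.

```python
def p_pool_infected(n, p):
    """Probability that a pool of n samples has at least one infected sample."""
    return 1.0 - (1.0 - p) ** n


def p_pool_positive(n, p, f_neg, f_pos):
    """Probability that the pooled test reads positive (true hit or false alarm)."""
    q = p_pool_infected(n, p)
    return (1.0 - f_neg) * q + f_pos * (1.0 - q)


# Hypothetical example: pool of 5, 2% infection rate, 15% false negatives,
# no false positives.
print(round(p_pool_infected(5, 0.02), 4))             # 0.0961
print(round(p_pool_positive(5, 0.02, 0.15, 0.0), 4))  # 0.0817
```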

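Following the flowchart in Fig. 1, we can deduce that $T(n)$, the expected number of tests for a pool of size $n$, is given by a recursive function that terminates once the $s$ tests available per patient are used up:

$$T(m,k) = \begin{cases} 1 + 2\,P_{+}(m)\,T(m/2,\,k+1), & k < s-1,\\ 1 + m\,P_{+}(m), & k = s-1, \end{cases} \qquad T(n) \equiv T(n,1).$$

Here $m$ denotes the sub-pool size, $k$ denotes the step number, and $P_{+}(m)$ is the probability that a pool of $m$ samples tests positive.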

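It follows that the number of persons per test, which we call the test efficiency amplification $\eta$, is given by

$$\eta = \frac{n}{T(n)},$$

where $n$ is the pool size and $T(n)$ is the expected number of tests for that pool. Correspondingly, the number of tests needed per 1000 people is

$$T_{1000} = \frac{1000}{\eta} = \frac{1000\,T(n)}{n}.$$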


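The total probability of showing a false negative at the end of all steps can also be calculated using a recursive formula. To better understand the calculation for this step, it helps to write the probabilities at each step as shown in Fig. 5. The recursive formula for the pooled test false negative $F_{-}$ can then be written as

$$F_{-}(m,k) = \begin{cases} f_{-}\,P_{\mathrm{inf}}(m) + 2\,(1-f_{-})\,P_{\mathrm{inf}}(m)\,F_{-}(m/2,\,k+1), & k < s-1,\\ f_{-}\,P_{\mathrm{inf}}(m) + (1-f_{-})\,P_{\mathrm{inf}}(m)\,f_{-}, & k = s-1. \end{cases}$$

Here $m$ denotes the sub-pool size, $k$ denotes the step number, $f_{-}$ is the false negative rate of a single test, $P_{\mathrm{inf}}(m) = 1-(1-p)^{m}$ is the probability that a pool of $m$ samples is infected, and $s$ is the number of tests available per patient.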
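A recursion of this kind can be sketched in a few lines. The form used below (an infected pool is either missed outright, or detected and passed on to its two halves) is an assumption about the structure of the calculation in Fig. 5, and the function names are illustrative. With an assumed single-test false negative rate of 15%, it gives values in line with the 1.31% and 2.57% entries of Table 2.

```python
def p_pool_infected(n, p):
    """Probability that a pool of n samples has at least one infected sample."""
    return 1.0 - (1.0 - p) ** n


def pool_false_negative(n, p, steps, f_neg):
    """Probability that a pool yields a false negative somewhere in the tree."""
    if steps < 2 or n % 2 ** (steps - 2) != 0:
        raise ValueError("need steps >= 2 and n divisible by 2**(steps-2)")
    q = p_pool_infected(n, p)
    missed_now = f_neg * q       # infected pool reads negative at this step
    detected = (1.0 - f_neg) * q  # infected pool correctly reads positive
    if steps == 2:
        # last stage: the follow-up individual test can still miss
        return missed_now + detected * f_neg
    # otherwise the two half-size sub-pools can each miss later on
    return missed_now + detected * 2.0 * pool_false_negative(
        n // 2, p, steps - 1, f_neg
    )


print(round(100 * pool_false_negative(4, 0.02, 3, 0.15), 2))  # 1.31
print(round(100 * pool_false_negative(8, 0.02, 4, 0.15), 2))  # 2.57
```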

III Results

The Indian Council of Medical Research (ICMR) recently published guidelines Prakash et al. (2020) for pool testing and suggested limiting the pool size to 5 to avoid dilution. ICMR also suggested a staggered approach to the use of pooled testing: (a) for areas with an infection rate in the population below 2%, pooled testing should be used; (b) for an infection rate between 2% and 5%, pooled testing should be used for community and asymptomatic patient testing; and (c) for areas with an infection rate above 5%, pooled testing should not be used. We will use these numbers as a guide for demonstrating our method. It should be noted that larger pool sizes, up to 32 or 64 samples, have been reported in other studies Yelin et al. (2020). These are also in agreement with our calculations regarding limits of detection (see Appendix A).
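The staggered guideline can be written down as a simple decision rule. The sketch below is only an illustration: the thresholds of 2% and 5% are assumed to match the low and medium infection-rate scenarios of Table 2, and the function name and return strings are hypothetical.

```python
def pooling_advice(infection_rate, low=0.02, high=0.05):
    """Map a local infection rate (as a fraction) to a pooling recommendation.

    The thresholds `low` and `high` are assumptions for this example.
    """
    if not 0.0 <= infection_rate <= 1.0:
        raise ValueError("infection_rate must be a fraction between 0 and 1")
    if infection_rate < low:
        return "use pooled testing"
    if infection_rate <= high:
        return "use pooled testing for community and asymptomatic screening"
    return "do not use pooled testing"


for rate in (0.01, 0.03, 0.10):
    print(f"{rate:.0%}: {pooling_advice(rate)}")
```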

In Figs. 2, 3 and 4, we show the results for a representative set of parameters. We find that as the pool size grows, the number of tests per 1000 people decreases and the false negative rate increases. However, there is an optimum pool size that achieves maximum efficiency (i.e. the minimum number of tests per 1000 people).

Figure 2 reveals that, for the same pool size, a population with a higher infection rate requires more tests and will have an overall lower accuracy (higher false negative rate). This is consistent with what we would expect clinically. In Fig. 3, we show the effect of the false negative rate on stepped pooled testing. Interestingly, a diagnostic test with a higher false negative rate would go through more samples in fewer tests, but at the cost of a higher overall pool test false negative rate, making this trade-off possibly undesirable.
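The trend with infection rate can be checked with a short sweep for the two-step case. The sketch below is illustrative only: the pool size of 5, the false negative rate of 15% and the false positive rate of 0 are assumed values, and the false negative expression is the two-step form of the recursion (pool missed, or pool detected and the individual retest missed).

```python
def two_step_metrics(n, p, f_neg, f_pos):
    """Tests per 1000 people and pool false negative (%) for two-step pooling."""
    q = 1.0 - (1.0 - p) ** n                      # pool contains an infection
    pos = (1.0 - f_neg) * q + f_pos * (1.0 - q)   # pool tests positive
    tests_per_1000 = 1000.0 * (1.0 + pos * n) / n
    pool_fn_percent = 100.0 * q * (f_neg + (1.0 - f_neg) * f_neg)
    return tests_per_1000, pool_fn_percent


for p in (0.02, 0.05, 0.10):
    tests, fn = two_step_metrics(5, p, f_neg=0.15, f_pos=0.0)
    print(f"p={p:.0%}: {tests:.0f} tests per 1000, pool false negative {fn:.2f}%")
```

Both quantities grow with the infection rate, since more pools contain at least one infected sample and therefore need follow-up tests.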

Figure 2: Effect of the population infection rate. Tests required per 1000 people (left) and pool test false negative percentage (right) as a function of pool size. The number of steps, the false positive rate and the false negative rate are held fixed.
Figure 3: Effect of the false negative rate. Tests required per 1000 people (left) and pool test false negative percentage (right) as a function of pool size. The number of steps, the infection rate and the false positive rate are held fixed.

In Fig. 4, we see the effect of the number of steps (also the number of samples per patient) on the pooling strategy. Similar to the previous two parameter sweeps, the tests required per 1000 people show non-monotonic behavior, with an optimal pool size for which the pooling is most efficient (note that for one of the curves, the minimum lies at a pool size beyond the visible horizontal axis). On the other hand, the false negative rate steadily increases but still remains below the false negative rate of a single test. Using multiple samples thus significantly reduces the number of tests needed without compromising the overall false negative rate of the pooling strategy.

Figure 4: Effect of the number of steps. Tests required per 1000 people (left) and pool test false negative percentage (right) as a function of pool size for different numbers of steps. The infection rate, the false negative rate and the false positive rate are held fixed.

Table 2 summarizes the results for a broad set of plausible scenarios to demonstrate the efficiency of this strategy. In addition to predicting the efficiency and accuracy of different pooling strategies, we can also use this method to calculate the optimal pool size that leads to the least number of tests (i.e. minimizes the number of tests per 1000 people). Figures 2, 3 and 4 clearly demonstrate the existence of such an optimum. In Table 3, we show various possible testing scenarios and the corresponding optimal pool size. The results in this section show that stepped pooled testing can reduce the overall pool false negative rate below the false negative rate of an individual test.
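One simple way to locate such an optimum is a brute-force scan over admissible pool sizes. The sketch below assumes the expected-test recursion implied by the flowchart, restricts the scan to pool sizes that halve evenly at every step, and uses an arbitrary upper bound of 64. The function names and the false positive rate of 0 are assumptions for the example.

```python
def expected_tests(n, p, steps, f_neg, f_pos):
    """Expected number of tests for one pool of size n with `steps` stages."""
    q = 1.0 - (1.0 - p) ** n
    pos = (1.0 - f_neg) * q + f_pos * (1.0 - q)
    if steps == 2:
        return 1.0 + pos * n
    return 1.0 + pos * 2.0 * expected_tests(n // 2, p, steps - 1, f_neg, f_pos)


def optimal_pool_size(p, steps, f_neg, f_pos, max_pool=64):
    """Scan pool sizes and return (best_size, tests_per_1000_at_best_size)."""
    if steps < 2:
        raise ValueError("need at least 2 steps to pool")
    stride = 2 ** (steps - 2)  # keeps every split into halves even
    best_n, best_cost = None, float("inf")
    for n in range(max(2, stride), max_pool + 1, stride):
        cost = 1000.0 * expected_tests(n, p, steps, f_neg, f_pos) / n
        if cost < best_cost:
            best_n, best_cost = n, cost
    return best_n, best_cost


n_best, cost = optimal_pool_size(p=0.02, steps=2, f_neg=0.15, f_pos=0.0)
print(n_best, round(cost, 1))  # 8 251.9
```

For the two-step case at a 2% infection rate, the scan selects a pool size of 8, in agreement with the corresponding row of Table 3.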

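| Infection Rate | Number of Steps | Pool Size | Tests Needed Per 1000 People | Pool Test False Negative % | Testing Cost Reduction (%) |
|---|---|---|---|---|---|
| 2% (Low) | 2 | 2 | 535 | 1.10 | 46.5 |
| 2% (Low) | 2 | 3 | 385 | 1.63 | 61.5 |
| 2% (Low) | 2 | 5 | 283 | 2.67 | 71.7 |
| 2% (Low) | 3 | 4 | 286 | 1.31 | 71.4 |
| 2% (Low) | 3 | 6 | 205 | 2.03 | 79.5 |
| 2% (Low) | 4 | 8 | 162 | 2.57 | 83.8 |
| 5% (Medium) | 2 | 2 | 584 | 2.70 | 41.6 |
| 5% (Medium) | 2 | 3 | 456 | 3.96 | 54.4 |
| 5% (Medium) | 2 | 5 | 394 | 6.27 | 60.6 |
| 5% (Medium) | 3 | 4 | 343 | 3.64 | 65.7 |
| 5% (Medium) | 3 | 6 | 267 | 5.76 | 73.3 |
| 5% (Medium) | 4 | 8 | 224 | 7.13 | 77.6 |
| 10% (High) | 2 | 2 | 663 | 5.27 | 33.7 |
| 10% (High) | 2 | 3 | 565 | 7.52 | 43.5 |
| 10% (High) | 2 | 5 | 549 | 11.36 | 45.1 |
| 10% (High) | 3 | 4 | 445 | 8.24 | 55.5 |
| 10% (High) | 3 | 6 | 392 | 13.02 | 60.8 |
| 10% (High) | 4 | 8 | 341 | 16.52 | 65.9 |

Table 2: Testing cost reduction from the stepped pool strategy. We show the overall testing cost reduction for various plausible scenarios outlined by ICMR. The false negative rate and the false positive rate of a single test are held fixed.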
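| Varied parameter | Value | Optimal Pool Size | Tests Needed Per 1000 People | Pool Test False Negative % | Testing Cost Reduction (%) |
|---|---|---|---|---|---|
| Infection rate | 2% | 8 | 253 | 4.14 | 74.7 |
| Infection rate | 5% | 6 | 393 | 7.35 | 60.7 |
| Infection rate | 10% | 4 | 544 | 9.54 | 45.6 |
| False negative rate | 5% | 8 | 268 | 1.46 | 73.2 |
| False negative rate | 15% | 8 | 253 | 4.14 | 74.7 |
| False negative rate | 30% | 9 | 229 | 8.48 | 77.1 |
| False negative rate | 40% | 10 | 211 | 11.71 | 78.9 |
| Number of steps | 2 | 8 | 253 | 4.14 | 74.7 |
| Number of steps | 3 | 18 | 122 | 6.96 | 87.8 |
| Number of steps | 4 | 32 | 81 | 12.07 | 91.9 |

Table 3: Optimal pool size under various scenarios. Unless specified in the first column, we use 2 steps, an infection rate of 2%, a false negative rate of 15% and a fixed false positive rate.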

IV Conclusion

We propose a new stepped pooled testing strategy that can significantly reduce the cost of testing a large population. The strategy also reduces the chance of a false negative in almost all scenarios because an infected patient's sample is likely to be tested multiple times. Even in the simplest case with two samples per individual (i.e. two steps, also called Dorfman pooling Dorfman (1943)) and an initial pool size of 5, we can reduce the number of tests required per 1000 individuals by about 45% for populations with a high infection rate and by about 72% for populations with a low infection rate (Table 2). As the number of steps and the initial pool size are increased, the testing efficiency progressively improves, albeit at the cost of a slightly higher false negative rate. Nevertheless, barring the cases with a very high infection rate, the pooled false negative rate is still below that of an individual test.

Based on our results, we make several suggestions about the effective pool size and the number of samples that should be collected from an individual. This methodology should be customized dynamically and regularly based on evolving local levels of infection. The most significant benefits of this strategy can be realized by collecting a small number of samples from each individual and pooling them into groups of moderate size (Table 2 considers two to four samples per patient and pool sizes of 2 to 8). Increasing the number of steps means collecting more samples from each patient being tested. Hence, the number of steps should be chosen pragmatically, in consultation with the physician or health professional. Finally, we note that machine learning methods may be implemented to utilize data collected on disease spread and dynamically adapt this strategy for maximum efficiency. We leave this as a topic for future research.

The authors are thankful to Dr. Saumya Srivastava, MBBS and Vertika Srivastava for useful discussions, and to Dr. Hanel for providing more details about his model via email.


Appendix A Diagnostic limit considerations for pooled testing with RT-PCR

One of the key advantages of real-time PCR assays utilizing target sequence specific primers (as is the case with all COVID-19 test kits) is their wide dynamic range. This enables the analysis of samples with widely varying levels of target RNA. The resolving power of RT-PCR is mostly limited by the efficiency of RNA-to-cDNA conversion, a real concern when the target RNA is scarce. Thus, determination of the limit of detection (LOD), by performing serial dilutions of the positive control sample and obtaining standard curves, is a critical step in the validation of any testing kit or protocol. The highest dilution of the standard curve, provided in the assay performance evaluation report of any RT-PCR assay kit, delineates the lowest concentration that can be quantified with confidence. Thus, pooling patient samples as proposed by the current model is unlikely to influence the probability of a false negative prediction by the assay if the effective target concentration is maintained above the LOD. However, if the intensity values recorded are comparable to that of the LOD, they should be recorded only as a qualitative (yes/no) prediction Bustin and Nolan (2004).

Target RNA selection plays a big role in the assay sensitivity. Candidate targets include RNA-dependent RNA polymerase (RdRp), hemagglutinin-esterase (HE), and open reading frames ORF1a and ORF1b. The WHO recommends a first-line screening with the E gene assay followed by a confirmatory assay using the RdRp gene. Tang et al. (2020) developed and compared the performance of three novel real-time RT-PCR assays targeting the RdRp/Hel, S, and N genes of SARS-CoV-2. Among them, the COVID-19-RdRp/Hel assay had the lowest limit of detection in vitro and higher sensitivity and specificity.

In this section, we calculate the maximum possible pool size that is consistent with the LOD of current COVID-19 tests.

| Specimen Type | Mean (range) viral load (RNA copies/mL) in RdRp-P2-negative but COVID-19-RdRp/Hel-positive specimens |
|---|---|
| Respiratory tract | |
| Sputum | N/A |
| Non-respiratory tract | |
| Urine | N/A |
| Feces/rectal swabs | |

Table 4: Viral load in respiratory and non-respiratory specimens. Reproduced from Chan et al. (2020)

Calculation. The LOD of the COVID-19-RdRp/Hel assay is specified in RNA copies per reaction Tang et al. (2020). For the assumed reaction volume, this is equivalent to 448 RNA copies per mL of sample. The mean viral load for nasopharyngeal/nasal swabs is taken from Table 4. Assuming a pool of $n$ samples with only one infected sample, and that samples are pooled first followed by RNA extraction, the net effective viral load in the pooled sample is the viral load of the infected sample divided by $n$. In the standardized protocol for RNA extraction and the RT-PCR procedure, the pooled sample is diluted with solvent and loaded for RNA extraction. The purified RNA is diluted into solvent, and an aliquot of the diluted solution is used per well of the PCR assay Carter et al. (2020).

Thus, the net effective viral load per PCR well (in units of RNA copies/mL of solvent) is the pooled viral load scaled by these dilution steps. This quantity should be greater than the minimum LOD of the test, which is 448 RNA copies per mL of solvent. This condition sets the largest possible pool size consistent with the LOD of an RT-PCR test, and the resulting value is consistent with earlier literature Yelin et al. (2020); Narayanan et al. (2020).
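The LOD condition can be turned into a one-line bound on the pool size. The sketch below is a generic illustration under the single-infected-sample assumption used above. The function name, the `recovery_factor` parameter (the combined effect of all dilution steps between the pooled sample and the PCR well) and the numerical inputs in the example are hypothetical, and are not the values used in the paper. Only the LOD of 448 RNA copies/mL comes from the text.

```python
import math


def max_pool_size(viral_load, recovery_factor, lod):
    """Largest pool size n with (viral_load / n) * recovery_factor >= lod.

    viral_load      -- RNA copies/mL in the single infected sample
    recovery_factor -- fraction of the pooled concentration that reaches
                       the PCR well (hypothetical lumped dilution factor)
    lod             -- limit of detection in RNA copies/mL
    """
    if min(viral_load, recovery_factor, lod) <= 0:
        raise ValueError("all inputs must be positive")
    return math.floor(viral_load * recovery_factor / lod)


# Hypothetical inputs, chosen only to show the calculation.
print(max_pool_size(viral_load=1e6, recovery_factor=0.05, lod=448))  # 111
```

A result of 0 would mean that even an unpooled sample falls below the LOD for the given inputs.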

Appendix B Pool Test False Negative

Figure 5: Flowchart showing the calculation for pool test false negative. The representative values are shown on the left side of the tree.