Group testing with nested pools
We iterate Dorfman's pool testing algorithm <cit.> to identify infected individuals in a large population, a classification problem. This is an adaptive scheme with nested pools: pools at a given stage are subsets of the pools at the previous stage. We compute the mean and variance of the number of tests per individual as a function of the pool sizes m=(m_1,...,m_k) in the first k stages; in the (k+1)-th stage all remaining individuals are tested. Denote by D_k(m,p) the mean number of tests per individual, which we will call the cost of the strategy m. The goal is to minimize D_k(m,p) and to find the optimizing values k and m. We show that the cost of the strategy (3^k,...,3) with k≈log_3(1/p) is of order plog(1/p), and differs from the optimal cost by a fraction of this value. To prove this result we bound the difference between the cost of this strategy with the minimal cost when pool sizes take real values. We conjecture that the optimal strategy, depending on the value of p, is indeed of the form (3^k,...,3) or of the form (3^k-14,3^k-1...,3), with a precise description for k. This conjecture is supported by inspection of a family of values of p. Finally, we observe that for these values of p and the best strategy of the form (3^k,...,3), the standard deviation of the number of tests per individual is of the same order as the cost. As an example, when p=0.02, the optimal strategy is k=3, m=(27,9,3). The cost of this strategy is 0.20, that is, the mean number of tests required to screen 100 individuals is 20.
READ FULL TEXT