Exact Inference for Disease Prevalence Based on a Test with Unknown Specificity and Sensitivity
To make informed public policy decisions in battling the ongoing COVID-19 pandemic, it is important to know the disease prevalence in a population. There are two intertwined difficulties in estimating this prevalence based on testing results from a group of subjects. First, the test is prone to measurement error, with unknown sensitivity and specificity. Second, the prevalence tends to be low at the initial stage of the pandemic, and we may not be able to determine whether a positive test result is a false positive due to the imperfect specificity of the test. Statistical inference based on large-sample approximations or the conventional bootstrap may not be sufficiently reliable and may yield confidence intervals that do not cover the true prevalence at the nominal level. In this paper, we propose a set of 95% confidence intervals whose validity is guaranteed and does not depend on the sample size in the unweighted setting. For the weighted setting, the proposed inference is equivalent to a class of hybrid bootstrap methods, whose performance is also more robust to the sample size than that of methods based on asymptotic approximations. The methods are used to reanalyze data from a study investigating the antibody prevalence in Santa Clara County, California, which was the motivating example of this research, as well as data from several other seroprevalence studies whose authors had tried to correct their estimates for test performance. Extensive simulation studies have been conducted to examine the finite-sample performance of the proposed confidence intervals.
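To illustrate why imperfect specificity matters most when prevalence is low, the following minimal sketch applies the standard Rogan-Gladen point correction to an observed positive rate. It is not the confidence interval procedure proposed in the paper. The function name and all numbers are hypothetical, and the sensitivity and specificity are treated as known here, whereas in the setting above they are unknown and must themselves be estimated.

```python
def corrected_prevalence(positive_rate, sensitivity, specificity):
    """Rogan-Gladen style point correction of an observed positive rate.

    Assumes sensitivity and specificity are known constants, which is a
    simplification: in practice they are estimated from validation data.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    estimate = (positive_rate + specificity - 1.0) / denom
    # Clip to the valid range, since the raw correction can fall outside [0, 1].
    return min(1.0, max(0.0, estimate))


# Hypothetical inputs: 1.5% of subjects test positive, sensitivity is 80%.
high_spec = corrected_prevalence(0.015, 0.80, 0.995)
low_spec = corrected_prevalence(0.015, 0.80, 0.985)
print(high_spec, low_spec)
```

With these illustrative inputs, a specificity of 99.5% gives a corrected prevalence of roughly 1.3%, while a specificity of 98.5% gives essentially zero, because every observed positive could then be a false positive. A one-point change in specificity therefore moves the point estimate across the whole plausible range, which is why uncertainty in the test's performance has to be carried into the interval.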