Data-driven Identification of Number of Unreported Cases for COVID-19: Bounds and Limitations
Accurate forecasts for COVID-19 are necessary for better preparedness and resource management. In particular, deciding on a response over one or several months requires accurate long-term forecasts, which are especially challenging because model errors accumulate over time. A critical factor that can hinder accurate long-term forecasts is the number of unreported/asymptomatic cases. While early serology tests have been used to estimate this number, more tests need to be conducted for reliable results. To identify the number of unreported/asymptomatic cases, we take an epidemiological, data-driven approach. We show that we can identify lower bounds on the ratio of reported to actual cases or, equivalently, upper bounds on actual cases as a multiple of reported cases. To do so, we propose an extension of our prior heterogeneous infection rate model that incorporates unreported/asymptomatic cases. We prove that the number of unreported cases can be reliably estimated only from a certain time period of the epidemic data. In doing so, we identify tests that can indicate whether the learned ratio is reliable. We propose three approaches to learn this ratio and show their effectiveness on simulated data. We use these approaches to identify lower bounds on the ratio of reported to actual cases for New York City and several US states. Our results demonstrate that the actual number of cases is unlikely to exceed 25 times the reported cases in New York, 34 times in Illinois, 33 times in Massachusetts, and 17 times in New Jersey.
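The equivalence between a lower bound on the reported-to-actual ratio and an upper bound on actual cases is a single inequality. If gamma = reported / actual and gamma >= gamma_min, then actual <= reported / gamma_min. A multiplier of 25, for example, corresponds to gamma_min = 1/25 = 0.04. The sketch below simply applies this arithmetic to the multipliers stated above. It does not reproduce the authors' estimation procedure or their heterogeneous infection rate model. The function names and the reported-case count used in the demo are hypothetical and serve only as illustration.

```python
# Multipliers stated in the abstract: actual cases are unlikely to exceed
# k times the reported cases in each region.
MAX_ACTUAL_TO_REPORTED = {
    "New York": 25,
    "Illinois": 34,
    "Massachusetts": 33,
    "New Jersey": 17,
}


def reported_fraction_lower_bound(max_multiplier: float) -> float:
    """Lower bound on gamma = reported / actual, given actual <= k * reported."""
    if max_multiplier < 1:
        raise ValueError("actual cases cannot be fewer than reported cases")
    return 1.0 / max_multiplier


def actual_cases_upper_bound(reported: float, gamma_min: float) -> float:
    """If reported / actual >= gamma_min, then actual <= reported / gamma_min."""
    if not 0 < gamma_min <= 1:
        raise ValueError("gamma_min must lie in (0, 1]")
    return reported / gamma_min


if __name__ == "__main__":
    reported = 100_000  # hypothetical reported count, for illustration only
    for region, k in MAX_ACTUAL_TO_REPORTED.items():
        gamma_min = reported_fraction_lower_bound(k)
        bound = actual_cases_upper_bound(reported, gamma_min)
        print(f"{region}: gamma >= {gamma_min:.4f}, actual <= {bound:,.0f}")
```

Either quantity can be expressed in terms of the other, so reporting a lower bound on gamma or an upper multiplier on actual cases conveys the same information.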