Tracking the Transmission Dynamics of COVID-19 with a Time-Varying Coefficient State-Space Model

by   Joshua P. Keller, et al.

The spread of COVID-19 has been greatly impacted by regulatory policies and behavior patterns that vary across counties, states, and countries. Population-level dynamics of COVID-19 can generally be described using a set of ordinary differential equations, but these deterministic equations are insufficient for modeling the observed case rates, which can vary due to local testing and case reporting policies and non-homogeneous behavior among individuals. To assess the impact of population mobility on the spread of COVID-19, we have developed a novel Bayesian time-varying coefficient state-space model for infectious disease transmission. The foundation of this model is a time-varying coefficient compartment model to recapitulate the dynamics among susceptible, exposed, undetected infectious, detected infectious, undetected removed, detected non-infectious, detected recovered, and detected deceased individuals. The infectiousness and detection parameters are modeled to vary by time, and the infectiousness component in the model incorporates information on multiple sources of population mobility. Along with this compartment model, a multiplicative process model is introduced to allow for deviation from the deterministic dynamics. We apply this model to observed COVID-19 cases and deaths in several US states and Colorado counties. We find that population mobility measures are highly correlated with transmission rates and can explain complicated temporal variation in infectiousness in these regions. Additionally, the inferred connections between mobility and epidemiological parameters, varying across locations, have revealed the heterogeneous effects of different policies on the dynamics of COVID-19.



There are no comments yet.


page 1

page 2

page 3

page 4


Semiparametric Bayesian Inference for the Transmission Dynamics of COVID-19 with a State-Space Model

The outbreak of Coronavirus Disease 2019 (COVID-19) is an ongoing pandem...

Stochastic modeling and estimation of COVID-19 population dynamics

The aim of the paper is to describe a model of the development of the Co...

Time Varying Markov Process with Partially Observed Aggregate Data; An Application to Coronavirus

A major difficulty in the analysis of propagation of the coronavirus is ...

Characterizing the spread of CoViD-19

Since the beginning of the epidemic, daily reports of CoViD-19 cases, ho...

Adaptive preferential sampling in phylodynamics

Longitudinal molecular data of rapidly evolving viruses and pathogens pr...

Semi-supervised Neural Networks solve an inverse problem for modeling Covid-19 spread

Studying the dynamics of COVID-19 is of paramount importance to understa...

Bayesian state space modelling for COVID-19: with Tennessee and New York case studies

We develop a Bayesian inferential framework for the spread of COVID-19 u...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Since its spread at the beginning of 2020, the COVID-19 pandemic has made clear the impact that human mobility and government policies can have on the spread of novel respiratory diseases. Throughout the world, different approaches have been taken to mitigate the spread of COVID-19, including travel bans, mask mandates, quarantine policies, and capacity restrictions. The diversity of actions taken present a unique opportunity and an urgent demand to assess the impact of these approaches quantitatively and systematically.

Statistical approaches offer opportunities to explore the relationship between different policies and actions and disease dynamics while allowing for variation in the outcome from both measured and unmeasured (or even unmeasurable) factors. However, a purely marginal comparison of factors—as might be obtained using a regression-based approach—can miss important facets of the dynamics of disease spread. This is why the foundation for most infectious disease models is a compartmental model for the disease states, which captures these dynamics. The traditional example of this is a compartmental “SIR” model that employs ordinary differential equations to model transition from “susceptible” to “infected” to “recovered” states [1, 2, 3, 4]. However, the traditional SIR model lacks the flexibility to account for the sophisticated and rapidly changing transmission dynamics of COVID-19. To capture the key characteristics of the COVID-19 pandemic, a number of alterations are needed for the traditional SIR model, including changes to the structure of the compartmental model, incorporation of time-varying factors that may influence the dynamics of the disease spread over the course of the pandemic, and introduction of stochasticity that results from limitations in model assumptions as well as noise in the observed data.

First, there are characteristics of COVID-19 spread that require an adaptation in the structure of the compartmental model. A key characteristic of COVID-19 is that many of those infected are at their most infectious before they are diagnosed [5], with viral load typically highest at the onset of symptoms [6]. Thus, there can be a misalignment if a compartmental model assumes people can only be infectious following diagnosis through a positive test. Further, there are many who have milder or asymptomatic cases and are never officially diagnosed as “infected,” but who may still spread the disease. The framework of a dynamic model should address these facets of how the data (from testing) lines up with the principles of infectious spread.

Second, since the emergence of SARS CoV-2, the dynamics of COVID-19 detection and spread have evolved, and the model should allow for an associated evolution in some of its parameters. Public health guidance has changed throughout 2020 and 2021, with changing guidance and regulations on whether to wear masks, travel, quarantine, gather in groups, attend in-person meetings and school, etc., which can change the rate of transmission from those who are infected to those who are still susceptible. Testing has improved, become more easily accessible, and in some cases evolved to include regular testing even without symptoms, all of which can influence the probability that someone infected with the disease is detected. Treatment has improved, including through the adaptation of corticosteriods and remdesivir among patients with severe disease and care techniques like prone positioning, use of high-flow oxygen therapy, and intubation timing

[7, 8]. These strategies may help in reducing mortality rate of patients with severe disease since the start of the pandemic; indeed, there is evidence of a decrease in risk of mortality among those hospitalized for COVID-19 over 2020, both in the US [9] and the UK [10]. New strains have evolved and gained prevalence during the pandemic, with different infectivity and severity [11, 12]. Because of these changing dynamics, parameters of the compartmental model—including rate of detection, probability of infection given a contact with someone infected, and rates of mortality versus recovery among the infected—have changed over the pandemic, suggesting the need for time-varying coefficients for modeling.

Finally, there is stochasticity within the observed data compared to the dynamics of the model. This stochasticity results both from imperfect compliance with the assumptions of the model and from practical constraints in collecting and reporting the data. For example, the standard compartment model assumes that a community’s population is well-mixed, with an equal chance of contact among any pair of members of that population. For COVID-19, as with most infectious diseases, this assumption is overly simplistic and requires allowance for stochasticity, as local spikes in cases were sometimes the result of outbreaks within an institution or organization, including significant outbreaks in prisons, nursing homes, and meat-processing plants [13, 14, 15], suggesting the pandemic dynamics were in part driven by more frequent contacts within population subsets, rather than across a well-mixed community population. Further, the model assumes homogeneity in individual transmissability, while in fact some spread is driven by superspreaders [13, 16, 17, 18]. Also, measurement error is introduced through data collection. Public health officials have made an enormous effort to collect and publish counts of cases and deaths during the pandemic, but understandably there were occasional patterns in the data related to data collection and reporting rather than dynamics of the virus’ spread. For example, Colorado included death counts for deaths that occurred earlier but had not yet been reported on April 24, 2020 [19]. Further, on weekends and holidays, reporting rates can be lower than usual, with reporting higher following the break to incorporate the backlog [20].

Here, we develop a time-varying coefficient state-space model that uses a structure appropriate for COVID-19 and allows for stochasticity and measurement error in data, as well as evolution of some model parameters over time, with the aim of investigating how a specific factor was associated with virus spread. This model has an advantage over regression-based models of factors that may affect COVID case counts over time, since under a compartmental modeling framework we are modeling the process of disease spread, rather than correlating two time series. Further, by incorporating elements that address time-variability in model parameters and stochasticity inherent in the relationship between the available COVID-19 data and the compartmental model, the model addresses limitations in a classic SIR-style model for disease spread. Specifically, we introduce a multiplicative process model and a negative binomial data model to account for fluctuations in the rates of disease detection and transmission and variability in data reporting. Focusing on retrospective estimation and inference, our model offers a framework to explore and draw inference on the effect of different human mobility behavior and related policies on the pandemic parameters in the model. It therefore meets a critical need to understand how policy choices might affect the dynamics of spread.

As mentioned, a factor of particular interest is the influence of local mobility on COVID-19 spread within a community. Recent studies have suggested strong ties between COVID-19 infection rates and human mobility. Initially in China, there was a strong relationship between the number of COVID-19 cases and human mobility [21]. This finding is consistent with the theory of infectious disease spread in highly coupled metapopulations [22, 23]. This relationship weakened after control measures were put into place to restrict the movement of people in and out of Wuhan province. This previous study used real-time mobility data as well as travel history data to explore the relationship to spread of the disease. They concluded that the drastic control measures implemented in China substantially mitigated the spread of COVID-19. Decreased mobility was also shown to have a protective effect against COVID-19 transmission in the USA [24], a result that agreed with other findings that mobility inflow into a county early in the pandemic was associated with increases in early case counts [25]. While the official response in the USA to COVID-19 has been heterogeneous in terms of lock-downs, this work showed that social distancing helps to reduce the spread of the disease, and should remain part of personal and institutional responses to the pandemic until a vaccine is widely adopted and should continue to be considered as a key protective public health policy in future pandemics of respiratory diseases.

While the relationship between mobility and rates of COVID-19 infection has previously been explored [21, 24, 26, 27, 28], analysis of this relationship has largely relied on regression-based comparisons of time series data, rather than through incorporating observed mobility within the dynamics of an epidemiological model. Here, we develop and apply a time-varying coefficient state-space epidemiological model that allows us to explore the relationship between mobility and COVID-19 spread in several US states and Colorado counties. The structure of this model incorporates mobility as a factor that influences the dynamics of the epidemic, while also accounting for variation over time in some model parameters and stochasticity within the observed case and fatality data compared to the model dynamics. This approach allows us to explore how mobility influenced the dynamics of COVID-19 spread in the United States, while the model development provides a structure that can be extended to explore how other factors influence the dynamics of COVID-19 pandemic and spread of other diseases in the future.

This article is organized as follows. In Section 2, we present a deterministic compartmental model for COVID-19 with time-varying coefficients, characterized by a system of differential equations. In Section 3, by extending the compartmental model outlined in Section 2, we develop a state-space model for COVID-19, which better accounts for stochasticity in disease spread. We apply our proposed method to analyze county-level COVID-19 data in the U.S. state of Colorado and state-level data in the U.S. in Section 4. We conclude this article with some remarks in Section 5. Extra results are deferred to the online supporting materials.

2 A Time-Varying Coefficient Compartmental Model for COVID-19 Dynamics

Given its flexibility in structure and natural connection with dynamical systems, the compartmental modeling approach has been extensively employed in epidemiology [2, 3, 23, 29, 30] and pharmacokinetics [31, 32]. This approach has been widely adopted to understand the dynamics of COVID-19 [6, 33, 34]. We first outline a basic compartmental model for recapitulating the transmission dynamics of COVID-19, which is characterized by a system of differential equations. It will serve as the cornerstone for our statistical model in Section 3. We partition the population in a region (which can be a county, state, or country) into the following eight compartments, each representing a specific stage of COVID-19:

  1. Susceptible individuals (): those who have not been infected by the disease and are at risk of becoming infected;

  2. Exposed individuals (): those who have been exposed to the disease but are not yet capable of infecting others;

  3. Undetected infectious individuals (): those who have the disease, are able to infect others, but have not yet been detected as having the disease;

  4. Detected infectious individuals (): those who have the disease, are able to infect others, and have been detected as having the disease;

  5. Individuals removed from without being detected (): those who had the disease but are then removed from the possibility of being infected again or spreading the disease, either through recovery or death, without ever being diagnosed with the disease;

  6. Individuals removed from (): those who had the disease, were diagnosed, but are then removed from the possibility of being infected again or spreading the disease;

  7. Individuals recovered from (): those who have been through the state and are eventually recovered from the disease (in terms of no more symptoms);

  8. Individuals died from (): those who have been through the state and are eventually deceased.

The eight compartments, together with the possible transitions among compartments, are illustrated in Figure 1 and are designed to capture important features of the dynamics of COVID-19. First, it has been well recognized that asymptomatic and pre-symptomatic individuals, or those who are symptomatic but have not yet been diagnosed with the disease, contribute significantly to the spread of COVID-19 [5, 6]. These individuals are modeled through the compartment in the proposed partition. In addition, recent research [6, 35] indicates that the infectiousness of an infected individual declines quickly within a week of symptom onset. This suggests that an infected individual may no longer be capable of infecting others before the complete disappearance of symptoms or death. To reflect this, the proposed model specifically allows the individuals in to first go through the state before recovery or death. Since no data on the recovery or death of the individuals are available, it is unnecessary to include additional compartments for individuals who recovered or died from the state, and neither will this affect the essence of transmission dynamics.

Figure 1: The conceptual dynamics for our eight-compartment time-varying coefficient dynamical model. Square nodes represent observed states and elliptical nodes represent unobserved states. Compartment transitions are labeled with their rates.

We use the notation of a compartment followed by a time index to denote the size of the compartment at a specific time point. For example, denotes the number of susceptible individuals at time . The transmission of COVID-19 can be characterized by the flow of individuals through these compartments over time, given by the following system of differential equations with time-varying coefficients:


Here, denotes the population of active individuals. A simplification in model (1)-(8) is that we ignore births and deaths, an approximation that is appropriate for a fast-spreading pandemic like COVID-19. For simplicity, we also do not consider immigration and emigration. Given the initial condition (i.e., the initial sizes of the compartments) and parameter values, the trajectory of the epidemic process can be deterministically obtained by solving the system of differential equations. In Sections 3.1 and 3.2, we will discuss how to allow for additional flexibility and stochasticity in the time-varying coefficient compartmental model (1)-(8), and how to link the observed data on COVID-19 cases and deaths to the epidemic process. The initial conditions of the model are not trivial and will be discussed in Section 3.3.

The time-varying coefficients in (1)-(8) bring extra flexibility to the compartmental model to capture the complex and rapidly-varying dynamics of COVID-19. Specifically, models the disease transmission rate between the undetected infectious and susceptible individuals at time . The rate can be understood as the number of effective contacts (i.e., contacts that are sufficient for disease transmission) made by an average member of the undetected infectious individuals per unit time. The probability of each of these contacts being with a susceptible individual is . Therefore, undetected infectious individuals lead to a rate of new infections . For the detected infectious individuals, the rate of effective contacts is reduced by a factor of , since those diagnosed with COVID-19 are likely to have reduced contact with others due to potential (self-)quarantine or hospitalization, leading to a rate of new infections . To incorporate additional covariate information such as mobility and changes in policy, a detailed model of will be provided in Section 3.4.

After being infected, a susceptible individual first goes through an “exposed” state, meaning that the individual has been exposed to the disease but is not immediately able to infect others. The exposed individuals enter the undetected infectious state at a rate of , and naturally represents the latent period, which is the time interval between when an individual is exposed to the disease and when the individual becomes capable of infecting other susceptible individuals. Then, the undetected infectious individuals are either diagnosed with the disease at a detection rate of or removed from the infectious state at a rate of without ever being diagnosed. Similar to , the detection rate is expected to vary along with the development of COVID-19 pandemic. Estimation of plays an indispensable role in understanding the epidemiological mechanics of COVID-19. In our model, we further assume that an infectious individual always needs to go through the “undetected” state before the individual is detected as having the disease. It is possible that an exposed individual can be directly diagnosed with the disease through contact tracing, but we treat this as a special case that the individual spends zero time at the undetected infectious state. Infectious individuals that have been detected become non-infectious at rate . From this state, individuals are finally removed either through recovery at a rate of or due to decease at a rate of . Naturally, deceased individuals do not contribute to the dynamics of disease transmission. We also assume that recovery from COVID-19 confers immunity to reinfection, and thus the recovered individuals can neither spread the disease nor be infected again. This is a simplifying assumption motivated by the rate of reinfection being so low that it does not meaningfully alter the transmission dynamics. In Section 3.5 we discuss the modeling and interpretation of each of these parameters in greater detail.

For epidemiological models, the reproductive ratio is a fundamental quantity to track the pandemic and it serves as a threshold that predicts the spread of an infection [2, 3]. For models that are more sophisticated than the simple SIR model, the reproductive ratio is usually computed based on the equilibrium reproduction number, which characterizes the persistence of the pandemic. For the proposed model (1)-(8), the equilibrium reproduction number is


which is derived using the endemic equilibrium argument [36] in Section A in the supplementary file. If after time , the number of infectious individuals will decrease after time and lead to the disease-free equilibrium. Therefore, an indicates containment of the disease at time . On the contrary, if after time , the pandemic will persist with a nontrivial equilibrium [36] and the larger the equilibrium reproduction number, the larger the population of infectious individuals at the equilibrium will be. Due to its important role in characterizing disease spread, inference on is one of our major interests. In addition, differentiating against , we observe that is monotonically decreasing in if and only if . Assume that the removal rates of the undetected and detected infectious individuals are similar. If , the condition might not hold, meaning that more efficient detection would lead to more disease transmission. This is inconsistent with the known facts about the COVID-19 pandemic. By restricting and modeling the relationship between and (accomplished via prior specification, see Section 3.5), it is guaranteed that . As , the difficulty of detection increases and monotonically converges to . That is, the pandemic dynamics are dominated by the undetected infectious individuals. On the other hand, as increases, the deviation between and shrinks. That is, the pandemic dynamics will be governed more by the detected infectious individuals.

3 State-Space Model for COVID-19 with Time-Varying Coefficients

While the compartmental model in Section 2 describes the underlying deterministic and smooth disease dynamics, there is stochasticity in the actual disease spread. This is due to factors such as non-uniform mixing of the population and heterogeneity in day-to-day activity patterns. Additionally, data on COVID-19 cases and deaths are always observed with substantial statistical noise, such as reporting error and reporting delay, which cannot be easily captured by the differential equations. To mitigate these issues, we cast the compartmental model in Section 2 in a state-space modeling framework. Following an approach similar to Dukic et al.[37] and Osthus et al.[38], we construct a process model which allows the epidemic process to stochastically deviate from the solution given by the compartmental model (1)-(8). Moreover, we build a data model for the observed data, which further takes into account the measurement error of COVID-19 cases and death counts.

3.1 Process Model

Since COVID-19 data are reported daily, it is natural to consider a discretized version of the compartmental model in Section 2 with a time step of 1 day. Denote the true but unobservable populations of the eight compartments on day . Define as the solution to differential equations (1)-(8), starting from the state of the prior day, .

Denote by the unobservable number of individuals diagnosed with COVID-19 on day . Similarly, let denote the unobservable number of individuals who died from COVID-19 on day . In terms of states and , represents the number of individuals who moved from to , and represents the number of individuals who moved from to . As discussed before, it is unrealistic to assume that the true yet unobservable numbers of the diagnosed and reported deceased individuals perfectly agree with the solution to the compartmental model, especially for the complicated pandemics such as COVID-19. To introduce additional flexibility, motivated by Davis and Wu[39], we model and via multiplicative processes


where and refer to the numbers of new detections and deaths on day , according to the solution to our compartmental model. For simplicity, we model the multiplicative processes and

as random variables independent in


so that , , , and . The unit mean assumption, which mimics the zero mean assumption for the additive models, leads to and . Compared to the traditional additive models, the multiplicative process model (10) is more natural to introduce deviations between the nonnegative solution to differential equation models and unobservable process with nonnegative observable space. In summary, the underlying epidemic process is centered at the solution to differential equations (1)-(8) but is also allowed to be stochastically different from the deterministic process specified by the compartmental model.

The solution to the differential equations (1)-(8) is not available in closed form. A numerical approximation method, such as the Runge-Kutta solver, must be employed. Here we use a simple and computationally efficient first-order difference calculation, which does not compromise much the model performance. From (5), a first-order difference update to the states means that . Thus, , and . We assume that in (10) only affects the transition from state to state . Thus, we have

In essence, the error process therefore offers additional flexibility in the detection rate from the compartmental model. Accordingly, the empirical calculation of will be adjusted by replacing in (9) with . The error process plays a similar role as in the disease transmission, and we assume that they only affect the transition from to . Specifically,

As both and individuals are unable to spread the disease, does not influence the equilibrium reproduction number.

Lastly, based on our assumptions, the error terms do not have an impact on the states , , , and , and these states are consistent with the corresponding solutions to the compartmental model , , , and , calculated using the first-order approximation.

3.2 Data Model

Let be the number of newly reported positive cases on day , and denote the number of newly reported deaths on day , where . In the event that only cumulative numbers of cases and deaths are available, the values of and can be computed using a first order difference. We assume negative binomial models for the daily reported case and death counts,

where and , respectively. Parameter

allows for overdispersion, relative to a Poisson distribution, and grants more flexibility to the model for fitting the COVID-19 data

[40]. We place a Gamma prior on , , while is treated similarly.

3.3 Model for the Initial Condition

The initial condition of the infection dynamics refers to the initial population sizes of the compartments. The calendar time corresponding to can be chosen for each modeling context. We assume that at , the total number of removed individuals is 0, i.e., . The number of detected infectious individuals at time , , is observed and assumed to be non-zero. In the analyses of Section 4, we choose to be a time early in the pandemic when there are multiple detected cases.

The numbers of susceptible, exposed, and undetected infectious individuals at time are not observed and need to be estimated. We assume the initial population size of the exposed and undetected infectious individuals is times the initial size of the detected infectious individuals,

We place a gamma distribution prior on

, , with a prior mean of 5. This choice is based on the findings by Li et al.[34] that 86% of all infections were undocumented at the beginning of the epidemic in China.

Next, we place a uniform distribution prior on the proportion of exposed individuals among

. That is, we assume Lastly, we have , where is the (known) population size.

3.4 Model for the Time-Varying Disease Transmission Rate with Covariates

We now turn to the modeling of the epidemiological parameters. We start with the model construction for the time-varying disease transmission rate between the undetected infectious and susceptible individuals, , which is an important parameter that characterizes the speed of disease transmission. We model the transmission rate as time-varying to account for changes in COVID infection rates due to human behavior [24, 26], enactment of government policies [21, 41], evolution of new viral strains, and other time-sensitive factors. Since , it is easier to work with the log transformation of . A simple and flexible way of modeling is through temporal splines,


where is the regression intercept term,

is a vector of basis functions evaluated at time

(excluding the intercept), and is the vector of regression coefficients. We primarily consider a piecewise-linear spline for

with knots selected by quantiles, but other forms (piecewise constant, natural cubic splines, etc.) are possible in this framework. The intercept term and regression coefficients are further modeled through normal priors,

and with . Here,

denotes a normal distribution with mean 0.5 and standard deviation 0.1 restricted to


The framework in (11) makes it straightforward to incorporate additional covariates for modeling . Let denote covariates. Motivated by the hybrid of single index models and additive models[42], we further generalize (11) as


In Section 4, we will include the information on human mobility via the terms .

The transmission rate between the detected infectious and susceptible individuals is assumed to be reduced by a factor of due to potential quarantine and hospitalization. We place a uniform distribution prior on , .

3.5 Model for the Other Epidemiological Parameters

In this section, we will detail the model and prior specification for the remaining epidemiological parameters. Most parameters in model (1)-(8) have practical implications corresponding to clinical characteristics of COVID-19. Therefore, we elicit informative priors for these parameters by summarizing the findings in the literature [6, 33, 34, 43, 44, 45, 46, 47, 48]. We acknowledge that there is not yet a consensus on all clinical characteristics of COVID-19. For example, Li et al [47] estimated a mean incubation period of days, while the same quantity was estimated to be and days in Jiang et al [46] and Deng et al [43], respectively. As a result, our priors are chosen in consistent with the majority of the literature. Our general methodology works for any choices of priors. We also note that the timing of COVID-19 events is highly variable at the individual level, but the parameters in (1)-(8) are defined at the population level, which are less variable. The informative priors improve the interpretability of the results by assigning larger prior mass to the parameters around clinically meaningful values. In addition, since our model is an eight-compartment model with only two compartments observed, some model parameters are nonidentifiable. Such nonidentifiablility is mitigated through the use of informative priors [49, 50].

In the following paragraphs, we detail the interpretation and modeling choices for the remaining parameters, which are summarized in Tables 1 and 2.

Parameter Interpretation Prior
Initial number of exposed individuals
Reduction in transmission rate for detected infectious individuals
Overdispersion parameter for daily cases
Overdispersion parameter for daily deaths
Intercept for disease transmission rate (log scale)
, Regression coefficients for disease transmission rate (log scale)
Standard deviation of regression coefficients for disease transmission rate
Stabilized detection fraction
is the initial detection fraction
Speed of detection fraction increase

Intercept for death fraction (logit scale)

Regression coefficients for death fraction (logit scale)
Standard deviation of regression coefficients for death fraction
Table 1: Model parameters, interpretations, and prior distributions.
Param. Interpretation Prior
Multiplicative factor of initial exposed and undetected individuals relative to detected
Inverse of latent period
Inverse of infectious period before detection (for detected individuals)
Inverse of infectious period after detection (for detected individuals)
Inverse of whole infectious period (for undetected individuals)
Inverse of time to recovery after the end of infectious period (for detected individuals)
Inverse of time to death after the end of infectious period (for detected individuals)
Table 2: Model parameters with informative priors elicited based on clinical characteristics of COVID-19.
Latent period.

The parameter denotes the rate of exposed individuals becoming infectious, and represents the latent/pre-infectious period, i.e., the time period between exposure to the disease and being able to infect others. Note that the latent period in our paper is different from the incubation period, where the latter refers to the time period between exposure and symptom onset (see Figure 2). For COVID-19, it is well recognized that patients usually become infectious before the onset of symptoms [6]

. We place a beta distribution prior on

, , with a prior mean of and standard deviation of .

Figure 2: Natural history of COVID-19, definitions of terms, and corresponding parameters.
Detection rate.

The parameter represents the rate of undetected infectious individuals being detected as having the disease at time . We model by where represents the proportion of undetected infectious individuals on day that are eventually diagnosed with the disease, and is a parameter such that represents the time interval between when an individual is capable of infecting others and when the individual is detected as having the disease. In practice, detection may occur after the onset of symptoms (e.g., a person feels sick a day or two and then goes to get a test) or earlier than the start of symptoms (e.g., a group like a college athletic organization getting mandatory regular tests that identify pre-symptomatic or asymptomatic cases). Without additional information, we assume that the average time to detection for undetected infectious individuals is similar to the time to symptoms onset. In other words, on average, undetected infectious individuals (who are eventually diagnosed) are diagnosed with the disease when their symptoms start to appear. Therefore, is roughly equal to the incubation period. We place a beta distribution prior on , , with a prior mean of and standard deviation of . Next, we model with a three-parameter curve,


where , and . As a result, starts from at time , monotonically increases with increasing , and converges to as . The underlying assumption is that at the beginning of the pandemic, due to the limited testing capacity, the proportion of detection starts from a low level . As time progresses, testing capacity increases monotonically, and eventually, anyone who wants a test can get it, making the proportion of detection stabilize at . The parameter characterizes the speed of testing capacity increase. We assume with a prior mean of and standard deviation of . Further, we model the ratio between and and place a uniform distribution prior on it, . Lastly, we assume .

Infectious period.

The parameter denotes the rate of detected infectious individuals becoming noninfectious, and represents the time interval between disease detection and when an individual is no longer able to infect others. We place a beta distribution prior on , , with a prior mean of and standard deviation of .

The parameter denotes the rate of undetected infectious individuals becoming noninfectious without ever being detected as having the disease. We have Here, is the proportion of undetected infectious individuals on day that are never diagnosed with the disease, and is the time interval during which a never-detected infectious individual is capable of infecting others, i.e., the whole infectious period. By the definitions of and , we have In other words, the whole infectious period equals to the time interval between when an individual is capable of infecting others and when the individual is diagnosed with the disease, plus the time interval between diagnosis and when the individual is no longer able to infect others. Based on such a relationship, it is guaranteed that , ensuring that is a monotonically decreasing function of (see Equation (9)).

Recovery and death rates.

The parameters and are the rates of recovery and death for detected but no-longer-infectious individuals at time , respectively. We have


where represents the fraction of deaths among detected but no-longer-infectious individuals at time , and accordingly, represents the corresponding fraction of recoveries. By constructing and as a function of the time-invariant and and the time-varying in (15), we can easily estimate a time-varying recovery rate and time-varying death rate without introducing additional identifiability concerns.

We place beta distribution priors on and . Specifically, with a prior mean of and standard deviation of , and with a prior mean of and standard deviation of . The fraction of deaths , and therefore, we consider the logit transformation of , . We model the transformed with a B-spline,


Here, is the intercept term, is a vector of basis functions evaluated at time (excluding the intercept), and is the vector of regression coefficients. The intercept term and regression coefficients are given normal distribution priors, and with .

3.6 Implementation

We use the Stan modeling framework to sample from the posterior distribution of all parameters [51]. Stan uses a Hamiltonian Monte Carlo procedure to efficiently sample from the posterior distribution. A major advantage to using the Stan framework is that it can easily accommodate non-conjugate and truncated prior distributions, which allows us for more flexibility in choosing a prior distribution that is scientifically relevant for each parameter.

4 Analysis of the COVID-19 Data

4.1 Data Collection

We apply this model to county-level data in the U.S. state of Colorado (Section 4.2) and state-level data in the U.S. (Section 4.3) in the regions highlighted in Figure 3. Daily counts of cases and deaths were obtained from The New York Times[52].

Figure 3: Study locations (in blue) for analysis of county-level data in Colorado (A) and state-level analysis in the United States (B).

We collected mobility data from SafeGraph [53] via the covidcast R package [54], which uses anonymized location data from mobile phones to generate different views of mobility over time. We use three mobility metrics from this data source — the fraction of mobile devices that did not leave the immediate area of their home in each day (“completely home”), the fraction of mobile devices that spent more than 6 hours at a location other than their home during the daytime (“full-time work”), and the number of daily visits made by those with SafeGraph’s apps to restaurants in a county. Each of these metrics are available daily, and we applied a kernel smoother to obtained smoothed values of each metric. An example of these mobility data for Colorado is shown in Figure 4, along with the dates of some key statewide policies related to COVID-19 control. The mobility data for all regions is shown in Figures S1-S3 and S15-S17 in the supplementary files.

Figure 4: Example of mobility data used in analysis. Measurements from SafeGraph were used to characterize mobility patterns, including the fraction of mobile devices that were completely at home each day, the fraction that spent 6 hours or more away from home (“full-time work”), and the number of daily visits to restaurants. This figure shows these data for the state of Colorado. The gray curves represent the daily values and the red curve represents the smoothed values used for modeling. Dates of several key statewide policies, as well as weekly new cases of COVID-19 in the state, are also shown.

4.2 Models for Counties in Colorado

4.2.1 Model Setup

We first apply this model to data from counties in Colorado. The first case of COVID-19 in Colorado was reported on March 5, 2020. On March 10, 2020, the Colorado Governor issued a Declaration of Disaster Emergency and on March 27, 2020, a statewide “Stay at Home” order was given (Figure 4). Although some of the early outbreaks in Colorado were located in resort areas that have a small resident population, COVID-19 quickly spread throughout the state. On April 27, 2020, the state transitioned to a “Safer at Home” order that was later amended to county-specific regulations related to measures of local pandemic risk.

For modeling, we selected the ten largest counties (by population) in Colorado (Figure 3 and Table S1 in the supplementary files). This excludes some of the counties with ski resort communities where the earliest outbreaks occurred, but represents the vast majority of the state’s population and communities varying from urban to rural. For each county, we select as the first day with 6 or more detected cases in that county and use each day as the model time-step. The number of initial detected cases () was taken as the total number of reported cases in the five days before the modeling start date. We model the dynamics through January 31, 2021, at which point the widespread introduction of vaccines fundamentally changed the underlying dynamics of disease spread.

All models are fit with the priors specified in Tables 1, and we set . In practice, we found that the prior distributions in Table 2 were not always informative enough for regions with a large number of cases and may lead to clinically implausible parameter estimates. Therefore, we scaled the priors in Table 2 based on the cumulative number of detected cases. We scale the prior standard deviations of , , , , and based on the ratio of the total number of cases of the county and that of Pueblo county. If the ratio is , the prior standard deviation is scaled to of the default prior standard deviation.


, we used linear B-splines with 10 degrees of freedom (df) as the basis functions

. In addition, we included the three smoothed mobility measures in the regression component as in (12). For , we used linear B-splines with 5 degrees of freedom. In model fitting for all counties, we used 4 chains each with 1,000 warmup iterations and 1,500 post-warmup iterations. We set the maximum tree depth to 14 and adapt_delta to 0.95.

4.2.2 County Model Results

The time-varying for each Colorado county is displayed in Figure 5. There is a similar tri-modal trend across all counties: peaks in in late March (at the start of modeling), July, and November. These trends match the rise and fall of new cases, shown in Figure 6, with the largest peak in cases happening in November and December in all counties. The dynamics of some counties are slightly different, such as additional peaks in in early Fall 2020.

Figure 5:

Posterior mean (solid line) and 95% credible interval (dashed line) of

for Colorado county models. Horizontal line represents .
Figure 6:

New cases by date in Colorado counties. Points show observed counts, line shows average model predictions (i.e. mean of the posterior predictive distribution).

The impact of the stay-at-home orders is clearly evident in the decline in throughout April and the resulting downturn of new cases in late April and early May. The equilibrium reproductive number remained near 1 until a small rise in early July. In early September, the began to increase in the state overall, with a particularly large rise in Boulder County, where there were several high-profile outbreaks among college-aged individuals (see corresponding panel in Figure 6). As the number of new daily cases decreased in Boulder County, the estimated declined again.

The predicted and observed number of daily deaths is provided in Figure S4 in the supplementary files. On most days, no deaths were reported, but the predicted number of new deaths clearly shows the April and November peaks in mortality.

Posterior means for other parameters are shown in Figure S5 to S12 in the supplementary files. The proportion of detected infected individuals for each day is shown in Figure S13. In all counties, this proportion starts small but increases through April and May and remains elevated throughout the rest of the pandemic, reflecting increased testing capacity.

4.2.3 Impact of Mobility on Within-County Transmission

County-wide mobility played an important role in modeling the transmission rate () in most counties. The effects of mobility on transmission are summarized in four ways. Figure 7 shows the posterior means (and 95% credible intervals) for the coefficients for each mobility term. Figure 8 shows the correlations between the mobility data time series and the time series of the posterior mean of , i.e. . Figure 9 shows the posterior correlation between the mobility coefficients () and three of the time-constant epidemiological parameters, i.e. and similarly for and . Figure S14 in the supplementary files shows the combined time-varying effect of mobility on infectiousness (i.e., ).

In all counties, the coefficient for the proportion of people working full time is positive, indicating a positive association between this measure of mobility and COVID spread, and in all except Larimer and Pueblo counties the credible interval excludes zero. This clearly demonstrates how changing patterns in work trends (data displayed in Figure S2) are associated with transmission of COVID-19. This reflects not only the sharp reduction in the proportion of people working full time after the stay-at-home orders at the start of the pandemic in March 2020, but also the gradual increase in mobility due to more people working away from home after the April 27 “Safer-at-Home” order that relaxed some restrictions through the summer and fall (Figures 4 and S2). The counties with the largest coefficients are all in the metropolitan areas surrounding the economic centers of Denver, Boulder, and Colorado Springs. Unsurprisingly, the large positive values of the mobility coefficient lead to strong positive correlations between the proportion of people working full-time and the posterior mean transmission rate (Figure 8). In addition, for most counties, we observe strong positive posterior correlations between the coefficient for full time work and while the posterior correlations between the coefficient for full time work and and tend to be negative (Figure 9).

Figure 7: Impact of mobility measures in transmission parameter . Points and horizontal error bars represent posterior means and 95% credible intervals of the mobility coefficients, respectively.
Figure 8: Correlation of mobility data time series with the posterior mean of transmission parameter .
Figure 9: Correlation of mobility data coefficients and three epidemiological parameters: (inverse of latent period), (reduction in transmission rate among detected infectious individuals), and (inverse of infectious period before detection).

The direct effect of visiting restaurants on the transmission rate was weaker in most counties, as evidence by the smaller (in magnitude) posterior means compared to the full-time work coefficient (Figure 7). In all counties the posterior mean of the coefficient for restaurant mobility was negative. This is also reflected in the correlation of the restaurant coefficient with the epidemiological parameters (Figure 9), which show the opposite pattern of the full time work coefficient. However, this must be viewed in light of concurrent adjustment for the proportion of individuals working from home (which is positively correlated with restaurant visits). When we examine the marginal correlation of restaurant mobility with transmission (Figure 8), we see that the relationships range from weakly positive to strongly positive. This indicates that restaurant mobility and full-time work together are related to increases in transmission, although each mobility measure alone cannot fully explain the temporal trend in transmission.

When concurrently adjusting for full-time work and restaurant visits, we did not see a meaningful direct impact of the proportion of individuals staying at home (credible intervals in Figure 7 all include zero). However, it is important to note that this does not mean that stay-at-home orders were ineffective, but rather the mobility measures of people outside the home (at work and restaurants) were more predictive of transmission than the numbers of people staying home. The negative marginal correlation between the proportion of people at home and the posterior mean transmission rate (Figure 8) reflects this, indicating that more people staying home was correlated with lower disease transmission.

4.3 Models for U.S. States

4.3.1 Data and Model Setup

We also fit the model to nine U.S. states that are representative for their different trends in the number of cases throughout the pandemic. The selected states were (Figure 3): Colorado, South Dakota, and Wisconsin, which had small early waves and a large late wave; California, Florida, and Texas, which had large numbers of cases mid-summer and a late wave; Iowa, which had moderate numbers followed by a late wave; Georgia, which had relatively small summer and fall peaks; and New York, which had a very large early peak in the spring and a late wave in November.

For these models, we chose the initial time point to be the first day after the state-level case count exceeded 100 cases (Table S2 in the supplementary files) and modeled dynamics through January 31, 2021. Prior specification and model structure was the same as in the county-level models described in Section 4.2.1. For the state-level models, we included the same three measures of mobility, but calculated at the intra-state level (Figures S15 to S17), in the term. For the clinical parameters, the prior standard deviations were scaled against the number of cases in South Dakota.

4.3.2 State Model Results

The value of for each state is displayed in Figure 10 and the observed and predicted numbers of new cases for each state are plotted in Figure S18 in the supplementary files. There is a similar tri-modal trend across all states: peaks in in April, July, and November–December. New York began with a very large , that precipitously declined as the epidemic in New York City was brought under control in April 2020. After the initial spread of COVID-19 spurred a round of lockdown measures, most states had values around 1 before later local peaks in lead to additional waves of COVID-19. California, Florida, and Texas all had increases in in late June that preceded large numbers of cases in July (Figure S18 in the supplementary files). Meanwhile, South Dakota had an above 1 from July through October, which led to continuous rate of growth in new cases during that time (Figure S18). Consistent with the county-level models, the full-state Colorado model had its largest peak in November.

Figure 10: Posterior mean (solid line) and 95% credible interval (dashed line) of in state models. Horizontal line represents .

4.3.3 Impact of Within-State Mobility

For some states, including the overall Colorado model, there is a strong relationship between one or more of the mobility factors and the transmission parameter (Figure 7). The models for California, Colorado, and New York all have coefficients for the full-time work mobility that are positive, indicating that the transmission rates are positively associated with the proportion of individuals being away from home more than six hours per day. The large coefficient for New York reflects the link between the drop in mobility in March and April 2020 (see Figure S28 in the supplementary files) and the drop in cases after an initial large peak during that time. This impact is further reflected in the correlation between the proportion of people working from home and the transmission term (Figure 8). Similar to the Colorado county models, the coefficient for full-time mobility in these three states was highly positively correlated with (Figure 9). The impact of people working away from home full time was limited in the other state models, as evidenced by the credible intervals including zero (Figure 7) and the weak correlations with transmission (Figure 8). It is possible that the mobility data provide a better reflection of potential for contacts in some states compared to others. In a state with heavy tourist traffic, for example, the mobility data for working full-time or at home would not capture tourist patterns, although the restaurant visits would to some degree. This may partially explain why a weaker association was observed in states like Florida and South Dakota.

The impact of restaurant mobility was heterogeneous across states. In California, Colorado, and New York, the coefficient was not meaningfully different from zero (Figure 7) but restaurant mobility was positively correlated with transmission (Figure 8). The models for Florida, Georgia, and Texas yielded negative posterior means for the coefficients of restaurant mobility. However, this must be viewed in conjunction with the negative posterior means for the coefficient of the proportion of individuals completely home (for which a negative value means less disease spread when more people are staying home). The net result was a minimal effect of mobility on disease spread in those particular states (Figure S28).

Overall, the correlations between the time-varying transmission rate and the proportion of people completely home were stronger among states than in the Colorado county models. This was particularly true in California and New York, where the increase in the proportion of individuals working from home was particularly large (Figure S15). However, this correlation was largely captured by the spline terms in , and so the coefficients for the proportion of people completely at home were not meaningfully different from zero (Figure 7) despite a negative correlation between this measure of mobility and transmission (Figure 8). This reflects a challenge of the flexibility of our modeling approach: having both temporal splines and time-varying predictors means that variation can be explained in multiple ways. In the case of the California and New York models, the correlation between transmission and proportion of people completely at home is likely driven by the dramatic effects of stay-at-home orders at the beginning of the pandemic, while the later surges in transmission (Figure 10) do not correlate as well with the proportion of people completely at home. This leads to the temporal variability being attributed to the temporal splines, rather than the mobility time series.

In Iowa and Texas, the correlation between transmission and mobility measures was opposite from what was seen in the majority of models. In these two states, more people working from home was positively correlated with transmission and more people visiting restaurants was negatively correlated with transmission (Figure 8). These surprising correlations may be driven in part by the reduction in mobility around the time of the December transmission waves (compare Figure 10 and Figure S16 in the supplementary files). Or it may be that differences in policies (such as mask-wearing) mean that mobility alone is not representative of transmission risk in these states.

The correlation between mobility data coefficients and and were weaker in the state models that in the Colorado county models (Figure 9). At the state level, the larger amount of case data means that and can be estimated with more precision (compare Figures S7 and S21 in the supplementary files), reducing the connection between mobility and these clinical parameters. However, the stronger correlation between mobility coefficients and , which directly affects transmission, remains.

5 Conclusions and Discussions

We have presented a flexible model for a time-varying infectious disease that includes: temporal variation in key model parameters through the use of splines and a regression term, a multiplicative state-space process model for allowing for day-to-day heterogeneity in disease spread, and overdispersion in the observed number of cases. Together, these elements allow our model to capture important features of data observed in a pandemic.

The inclusion of the mobility term in the infectiousness parameter was of particular interest to us. We observed that it played a role in the transmission in most counties and states that we modeled (Figure 7). However, it did not play a role in the modeled transmission in all regions. The heterogeneous impact of mobility data may reflect the importance of other factors, such as compliance with mask-wearing mandates. While derived from individual-level information on movement, the mobility data we were able to incorporate into the models was averaged at the county and state levels. This relatively coarse spatial scale means that we cannot differentially model the connection between disease transmission and mobility in at-risk or highly-active subpopulations. Individual-level information on movement could improve the strength of the evidence between mobility and COVID transmission. Furthermore, a potential downside to using cell phone movement as a proxy for human mobility is that there are disparities in cell phone ownership and usage among groups of people. In particular, older Americans are less likely to own a cell phone, and thus may be left out of the mobility discussion. Similarly, cell phone use may also be affected by socioeconomic status and not capture institutionalized populations well. Despite these limitations of the mobility data, we saw a clear effect of reduced mobility in the sharp reduction in cases early in the pandemic (Figure 5). Furthermore, higher rates of working outside the home were associated with increased transmission in many counties and states.

The multiplicative process model introduces extra flexibility to the compartmental model, which better reflects fluctuations in the rates of detection and transmission. In addition to non-random mixing of the population, a key violation of the traditional SIR model assumptions is that each individual is equally infectious. There is now considerable evidence [13, 16, 17, 18] that superspreaders play an important role in the spread of COVID-19, and the flexibility afforded by the process model can incorporate this heterogeneity in spread by allowing for a multiplicative shift to the solution to differential equations of the compartmental model at each time point. This has the additional advantage of allowing the process to explicitly account for heterogeneity in case reporting, such as the tendency of many states to report fewer cases on weekends.

Our approach also presents advantages over approaches that only compare time series of cases and other factors to identify marginal relationships [21, 27, 28]. By incorporating information on mobility into a model for disease spread, our model can more accurately represent a potentially causal role of factors in disease spread. This further allows for comparison of mobility information with the latent rate of disease transmission, rather than just the number of reported cases.

One general challenge in the fitting and interpretation of the compartment model is the limited data for the number of individuals at each stage of the dynamics. For our model, although the framework presented in Section 2 contains eight compartments, we only observe data from two of those compartments ( and ). Several parameters influence the number of new cases , which can lead to some identifiability issues. For example, a moderate number of identified cases could occur due to a low detection (small ) among a large number of undetected infectious individuals (large , arising from large ), or it could be due to high detection (large ) among a small number of infectious individuals (smaller , arising from smaller ). To mitigate this, we introduced a non-decreasing structure for . Nonetheless, it remains nontrivial to accurately estimate the true number of undetected cases at any given time and the results of Figure S13 should be interpreted cautiously. Inclusion of information on testing rates could be used to partially address this by providing a basis for , but quality and quantity of information on testing rates and strategies vary widely between and within jurisdictions, which deserves more modeling efforts in the future.

Several extensions are possible to the models we presented here. The additive model framework for the time-varying parameters could be expanded to include environmental factors such as temperature, humidity, and air pollution. There has been suggestive evidence [55, 56, 57, 58, 59, 60, 61, 62] that these factors may influence spread and severity of symptoms. As noted above, more detailed mobility information could be incorporated to represent small-scale movement patterns. Although we used the same prior distributions and model settings across all counties and states, a natural extension of this approach would be to fit a joint model that includes all regions (counties or states) together. This would allow for information on the non-spatial parameters (e.g. and ) to be estimated using shared information. Direct movement of susceptible and infectious cases between regions could then be included as well. The primary drawback to such an approach is the computational complexity that arises from having separate spline parameters in each region.

Overall, the proposed model provides a rigorous, process-driven framework for modeling the impact of time-varying factors on infectious disease spread. Our analyses showed an important role of mobility in the spread of COVID-19 in several Colorado counties and U.S. states.

6 Data Accessibility

The data and R code used to fit these models is available at


  • [1] Anderson R.M and May R.M. Infectious Diseases of Humans: Dynamics and Control. Oxford: Oxford University Press. 1991.
  • [2] Brauer, F., Castillo-Chavez, C. and Feng, Z. Mathematical Models in Epidemiology. New York: Springer New York. 2019.
  • [3] Diekmann, O. and Heesterbeek, J. Mathematical Epidemiology of Infectious Diseases. Wiley. 2000.
  • [4] Hethcote H.W. The mathematics of infectious diseases. SIAM Review 2000; 42(4): 599-653.
  • [5] Rothe, C., Schunk, M., Sothmann, P., Bretzel, G., Froeschl, G., Wallrauch, C., Zimmer, T., Thiel, V., Janke, C., Guggemos, W. and Seilmaier, M. Transmission of 2019-nCoV infection from an asymptomatic contact in Germany. New England Journal of Medicine 2020; 382(10): 970-971.
  • [6] He, X., Lau, E.H., Wu, P., Deng, X., Wang, J., Hao, X., Lau, Y.C., Wong, J.Y., Guan, Y., Tan, X. and Mo, X. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nature Medicine 2020; 26(5): 672-675.
  • [7] Bos, L.D., Brodie, D. and Calfee, C.S. Severe COVID-19 infections-knowledge gained and remaining questions. JAMA Internal Medicine 2021; 181(1): 9-11.
  • [8] Boudourakis, L. and Uppal, A. Decreased COVID-19 Mortality-A Cause for Optimism. JAMA Internal Medicine 2021; 181(4): 478-479.
  • [9] Asch, D.A., Sheils, N.E., Islam, M.N., Chen, Y., Werner, R.M., Buresh, J. and Doshi, J.A. Variation in US hospital mortality rates for patients admitted with COVID-19 during the first 6 months of the pandemic. JAMA Internal Medicine 2021; 181(4): 471-478.
  • [10] Dennis, J.M., McGovern, A.P., Vollmer, S.J. and Mateen, B.A. Improving survival of critical care patients with coronavirus disease 2019 in England: a national cohort study, March to June 2020. Critical Care Medicine 2021; 49(2): 209-214.
  • [11] Korber, B., Fischer, W.M., Gnanakaran, S., Yoon, H., Theiler, J., Abfalterer, W., Hengartner, N., Giorgi, E.E., Bhattacharya, T., Foley, B. and Hastie, K.M. Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell 2020; 182(4): 812-827.
  • [12] Li, Q., Wu, J., Nie, J., Zhang, L., Hao, H., Liu, S., Zhao, C., Zhang, Q., Liu, H., Nie, L. and Qin, H. The impact of mutations in SARS-CoV-2 spike on viral infectivity and antigenicity. Cell 2020; 182(5): 1284-1294.
  • [13] Althouse, B.M., Wenger, E.A., Miller, J.C., Scarpino, S.V., Allard, A., Hébert-Dufresne, L. and Hu, H. Superspreading events in the transmission dynamics of SARS-CoV-2: Opportunities for interventions and control. PLoS Biology 2020; 18(11): e3000897.
  • [14] Hawks, L., Woolhandler, S. and McCormick, D. COVID-19 in prisons and jails in the United States. JAMA Internal Medicine 2020; 180(8): 1041-1042.
  • [15] Middleton, J., Reintjes, R. and Lopes, H. Meat plants–a new front line in the COVID-19 pandemic. BMJ 2020; 370: m2716.
  • [16] Adam, D.C., Wu, P., Wong, J.Y., Lau, E.H., Tsang, T.K., Cauchemez, S., Leung, G.M. and Cowling, B.J. Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong. Nature Medicine 2020; 26(11): 1714-1719.
  • [17] Lau, M.S., Grenfell, B., Thomas, M., Bryan, M., Nelson, K. and Lopman, B. Characterizing superspreading events and age-specific infectiousness of SARS-CoV-2 transmission in Georgia, USA. Proceedings of the National Academy of Sciences 2020; 117(36): 22430-22435.
  • [18] Lemieux, J.E., Siddle, K.J., Shaw, B.M., Loreth, C., Schaffner, S.F., Gladden-Young, A., Adams, G., Fink, T., Tomkins-Tinch, C.H., Krasilnikova, L.A. and DeRuff, K.C. Phylogenetic analysis of SARS-CoV-2 in Boston highlights the impact of superspreading events. Science 2021; 371(6529): eabe3261.
  • [19] Tabachnik, S. Colorado reports 674 coronavirus deaths as state continues to add older fatalities to tally.; 2020-04-24 [Online]. Accessed: 2021-04-22.
  • [20] Kissane E. Daily COVID-19 Data Is About to Get Weird. Accessed: 2021-04-22.
  • [21] Kraemer, M.U., Yang, C.H., Gutierrez, B., Wu, C.H., Klein, B., Pigott, D.M., Du Plessis, L., Faria, N.R., Li, R., Hanage, W.P. and Brownstein, J.S. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science 2020; 368(6490): 493-497.
  • [22] Grenfell, B. and Harwood, J. (Meta) population dynamics of infectious diseases.Trends in Ecology & Evolution 1997; 12(10): 395-399.
  • [23] Watts, D.J., Muhamad, R., Medina, D.C. and Dodds, P.S. Multiscale, resurgent epidemics in a hierarchical metapopulation model. Proceedings of the National Academy of Sciences 2005; 102(32): 11157-11162.
  • [24] Badr, H.S., Du, H., Marshall, M., Dong, E., Squire, M.M. and Gardner, L.M. Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study. The Lancet Infectious Diseases 2020; 20(11): 1247-1254.
  • [25] Xiong, C., Hu, S., Yang, M., Luo, W. and Zhang, L. Mobile device data reveal the dynamics in a positive relationship between human mobility and COVID-19 infections. Proceedings of the National Academy of Sciences 2020; 117(44): 27087-27089.
  • [26] Cartenì, A., Di Francesco, L. and Martino, M. How mobility habits influenced the spread of the COVID-19 pandemic: Results from the Italian case study. Science of the Total Environment 2020; 741: 140489.
  • [27] Glaeser, E.L., Gorback, C.S. and Redding, S.J. How much does COVID-19 increase with mobility? Evidence from New York and four other US cities (No. w27519). National Bureau of Economic Research. 2020.
  • [28] Iacus, S.M., Santamaria, C., Sermi, F., Spyratos, S., Tarchi, D. and Vespe, M. Human mobility and COVID-19 initial dynamics. Nonlinear Dynamics 2020; 101(3): 1901-1919.
  • [29] Cauchemez, S., Carrat, F., Viboud, C., Valleron, A.J. and Boelle, P.Y. A Bayesian MCMC approach to study transmission of influenza: application to household longitudinal data. Statistics in Medicine 2004; 23(22): 3469-3487.
  • [30] Phenyo E.L. and Bärbel F.F. Statistical inference in a stochastic epidemic SEIR model with control intervention: Ebola as a case study. Biometrics 2006; 62(4): 1170-1177.
  • [31] Edmunds, W.J., Medley, G.F. and Nokes, D.J. Evaluating the cost-effectiveness of vaccination programmes: a dynamic perspective. Statistics in Medicine 1999; 18(23): 3263-3282.
  • [32] Macheras, P. and Athanassios, I. Modeling in Biopharmaceutics, Pharmacokinetics and Pharmacodynamics: Homogeneous and Heterogeneous Approaches. New York: Springer New York. 2006.
  • [33] Hao, X., Cheng, S., Wu, D., Wu, T., Lin, X. and Wang, C. Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature 2020; 584(7821): 420-424.
  • [34] Li, R., Pei, S., Chen, B., Song, Y., Zhang, T., Yang, W. and Shaman, J. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 2020; 368(6490): 489-493.
  • [35] Wölfel, R., Corman, V.M., Guggemos, W., Seilmaier, M., Zange, S., Müller, M.A., Niemeyer, D., Jones, T.C., Vollmar, P., Rothe, C. and Hoelscher, M. Virological assessment of hospitalized patients with COVID-2019. Nature 2020; 581(7809): 465-469.
  • [36] Heffernan, J.M., Smith, R.J. and Wahl, L.M. Perspectives on the basic reproductive ratio. Journal of the Royal Society Interface 2005; 2(4): 281-293.
  • [37] Dukic, V., Lopes, H.F. and Polson, N.G. Tracking epidemics with Google flu trends data and a state-space SEIR model. Journal of the American Statistical Association 2012; 107(500): 1410-1426.
  • [38] Osthus, D., Hickmann, K.S., Caragea, P.C., Higdon, D. and Del Valle, S.Y. Forecasting seasonal influenza with a state-space SIR model. The Annals of Applied Statistics 2017; 11(1): 202-224.
  • [39] Davis, R.A. and Wu, R. A negative binomial model for time series of counts. Biometrika 2009; 96(3): 735-749.
  • [40] Di Loro, P.A., Divino, F., Farcomeni, A., Lasinio, G.J., Lovison, G., Maruotti, A. and Mingione, M. Nowcasting COVID-19 incidence indicators during the Italian first outbreak. Statistics in Medicine 2021; DOI: 10.1002/sim.9004
  • [41] Tian, T., Tan, J., Luo, W., Jiang, Y., Chen, M., Yang, S., Wen, C., Pan, W. and Wang, X. The Effects of stringent and mild interventions for Coronavirus pandemic. Journal of the American Statistical Association 2021;
  • [42] Ma, S., Lian, H., Liang, H. and Carroll, R.J. SiAM: A hybrid of single index models and additive models. Electronic Journal of Statistics 2017; 11(1): 2397-2423.
  • [43] Deng, Y., You, C., Liu, Y., Qin, J. and Zhou, X.H. Estimation of incubation period and generation time based on observed length‐biased epidemic cohort with censoring for COVID‐19 outbreak in China. Biometrics 2020;
  • [44] Ferretti, L., Ledda, A., Wymant, C., Zhao, L., Ledda, V., Abeler-Dörner, L., Kendall, M., Nurtay, A., Cheng, H.Y., Ng, T.C. and Lin, H.H. The timing of COVID-19 transmission. medRxiv 2020; doi: 10.1101/2020.09.04.20188516.
  • [45] Ferretti, L., Wymant, C., Kendall, M., Zhao, L., Nurtay, A., Abeler-Dörner, L., Parker, M., Bonsall, D. and Fraser, C. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 2020; 368(6491): eabb6936.
  • [46] Jiang, Z., Yang, B., Qin, J. and Zhou, Y. Enhanced empirical likelihood estimation of incubation period of COVID-19 by integrating published information. Statistics in Medicine 2021; DOI: 10.1002/sim.9026.
  • [47] Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y., Ren, R., Leung, K.S., Lau, E.H., Wong, J.Y. and Xing, X. Early transmission dynamics in Wuhan, China, of novel coronavirus–infected pneumonia. New England Journal of Medicine 2020; 382(13): 1199-1207.
  • [48] Verity, R., Okell, L.C., Dorigatti, I., Winskill, P., Whittaker, C., Imai, N., Cuomo-Dannenburg, G., Thompson, H., Walker, P.G., Fu, H. and Dighe, A. Estimates of the severity of coronavirus disease 2019: A model-based analysis. The Lancet Infectious Diseases 2020; 20(6): 669-677.
  • [49]

    Neath A.A. and Samaniego F.J. On the efficacy of Bayesian inference for nonidentifiable models.

    The American Statistician 1997; 51(3): 225-232.
  • [50] Wechsler, S., Izbicki, R. and Esteves, L.G. A Bayesian look at nonidentifiability: A simple example. The American Statistician 2013; 67(2): 90-93.
  • [51] Stan Development Team. Stan Modeling Language: User’s Guide and Reference Manual. 2017.
  • [52] The New York Times. Coronavirus (Covid-19) Data in the United States.; 2020.
  • [53] SafeGraph.; 2021. Accessed: 2021-01-25.
  • [54] Bien, J., Brooks, L., Farrow, D., Reinhart, A. and Tibshirani, R. covidcast: Client for Delphi’s COVIDcast API. 2021.,
  • [55] Bashir, M.F., Ma, B., Komal, B., Bashir, M.A., Tan, D. and Bashir, M. Correlation between climate indicators and COVID-19 pandemic in New York, USA. Science of the Total Environment 2020; 728: 138835.
  • [56] Frontera, A., Cianfanelli, L., Vlachos, K., Landoni, G. and Cremona, G. Severe air pollution links to higher mortality in COVID-19 patients: The “double-hit” hypothesis. Journal of Infection 2020; 81(2): 255-259.
  • [57] Fronza, R., Lusic, M., Schmidt, M. and Lucic, B. Spatial–temporal variations in atmospheric factors contribute to SARS-CoV-2 outbreak. Viruses 2020; 12(6): 588.
  • [58] Sajadi, M.M., Habibzadeh, P., Vintzileos, A., Shokouhi, S., Miralles-Wilhelm, F. and Amoroso, A. Temperature, humidity, and latitude analysis to estimate potential spread and seasonality of coronavirus disease 2019 (COVID-19). JAMA Network Open 2020; 3(6): e2011834-e2011834.
  • [59] Shi, P., Dong, Y., Yan, H., Zhao, C., Li, X., Liu, W., He, M., Tang, S. and Xi, S. Impact of temperature on the dynamics of the COVID-19 outbreak in China. Science of the Total Environment 2020; 728: 138890.
  • [60] Travaglio, M., Yu, Y., Popovic, R., Selley, L., Leal, N.S. and Martins, L.M. Links between air pollution and COVID-19 in England. Environmental Pollution 2021; 268: 115859.
  • [61] Wu, Y., Jing, W., Liu, J., Ma, Q., Yuan, J., Wang, Y., Du, M. and Liu, M. Effects of temperature and humidity on the daily new cases and new deaths of COVID-19 in 166 countries. Science of the Total Environment 2020; 729: 139051.
  • [62] Zhu, Y., Xie, J., Huang, F. and Cao, L. Association between short-term exposure to air pollution and COVID-19 infection: Evidence from China. Science of the Total Environment 2020; 727: 138704.