1 Introduction
Contact tracing is a primary public health response to infectious disease outbreaks [8, 14]. The adoption of contact tracing to contain COVID19 met considerable challenges, due to the intensive process related to manual contact tracing and the hesitancy for adopting appbased contact tracing tools[2]. Assessing the impact of contact tracing as a control measure remain of great importance.
In [13], the authors derived contact patterns in the UK using a postal and online crosssectional survey. They predicted that, under effective contact tracing, fewer than one in six cases would generate any subsequent infections that will not be traced. This comes at a high cost, with an average of 36 individuals traced per infected case. Making the definition of a close contact more stringent can reduce this cost, but with increased risk of untraced cases.
A mathematical model assessing contact tracing and isolation for COVID19 has been developed in [15]
. The study estimated that a high proportion of cases would need to selfisolate and a high proportion of their contacts successfully traced to contain the epidemic. If combined with moderate physical distancing measures, selfisolation and contact tracing would likely achieve control of SARSCoV2 transmission.
Another aspect of COVID19 that can alter the effectiveness of contact tracing is the high transmissibility of SARSCoV2 before and immediately after symptom onset [7]. Therefore, finding and isolating symptomatic patients alone may not suffice to interrupt transmission, and that more generalized measures might be required, such as social distancing.
Considering the act of quarantining susceptible individuals (uninfected contacts) as the contact tracing cost, a model developed in [17] has shed light on an interesting phenomenon. The number of quarantined susceptible people increases with the increase of tracing because each confirmed case increases the number of total contacts. However, there is an inflection point after which the number of traces decreases with increased tracing because there are fewer confirmed cases.
Models for assessing contact tracing have been developed for many infectious diseases, before the COVID19 pandemic. A systematic review of these models can be found in [16] which, reviewed mathematical models for contact tracing and followup control measures of SARS and MERS transmission. All models required data to estimate specific model parameter distributions. A major concern identified was whether public health administrators can collect all the required data for building epidemiological models in a short period of time during the early phase of an outbreak where contact tracing is more effective.
The analysis of case data from the Wuhan COVID19 outbreak highlighted nonexponential distributions for critical transition times during infection, such as the infectious period and the incubation period
[25, 1]. Simulations show that different distributions of the infectious period duration with the same mean values lead to different epidemic curves.Consequently, it is critical to develop models that can input empirical distributions, often nonexponential, for the transition time between compartments.
The majority of networkbased individual level pathogen spread models assume exponential distributions for transition times among state pairs. However, these models need adaptation to incorporate general transition time distributions. One possible approach is based on the adoption of phasetype distributions [20].
A phasetype distribution can mathematically represent the transition between any two states in a pathogen spread model. By expanding the source state of the original model into a series of states, then any arbitrary distribution can be approximated as a phasetype distribution to represent the time distribution between subsequent states. Techniques exist to estimate the new parameters to realize the desired distribution. When the transition time distribution between the two original states is general, this distribution can be approximated by transitions between subsequent pairs of states among the q states that are exponentially distributed. When the objective is to derive analytical results concerning nonMarkovian models, both firstorder [26] and second order [22] closure techniques can be used. For efficient methods to simulate nonMarkovian models Gillespie based algorithms are available [3].
The goal of this paper is to develop a nonMarkovian model, both exact and firstorder closure approximation, appropriate for assessing contact tracing and to determine the epidemic threshold. A new networked compartmental model with compartments able to represent the disease evolution and the contact tracing process is presented in section 2. The model is general to allow arbitrary transition time distributions and the mean field equations for such a nonMarkovian model is derived. A mathematical analysis was performed to derive the epidemic threshold, in section 3. The model was applied to the Kansas State University case study reported in section 4. Summarizing, the original contributions of the paper are the following:

A novel and general networkbased model structure for pathogen transmission,

The determination of the firstorder epidemic threshold,

Numerical results about the effectiveness of contact tracing in the case study of Kansas State University.
2 Model description
We begin by presenting a continuous time nonMarkovian transmission model to describe the tractability of COVID19 spread when contact tracing strategies are implemented in a population.
The model assumes individuals (nodes) are in one of ten states. If individuals are susceptible (S), they can become infected but not immediately infectious, i.e, exposed (E), through interaction with infectious individuals that are either symptomatic (Is) or asymptomatic (Ia). The model assumes that the infectious asymptomatic state does not contain the presymptomatic cases that present symptoms at some point during the course of infection. Presymptomatic and symptomatic states are combined into the infectious symptomatic (Is) state. Furthermore, the model assumes the fraction of individuals in the E state that moves to the Is state is , and an asymptomatic infectious (Ia) individual cannot be detected unless tested. Therefore, an individual in the Ia state continues spreading virus while infectious. The model assumes that a person in the Ia state is only infectious for a period of time and subsequently move to the removed state RU.
Symptomatic individuals present symptoms and therefore transition to detected state (D) even if not tested. This triggers contact tracing among the contacts. The assumption in this model is some contacts from detected cases get tested, and if they are infected, they move to confirmed traced state (C), and if virus negative, they transition to alert traced (A) state for a period of time. Indeed, testing capacity could be limited, and the contact tracing strategy may only require quarantining of close contacts. In such a case, we can regard the alert state as quarantine or a subset of S but temporarily removed from the susceptible population. Moreover, we can add more states to the model and divide the confirmed (C) state into three substates, to differentiate the disease state of the traced contacts. However, to keep the model simple, the model only considers one C state. Since an individual induces contact tracing as long as is in the D state, after a period of time it should moves to the removed (RD) state so that the contact tracing stops. Similarly, the model assumes an individual in the C state initiates contact tracing and later moves to removed state (RC). We refer to the transmission model described above as SAIDR, and Fig. 1 shows the diagram of the transitions that a person may experience in the SAIDR model. Transition from the S state to the E state is induced by contacts with individuals in the Is or Ia states. Transitions shown by dotted arrows are due to contact tracing and are induced by contacts with individuals in the C or D states. If a transition is not induced by any interaction, we call it nodal transition and it is shown by solid arrow.
In the SAIDR model, we assume the transition times are random variables drawn from appropriate distributions for the state. The traditional approach in the analysis of epidemic models assumes the distribution of transition times to be exponential, but this might not be an accurate description of all the processes considered in our spreading model. For instance, realworld data suggest transition times for the E
I process are not distributed exponentially. Some factors that have been suggested to potentially influence these transition times, specifically the incubation period (E → Is), include age, infectious dose, and physiological stressors.Moreover, allowing nonexponential distributions for auxiliary processes such as D
RD, increases the degrees of freedom in the model. Hence, here we allow the nodal transitions to have nonexponential distributions. An exponential distribution is specified by the rate parameter
, and the assumption that a transition time is exponentially distributed implies for any infinitesimal period of timethe transition happens with a constant probability
, regardless of the age of the process. Indeed, such a constant rate of transition is a suitable approximation for the transmission process. Therefore, for the transmission process S E, the model assumes the transition times are exponentially distributed as long as the infecting individual stays infected. Additionally, if a susceptible person has several infectious contacts, the model assumes the infecting processes are independent. Similarly, in the contact tracing modeling, we use exponential distribution for the transition times of processes that change state of an individual to the C or A states.Finally, to model the interactions that result in virus transmission or lead to contact tracing, we use an undirected network where the nodes represent individuals in the population and the links denote the contacts among them. Consider a network , with representing the set of nodes, a set of links between the nodes and weight functions defined over the links. We use the weight functions to quantify heterogeneity of the contacts in relation to virus transmission probability or contact tracing probability, respectively. In other words, for each contact, we modify the rate of exponential distributions for the infecting and contact tracing processes multiplying the rates by the contact weights. This allows us to define various types of contacts with different probabilities for virus transmission or contact tracing.
3 Mathematical analysis of the model
In this section we develop a set differential equations that approximately describes the behavior of the transmission model discussed in section 2. Indeed, there is an extensive body of research concerning Markovian spreading unfolding over networks [27, 24, 21, 18, 4, 6, 11], and it is shown the exact mathematical treatment of such a system is not tractable because we need to follow the joint state of all the nodes. Therefore, a meanfield type approximation, which assumes statistical independence of neighboring nodes’ states, is often employed to study network spreading models. Here, we limit the nonexponential random variables in the spreading model of 2 to the Erlang family of distributions. This enables us to use a characteristic of Erlang distribution to cast our spreading model into a Markovian model, for which we can develop an Nintertwined set of differential equations [27].
The Erlang distribution is a twoparameter family of continuous probability distributions with the density function
(1) 
where the shape parameter is a positive integer, and is called the rate parameter. From equation above, we can see that the exponential distribution is a special case of Erlang distribution with
, and the mean and variance of Erlang distribution are
and , respectively. Since the Erlang distribution has two parameters, we can adjust them to obtain a density function with a desired mean and a variance close to a target value. Thus, we can use the Erlang distribution as an approximation for a large class of distributions.One important characteristic of the Erlang distribution can be stated as follows: distribution of sum of independent random variables, each having an exponential distribution with rate , is Erlang distribution with the shape parameter and the rate parameter , i.e.
This characteristic implies if the random transition time of a process follows Erlang() distribution, the process can be modeled as successive transitions between auxiliary states where each transition time is exponentially distributed with the rate . Therefore, if a nodal transition time in the spreading model of Fig. 1 has an Erlang type distribution, it is possible to replace the nodal transition with a set of successive Markovian transitions and obtain an equivalent spreading model. Fig. 2 shows a spreading model where all the transition times are exponentially distributed and it is equivalent of the original model in Fig. 1. Indeed, even if the distribution of a transition time is not Erlang, we still can use phasetype distribution method to obtain a mixture of Markovian transitions that is approximately equivalent to the original transition [20]. However, we limit the distribution of nodal transition times to Erlang distribution because it incorporates a family of density functions with arbitrary means and variances.
Considering the Markovian spreading model in Fig. 2, let represents state of node among possible states in the model, and
represents the joint state of all the individuals in the population. Based on the nodal description of the processes, we deduce the joint state is a continuoustime Markov chain over a space consisting of
possible network states. Therefore, the exact mathematical treatment of the system is not tractable when the number of nodes is large. But if we use meanfield approximation [27], that assumes statistical independence of neighboring nodes’ states, we obtain a set of equations, which we can solve computationally. This approximation assumes at any time, the joint probability of finding nodes and in the states and can be written as the multiplication of marginal probabilities, .To formulate the approximate behavior of the Markovian model depicted in Fig. 2, we use dynamic variables, , which represent the probabilities of finding node in the corresponding states. If and denote the rates for the transmission and contact tracing processes exerted on node , we have
(2) 
where is the rate for the the contact tracing process, and we have assumed symptomatic and asymptomatic infectious states have different infectiousness which is reflected in the infectious rates parameters, , . Finally, using the joint state independence approximation, we can write the following set of equations for all the nodes in the network
(3) 
The system of nonlinear ordinary differential equations (ODEs) above, describes evolution of nodal states’ probability. We can use this set of equations to study behavior of the spreading processes or to estimate the model parameters. In addition, stability analysis of diseasefree state of the system leads to derivation of epidemic threshold which establishes a condition that determines whether an initial infected population will vanish or has the possibility to grow.
To derive the epidemic threshold the first step is to linearize the nonlinear system 3 about the diseasefree steady state. In the diseasefree state the probability of finding nodes in any state other than the susceptible state is zero. After linearization we arrive at an independent subsystem that describes production of new infections. Since the variables in this subsystem drive other variables in the larger system, stability of this subsystem determines stability of the larger system. This subsystem can be written as follows:
(4) 
Theorem 1
Assuming the infection weight matrix is irreducible, the diseasefree steady state is a locally stable fixed point of the dynamical system 3 if the following condition holds:
(5) 
where denotes the spectral radius of the weight matrix .
To derive the threshold condition in equation 5, we rewrite the linearized subsystem 4 in a matrix form where the infection processes, represented by a matrix , are separated from other transitions,
(6) 
The square matrix can be written as the Kronecker product of the network weight matrix and a matrix which represents the rates and the driving nodal states in the infection process,
(7) 
where and are matrices of zeros and ones, respectively,
(8) 
The matrix in equation 6
is a block diagonal matrix which can be written as the Kronecker product of the identity matrix of size
and a matrix which represents nodal transitions,(9) 
and the matrices used in the definition of have the following structures
In the matrix, the diagonal elements are , the lower diagonal elements are and the rest of elements are zero. In the matrix, the element in the upperright corner is and the rest of elements are zero.
The stability of steady state of the linear system 6 is determined by the spectral bound of the square matrix , defined as
where
denotes the set of eigenvalues of
. The linear system is exponentially stable if and only if the real parts of the eigenvalues are negative (i.e., if ). To find the condition that leads to a negative spectral bound we apply theorem A.1 from reference [9], which, for the sake of completeness, we state as the following theorem.Theorem 2
If is a positive matrix and is a positive offdiagonal matrix with , then
where and sign(.) denote spectral radius and sign functions, respectively.
We note that in reference [9], positive matrices are defined as nonzero matrices with all entries nonnegative; and a matrix is defined as positive offdiagonal if all entries are nonnegative except possibly those on the diagonal.
To apply theorem 2, we first investigate if the matrices and satisfy the condition stated in the theorem. From the definition of and in the equations 7,9, we can see is a positive matrix and is positive offdiagonal. Moreover, since is positive offdiagonal lemma 6.12 in [10] shows that if and only if is invertible and is a positive matrix. Indeed, we can directly calculate as fallows
(10) 
where the matrices and are defined in the equation 8, and is a lower triangular matrix of size , with nonzero entries equal 1
(11) 
It is clear that is a positive matrix. Therefore we can use theorem 2 to conclude that
Finally to prove theorem 1 we will obtain an expression for . Using the properties of the Kronecker product we can write
(12) 
where denotes set of eigenvalues. After calculating the matrix , we can see only the first row is nonzero
and therefore the eigenvalues of the matrix are and . In addition, since we assume is an irreducible nonnegative weight matrix, Perron–Frobenius theorem shows has a positive eigenvalue, denoted by , and absolute value of any other eigenvalue of is strictly smaller than . Combining these results about the eigenvalues of and with the equation 12, we find that
(13) 
which leads to the proof of theorem 1.
To numerically investigate the threshold condition, we computed prevalence of the exposed state (E) through time for a spreading unfolding on the largest component of the coauthorship network [19]. We calculated the prevalence of a state as the average of nodes’ probabilities. Fig. (a)a shows the prevalence of exposed state for different values of when we assumed no contact tracing. Fig. (b)b shows similar prevalence when there is contact tracing. In both figures, we can see the exposed state exponentially dies out when the spreading is below the threshold. To compere the size of epidemic for different values of , we have plotted the R state prevalence in Fig. (c)c and (d)d.
The threshold condition we have derived, only depends on the network structure, infection transmission rates, probability of becoming a symptomatic case, and expected infectious periods for the symptomatic and asymptomatic states. This is clear from equation 13, if we notice that and are the expected values for the corresponding Erlang distributions. Although the variances for the infectious period distributions do not change the threshold condition, they affect the prevalence of the states as shown in Fig. 4. To generate this figure, we calculated the prevalence curves for two spreading processes unfolding on the network of the previous example. In the first process, we assumed the symptomatic and asymptomatic infectious periods are distributed exponentially with expected values of four and six, respectively. For the second process, we changed the exponential distributions to Erlang distributions with similar expected values but variance of 2.
4 Case study
In this section the we apply the SAIDR spreading model to COVID transmission among Kansas State University (KState) students in Manhattan, Kansas. Such a model can be used to evaluate the effectiveness of different strategies for reducing the spread of virus or to predict the size of epidemic. However, this requires information about the network structure and other model parameters. We performed a survey among individuals associated with the university and the survey results were used to build a random contact network. Assuming this contact network for the population, we use weekly positive COVID cases to estimate unknown model parameters such as the transmission rate . Later, we use the estimated parameters to study the effectiveness of contact tracing among the students.
4.1 Contact Network
To build a weighted contact network we sent an online survey to all Kansas State University students and staff via the university webmail system in December 2020. This survey asked about participation in social interactions during the 2020 fall semester (August 2020December2020). During this semester, some of the classes were held inperson and some held online. Students were present in Manhattan, Kansas, during this time and a mask mandate was in effect in all the facilities related to the university and in community public spaces.
We asked questions about housing status, age and role in the university, as well as the number of close contacts such as roommates, family members, or coworkers. In another question we also asked for the number of people they regularly meet in close proximity for a total contact duration of less than four hours per week (such as friends). The responses to this question were used to build a second layer of contacts that can be traced but have lower transmission probability. To find the level of social interaction we asked the respondents to specify average number of visits per week to different public locations such as bars, restaurants, coffee shops and stores. We also asked about the frequency of their participation in social events such as religious or sports events. For each public location type, the respondents also indicated duration of the visits and number of people with whom they interact in close proximity.
After processing the data, based on their housing type and role in KState, we divided the respondents into six groups. These groups are: (1) graduate students (2) factually and staff, (3) undergraduate students living in off campus apartment or houses, (4) undergraduate students living in fraternities and sorority houses (Greek houses), (5) and (6) undergraduate students using two available oncampus housing options. The rationale for this division is that the respondent age and housing type possibly leads to different levels of social interaction and number of close contacts. Using the survey data, we calculated the following parameters for each group

, average number of close contacts

, average number of people met for total duration of less than four hours per week
and for the three types of public spaces which are (1) bars, restaurants, and coffee shops, (2) stores and services, and (3) social events

, proportion of individuals in group who visits public space type , and , average weekly hours they spent in these locations, and , average number of people they encounter
Values of these parameters are presented in the table I.
group #  1  2  3  4  5  6 
3.5  3.5  5.8  14.3  4.5  3.5  
3.5  2.3  4.9  10.7  5.1  4.2  
location 1  
0.5  0.4  0.6  0.73  0.53  0.5  
1.76  0.7  3.25  4.46  2.25  1.5  
4  2.7  6  5.3  5.3  3  
location 2  
0.91  0.92  0.89  0.81  0.75  0.91  
2  1.7  2.6  2.1  1.5  3  
4.5  4.2  4.9  4.3  4.7  4.7  
location 3  
0.3  0.26  0.45  0.72  0.56  0.33  
0.85  0.7  1.72  4.5  3  2.37  
6  5.5  8  14  8.6  4.9 
Using these parameters, we built three layers of networks for the whole population comprising the groups mentioned before and an additional group which is the rest of the town population. Since we conducted the survey only among the individuals associated with university, we did not have the parameter values for the general public not associated with the university. Hence, we extended the parameters extracted for the group of faculty and staff to that additional group. Population of different groups that we assumed in our calculations are given in the table below.
group #  1  2  3  4  5  6 

4000  31000  8700  2000  4400  900 
For layer one, , which is the layer of close contacts, consider clusters of individuals within the main groups. We assume among close contacts for each individual, a fraction of the contacts happens within the cluster the individual belongs to, and the remaining contacts are randomly established using configuration model. While setting up the random links we assumed undergraduate students have only links with other undergraduate students and graduate students with other graduate students.
For layer two, , we used configuration model where the node degree of an individual in group was set to .We assumed undergraduate students have only links with other undergraduate students and graduate students with other graduate students. To decrease the number of model parameters, we the express daily infection transmission rate in this layer in terms of transmission rate through the close contacts of layer , which we denote by . If we assume the effective daily contact duration in is only eight hours and compare that with weekly duration of links in , which is around four hours, we expect a daily transmission rate of for the links.
For the layer three, , which represents interaction through public spaces, we considered a complete graph over the whole population and weighted the link between any two nodes and by
In this expression, enumerates three different public spaces we mentioned before, represents group assignment of node and is the total population. is a coefficient that when is multiplied by , gives an estimate of infection transmission rate between nodes and in public space , assuming is transmission rate for a close contact link in the layer . Indeed, if is the total hours per week that the public space is active is daily expected hours that nodes and overlap in such a location. Moreover, if we assume the effective daily duration of close contact is eight hours, gives an estimate of daily infection transmission rate between nodes and in the public space . Here, we use , , .
4.2 Model Parameters
For some of the parameters in the spreading model described in section 3, we use the values below in our calculations.
Following reference [12], we assume the proportion of infected individuals that never show symptoms is , and their infectiousness is lower than those who develop symptoms such that .
For transition from the E state to the I states, we use days, and . These values lead to a distribution with the mean of
days and the standard deviation of
(Fig. 5). We chose this distribution using the data in reference [23], where authors report the distributions for the incubation period ( i.e., time from exposure to symptoms onset) and the serial interval which is time between the symptoms onsets of successive cases in a chain of transmission. In fact, the reported negative serial interval implies presymptomatic transmission of COVID19 during the incubation period. To estimate the E state period in our model, we assumed that, in the incubation period, infected individuals go through several stages with the same expected time, and in one of the stages they start transmitting the infection. If this stage is before the onset of symptoms, we will observe negative serial intervals. Hence, we approximated the incubation period with an Erlang distribution with and days, which has the same median and 95th percentile as the reported distribution in [23]. Next, we simulated a chain of transmission assuming transmission starts at a specific stage of incubation period with a specific transmission rate, and we recorded the distribution of the serial interval. By exploring different values for the stage that the transmission starts, and also the transmission rate, we found that the simulated distribution of the serial interval is closest to the reported one in [23], if the transmission starts at the stage four. Therefore, we used an Erlang distribution with and days for the period of E state in our model.Considering the transition from the state to the RU state, we assumed an Erlang distribution of mean six days and variance of two days. Fig. 5 shows the corresponding density function. This value of variance implies that there is almost no transmission after day nine [5] and there is transmission in the first week when the viral load is high [5].
Here we use similar distributions for the random times of the of the transitions C RC and D RD. If we assume and days, and , then the probability that a node in or state induces a contact tracing transition in its neighbor is . We calculate the contact tracing success probability as
(14) 
We can adjust this probability by changing or . For instance, if we change to only , the probability becomes . In our calculations, we assume the rate of success for contact tracing through the layer of close contacts, is high, and through the layer is lower, therefore, we chose for the layers and , respectively. We need to note that the duration of staying in the C or D state cannot be long, otherwise these states may induce repetitive contact tracing in the model.
The other transition that we specify its random time distribution is A S. Here, we set the and days. These values lead to a distribution that is concentrated between days 4 and 11 with a mean of 7 days (Fig. 5).
4.3 Approximate Model
Practically, we can solve the network spreading model ODEs of the system 3, even if the population is large. This system of equations is applicable for any network with an arbitrary structure. However, the threelayer network we described in section 4.1, to some extent, is homogeneous. Specifically, the nodes that belong to a same group have similar type and number of links in the layers and , and their community contacts through the layer
are identical. Hence, we expect that the probability vectors of the nodes with a similar group assignment will be similar, and we may use one probability vector for all the nodes in a group. Within this approximation, the dimension of the dynamical system that describes the spreading process is smaller, and each group of nodes is represented by only one node. The ODEs for this system are similar to those in the equation
3, except that the superscript for the variables now enumerate the groups and run from to . To have a closed system of ODEs, we also need to approximate the infection and tracing rates, and in term of the probability vectors of the groups. The infection rate of a node in group can be written as(15) 
where determines the contribution of group in the infection rate of a node in group and , are infection transmission rates through close contacts of the layer . Considering definition of the network parameters in section 4.1, is approximated by
(16) 
where is the Kronecker delta function, is the population of group and is the ratio of infection transmission rate for the contacts in the layer to that of the layer . Furthermore, , if existence of links between the nodes in groups and is allowed in the network layers and , otherwise . The first, second and third lines in the equation above give the contribution of the network layers , and in the infection rate, respectively. Also, the contact tracing rate for the nodes in group , represented by , can be approximated as
(17) 
where is the tracing rate through the close contacts in the layer and
(18) 
In the equation above, is the ratio of tracing rate through the layer to , which we set at .
To compare the approximated spreading model with the high dimension network model, we calculated the total population of students in the D, C, RD, RC states using both models assuming similar initial condition and the contact network parameter . Fig. 6 shows this population for different values of and . In this figure, the points shown by square markers were calculated using the network spreading model equations and the line plots are the result of the approximate spreading model. We can see, for the type of contact networks we defined in this section, the approximate model generates epidemic curves comparable to those obtained by the network spreading model.
4.4 Estimation of Effectiveness of Contact Tracing
In this section we use the reported positive COVID cases to estimate the effectiveness of contact tracing among the students during the 2020 fall semester. We assume the infection spreading follows our spreading model, and the contact network and the parameters’ values are those discussed before in sections 4.1, 4.2. Since we do not know the value of infection transmission rate and the distribution of infectious period for the symptomatic state, Is, we use a Markov chain Monte Carlo (MCMC) scheme to estimate these unknown parameters and eventually obtain an estimation for the effectiveness of the contact tracing. Although there are some published results regarding these parameters, we believe they cannot be extended to all populations. For instance, the distribution of the symptomatic infectious period, by which we mean the period an infectious symptomatic is spreading infection before removal from the population, depends on the level of population vigilance and testing.
Here we use the Metropolis–Hastings algorithm, which is a MCMC method, to estimate the spreading model parameters. In general, this algorithm generates samples of a random variable for which we can calculate the probability ratio between the samples. Later, these samples can be used for calculating numerical approximations of functions of the random variable. To detail the algorithm, consider a kdimensional random vector , with a probability distribution proportional to . The algorithm generates a sequence of random samples following the steps below

Choose an initial sample , and set

Generate a proposal sample , such that each
is normally distributed with a mean of
and a predefined standard deviation, i.e., 
Calculate acceptance probability,

Generate a uniformly distributed random number in the interval
. 
If accept as the next sample, , otherwise set .

Set , and go back to step 2.
In order to adopt this algorithm in our estimation problem, we need to define the function for our problem. Let and denote an observed spreading data series and the corresponding outputs of spreading model. For a vector of spreading model parameters, , we assume , where is the sum of absolute deviation of the spreading model outputs, . Using this function, the fitting algorithm generates a sequence of samples from the parameter space such that the density of samples in a region with the error is four times the density in a region with the error . In the definition of , we have used the sequences and , which we need to specify. Since we had access to the weekly new positive COVID cases among the students, we use the weekly cumulative cases from the start of the fall semester as the sequence of observed spreading data . The corresponding sequence from the model, , is the weekly population of students in the states D, DR, C, CR. Regarding the spreading model, we use the approximate model in section 4.3 because it is computationally fast and its outputs are similar to those calculated by the network model ODEs 3. Concerning the open parameters in spreading model, we assumed the parameter vector consists of the following components:

Mean and variance of the infectious period for the symptomatic state. Assuming these parameters, we find an Erlang distribution with same mean and a variance closest to the parameter value, and we use this distribution in the model.

Transmission rate , which we allow to change every 15 days. Effectively, through the course of spreading, average in every 15 days is represented by one component of .

Timepoint of the first observation in the sequence with respect to the initial day of the model calculation. We solve the ODEs for the model with the initial E state probability equal to .

Parameters which is defined in section 4.1, and relates to the structure of the close contact network layer.
Following the procedure we explained in this section, we generated sequences with 1 million samples of the open parameters. Fig. (a)a shows a histogram of the samples’ absolute error . The output of the spreading model for a sample with the minimum error of 330 is shown in Fig. (b)b where we have plotted the population of students in the D, C, RD and RC states and compared it with the reported students’ positive cases.
To estimate the effectiveness of contact tracing, we set the contact tracing rate in the model to zero, , and for each sample of the open parameters we recalculated the total population of infected students at the day of last observation. Next, we calculated the ratio of this population to the same population in the original model where , and we use this ratio to quantify effectiveness of contact tracing. Histogram of this effectiveness is shown in Fig. (c)c. When we examine the distribution of samples error in Fig. (a)a, we notice the distribution has a long tail. This is because, in the sampling algorithm, we chose . In fact, we could instead use an exponential function that would not generate a long tail distribution, but the resulting Markov chain could possibly stay in a local minima for a long time. To eliminate the effect of samples with large errors, in our estimation of parameters, we only use samples with error smaller than . This effectively truncates the original distribution. Fig. (d)d shows the distribution of contact tracing effectiveness when we only use samples with error less than . The mean for this distribution is 3.35 which is slightly different from that of the distribution in Fig. (c)c which is 2.94.
Moreover, from each sample we can extracted for the corresponding Erlang distribution of the infectious period. The histogram of is plotted in the Fig. (b)b. This shows the most probable distribution for the duration of the Is state is exponential. We have also shown the distribution of the infectious period mean for the Is state in Fig. (a)a. The mean for this distribution is days which is significantly shorter than the infectious period for the Ia state. The reason for this might be the vigilance of population and availability of testing. Fig. (c)c shows the distribution for the network parameter , which has mean of . We have also shown the Interquartile range and the estimated median for in Fig. (d)d. In this plot we can see a range of values for , but if we assume , the factor in the threshold equation 13 becomes . Considering the fact that the largest eigenvalue of a network is larger than its minimum node degree, we can deduce that if the individuals in a population have more than four close contacts the threshold condition for the epidemic dieout will not satisfy. This is especially significant because in the calculation we have assumed the mean of infectious period for the Is state is only 1.5 days which is very small.
5 Conclusions
In this work, we have developed and investigated a novel nonMarkovian networkbased model to assess the effectiveness of contact tracing and shed light on conditions to contain a COVID19 outbreak. In particular, using equation 5 for the epidemic threshold, it is clear the quantitatively relevant impact of the asymptomatic infected individuals in the spreading process. Furthermore, when considering homogeneous contact patterns, even limiting the number of contacts to a very small number (i.e.,45) per individual will not be enough to contain the epidemic. We have evaluated contact tracing with a twolayer network, a regularcontact layer and a casualcontact layer. Extensive simulations show that contact tracing can be very effective by reducing the size of the epidemic more than threefold. This result takes into account that 90% of the regular closed contacts are traced and only 30% of the casual contacts are traced. In our model, the duration of the symptomatic and asymptomatic infection periods are evaluated at 1.5 days mean for symptomatic individuals due to prompt isolation, while at 6 days for asymptomatic notdetected ones. Since the threshold equation is derived using a meanfield firstorder closure approximation, future work will explore higher order closure approximations evaluating the tradeoff between improved accuracy and increased costs. Overall, we have proposed a nonMarkovian model for evaluating SARSCoV2 transmission containment strategies. We believe these model types offer more flexibility to incorporate any type of transition time distributions, hence improving the model’s reliability for SARSCoV2 and other pathogens.
Acknowledgments
This material is based on work supported by the National Science Foundation under Grant No.2027336 IIS.
References
 [1] (2020) Incubation period of 2019 novel coronavirus (2019ncov) infections among travellers from wuhan, china, 20–28 january 2020. Eurosurveillance 25 (5), pp. 2000062. Cited by: §1.
 [2] (2020) Effect of manual and digital contact tracing on covid19 outbreaks: a study on empirical contact data. Journal of the Royal Society Interface 18 (178), pp. 20201000. Cited by: §1.
 [3] (2014) Simulating nonmarkovian stochastic processes. Physical Review E 90 (4), pp. 042108. Cited by: §1.
 [4] (2003) Absence of epidemic threshold in scalefree networks with degree correlations. Physical review letters 90 (2), pp. 028701. Cited by: §3.
 [5] (2020) SARScov2, sarscov, and merscov viral load dynamics, duration of viral shedding, and infectiousness: a systematic review and metaanalysis. The Lancet Microbe. Cited by: §4.2.
 [6] (2008) Epidemic thresholds in real networks. ACM Transactions on Information and System Security (TISSEC) 10 (4), pp. 1. Cited by: §3.
 [7] (2020) Contact tracing assessment of covid19 transmission dynamics in taiwan and risk at different exposure periods before and after symptom onset. JAMA internal medicine 180 (9), pp. 1156–1163. Cited by: §1.
 [8] (2020) COVID19 contact tracing: challenges and future directions. IEEE Access 8, pp. 225703–225729. Cited by: §1.
 [9] (2010) The construction of nextgeneration matrices for compartmental epidemic models. Journal of the Royal Society Interface 7 (47), pp. 873–885. Cited by: §3, §3.
 [10] (2000) Mathematical epidemiology of infectious diseases: model building, analysis and interpretation. Vol. 5, John Wiley & Sons. Cited by: §3.
 [11] (2012) Localization and spreading of diseases in complex networks. Physical review letters 109 (12), pp. 128702. Cited by: §3.
 [12] (2021) SARScov2 transmission from people without covid19 symptoms. JAMA network open 4 (1), pp. e2035057–e2035057. Cited by: §4.2.
 [13] (2020) Efficacy of contact tracing for the containment of the 2019 novel coronavirus (covid19). J Epidemiol Community Health 74 (10), pp. 861–866. Cited by: §1.
 [14] (2020) Impact of delays on effectiveness of contact tracing strategies for covid19: a modelling study. The Lancet Public Health 5 (8), pp. e452–e459. Cited by: §1.
 [15] (2020) Effectiveness of isolation, testing, contact tracing, and physical distancing on reducing transmission of sarscov2 in different settings: a mathematical modelling study. The Lancet Infectious Diseases 20 (10), pp. 1151–1160. Cited by: §1.
 [16] (2019) Epidemic models of contact tracing: systematic review of transmission studies of severe acute respiratory syndrome and middle east respiratory syndrome. Computational and structural biotechnology journal 17, pp. 186–194. Cited by: §1.
 [17] (2021) Contact tracing evaluation for covid19 transmission in the different movement levels of a rural college town in the usa. Scientific reports 11 (1), pp. 1–12. Cited by: §1.
 [18] (2002) Spread of epidemic disease on networks. Physical review E 66 (1), pp. 016128. Cited by: §3.

[19]
(2006)
Finding community structure in networks using the eigenvectors of matrices
. Physical review E 74 (3), pp. 036104. Cited by: §3.  [20] (2015) A general class of spreading processes with nonmarkovian dynamics. In 2015 54th IEEE Conference on Decision and Control (CDC), pp. 5073–5078. Cited by: §1, §3.
 [21] (2015) Epidemic processes in complex networks. Reviews of modern physics 87 (3), pp. 925. Cited by: §3.

[22]
(2015)
Exact and approximate moment closures for nonmarkovian network epidemics
. Journal of theoretical biology 382, pp. 160–177. Cited by: §1.  [23] (2021) Evidence for presymptomatic transmission of coronavirus disease 2019 (covid19) in china. Influenza and other respiratory viruses 15 (1), pp. 19–26. Cited by: §4.2.
 [24] (2013) Generalized epidemic meanfield model for spreading processes over multilayer complex networks. IEEE/ACM Transactions on Networking 21 (5), pp. 1609–1620. Cited by: §3.
 [25] (2020) High contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2. Emerging infectious diseases 26 (7), pp. 1470. Cited by: §1.
 [26] (2018) Meanfield models for nonmarkovian epidemics on networks. Journal of mathematical biology 76 (3), pp. 755–778. Cited by: §1.
 [27] (2011) The nintertwined sis epidemic network model. Computing 93 (24), pp. 147–169. Cited by: §3, §3.