1 Introduction
Mathematical models of the dynamics of directly transmitted infectious diseases can provide predictions about the future course of an ongoing epidemic and hence aid in decision making and epidemic control (e.g. siettos2013mathematical). Statistical time series methods are utilised predominantly for epidemic nowcasting, i.e., shortestterm predictions and current state assessment under not yet complete data (e.g. bastos2019modelling; mcgough2020nowcasting) and epidemic surveillance, i.e., early identification of emerging epidemics (unkel2012statistical). Mechanistic/statespace models, which are based on a mathematical description of the spreading process, allow one to make predictions about the future course of an epidemic (shaman2012forecasting; tizzoni2012real; nsoesie2013forecasting) as well as simulating intervention strategies (chao2011planning; di2021optimal; di2020impact; van2020covid). Many such models rely on a compartmentalisation of a population of individuals according to the individual’s disease status. In diseases for which there is no immunity upon recovery, each individual is either susceptible (S) or infected/infectious (I) at any given time.
One assumption common to many compartmental models is that of random mixing of individuals, or of/within subgroups of a population (e.g., kermack1927contribution; jacquez1993stochastic). While this assumption can be adequate in some instances (e.g., within households; goeyvaerts2018household), it is known that populations do not mix at random in general. Rather individuals have a finite set of contacts to whom they can pass on an infection. It is wellestablished that the contact network of a population significantly impacts epidemic dynamics (e.g., shirley2005impacts; yin2017impact). The importance of contact structure for epidemic dynamics has led to a close interaction between network science and mathematical epidemiology (keeling2005networks; danon2011networks; pastor2015epidemic; kiss2017mathematics) whereby the spread of an epidemic within a population is understood and modelled as a stochastic process on a network. In such a model, each individual in the population corresponds to a node in the network, and a contact that represents a potential route for disease transmission between two individuals is a link in the network.
Drawbacks of network epidemiological models typically include their high dimensionality and the inaccessibility of the exact contact network of a population. Consider a SISepidemic on an undirected, unweighted network with
nodes. At any given time, each node is either susceptible or infected/infectious. If the exact contact network is static and known, a complete description of the SIS dynamics is given by a continuoustime Markovchain of dimension
(one equation for each possible network state; e.g., simon2011exact). Such a Markovchain model is exact, but also highdimensional even for modest values . Hence, the numerical integration of the system of equations becomes unfeasible for most reallife networks. Consequently, analytical results based on the exact system are mostly out of reach, and existing results typically rely on meanfield approximations (e.g., mata2013pair; cota2018robustness). Further, the exact contact network of a population is rarely accessible, but usually needs to be approximated either from limited observations and/or based on theoretical network models (e.g., della2020network; xue2020data).In this study, we explore the suitability of a computationally inexpensive model to describe the stochastic process of an SIS epidemic spreading on Regular (Reg), Erdős–Rényi (ER) and BarábasiAlbert (BA) networks. The surrogate model utilised in this study was first introduced in di2020network and further expanded to include more network classes and consider the large limit in di2020pde. The core idea of the approach is a dimension reduction of the state space. In the surrogate model, the state of the epidemic at any given time is defined by the total number of infected nodes in the population. The effect of the contact structure on the spreading of the epidemic is accounted for by the model parameters. The continuoustime Markovchain describing the SIS dynamics on the reduced state space takes the form of a BirthandDeath (BD) process and is of dimension ; that is, it is linear in and thus feasible also for large . The parameters of the BD model correspond to recovery and infections rates. The recovery rate is networkindependent and here assumed to be known. The rate at which new infections occur depends on the network, but for particular network classes it can be well described by a threeparameter model, reducing the number of free parameters of the BD model from to only three. di2020network utilised the finding that different types of networks are associated with distinct regions in the space spanned by the three parameters to infer the type of network from populationlevel observations. To solve this inverse problem di2020network
set up a Bayesian inference procedure and built network class specific prior distributions which then allow to identify the most likely network class from the posterior.
Here, we utilise the BD model to forecast the evolution of an ongoing epidemic. We address the following questions:

How well does the BD model capture the intrinsic stochasticity of an epidemic spreading on a network?

How uncertain are model parameters when inferred from the kind of timecensored observations typically available in a realistic prediction scenario, and how does this uncertainty translate into prediction uncertainty?

Can we use the BD model and Bayesian inference to provide epidemic forecasts with meaningful uncertainty information?
The manuscript is structured as follows: Section 2 introduces the BD model and Bayesian inference. We then outline the generation and evaluation of predictions using the BD model. We consider nine different combinations of networks and epidemic parameters: a small, a medium and a large epidemic on a network from each of the three aforementioned network classes (Section 3). An empirical validation of the BD model based on these nine cases is provided in Section 4.1. In Section 4.2, we evaluate the predictions obtained with the BD model for all nine cases. In particular, we study the sensitivity of network class and parameter inference on the number and timing of observations in realistic prediction scenarios and how uncertainty about network class and model parameters translates into prediction uncertainty. We conclude with a discussion including limitations of this work and future directions.
2 Methods
2.1 SIS epidemics on networks
We consider the standard SIS epidemic on a population of individuals whose contact structure is described by an undirected and unweighted network defined by its adjacency matrix with and if individuals (nodes) and are connected and otherwise. If two nodes and are connected, the disease can be transmitted from one to the other. In an SIS epidemic, each node is, at any given time, either susceptible (S) or infected/infectious (I). Thus, the epidemic state of the network at time
is described by a Boolean vector
with where if node is susceptible and if node is infected at time . Hence, there exists a total of distinct network states. The state of the network changes through two types of events: the recovery or the infection of a node. Infection and recovery are Markovian and act as homogeneous Poisson point processes with constant perlink infection rate and constant recovery rate . An infectious node can spread the infection only to neighbouring susceptible nodes. Infection dynamics thus depends on the network structure, while recovery is networkindependent. A complete description of the SISdynamics on a given network corresponds to a Markovchain over a statespace of dimension for which numerical integration becomes intractable even for modest values of . However, given network adjacency , epidemic parameters and , and initial conditions , realisations of the stochastic process can be readily obtained using the Gillespie algorithm (e.g., gillespie1977exact; kiss2017mathematics). This is computationally comparatively inexpensive and provides us with i.i.d. samples of the true stochastic process. Such samples serve as a reference for the validation of the BD model and for the evaluation of predictions. More precisely, we make use of the aggregated number of infected nodes in the network, i.e., , which describes the epidemic at population level.2.2 BD model
Birthanddeath processes are intuitively linked to the populationlevel dynamics of SIS epidemics (e.g., ganesh2005effect; nagy2014approximate; devriendt2017unified). In this view, an increase in the number of infected individuals () corresponds to the ’birth of an infection’; a decrease () to the ’death of an infection’. Accordingly, the epidemic state is defined by the number of infected nodes in the network. The resulting model is, like the exact formulation (Sec. 2.1), a continuoustime Markov chain, but on a state space of dimension only.
The Kolmogorov (or Master) equation of a standard BD process is given by
(1) 
where
denotes the probability of observing
infected nodes at time , and and denote populationlevel infection and recovery rate, respectively. We note that . The populationlevel recovery rate can be directly obtained from the node recovery rate as . The populationlevel infection ratehowever depends on the number of links between susceptible and infected nodes (SI links) present in the network in its current state and is thus a random variable depending on the precise network. Following
di2020network, is represented in the BD model by its expectation the timeaveraged number of SI links over the network states with infected nodes. di2020network further demonstrated that for Regular, Erdős–Rényi and BarabásiAlbert networks can be well represented by a threeparameter model of the form(2) 
with serving as a general scaling parameter, allowing to shift the peak of the curve with respect to and adjusting the flatness of the curve. The shape of the curves is network classspecific (Fig. 1). Whilst the peak is located near the centre () for Erdős–Rényi networks, it is shifted to the right for Regular networks, and to the left for BarabásiAlbert networks. Accordingly, the triplets for the different network types cluster in different regions in the threedimensional parameter space. This observation is central to the network class inference of di2020network.
For a given , a given triplet, and initial conditions if and otherwise, where denotes the number of infected nodes at , we can numerically integrate Eq. 1 to obtain predictions . In our experiments, we assume that the recovery rate is known. Initial conditions are provided by the last observation available. Thus, the remaining task is the inference of
from the available observational data, here, the number of infected nodes at a set of discrete times. We use Bayesian inference to estimate the posterior distribution over the parameters
given observations. The required priors were derived from extensive Gillespie simulations on different networks and with different epidemic parameters. The particular challenge for making predictions lies in the limited observational period available for inference in a realistic prediction scenario, in which observations exist only up to the current state of the epidemic.2.2.1 Bayesian inference
The detail of the inference framework can be found in di2020network. Here, we only recall the main ideas and steps of the inference procedure. We denote the population level observations by where with denotes the number of infected individuals at times . For brevity, we use to denote . We further denote the set of candidate network classes as .
In order to make predictions, we require the posterior over given the observations, i.e., , which we can write as
In di2020network, the goal was network class inference, i.e., obtaining the posterior over given observations , . To this end, di2020network generated network class specific priors . Precisely, they carried out a large number of Gillespie simulations during which they kept track of the number of infected nodes , the number of SI links in the respective network states as well as the time spent in the various states. The parametric model from Eq. 2 was then fitted to the curves from the Gillespie simulations by a leastsquares fit using a particle swarm algorithm (kennedy1995particle). The resulting
triplets were used to infer Gaussian kernel density estimators
(pedregosa2011scikit) for the priors . Assuming a non informative, uniform prior for network class , the prior distribution over and is given byEmploying Bayes’ rule we obtain the network class specific posterior(s) over the parameter space as
(3) 
and the posterior over the network classes as
where denotes the likelihood of the observations under the forward model from Eq. 1 which is given by
(4) 
Following di2020network, the terms are computed using the algorithm from crawford2014estimation. The Python implementation routine estimating is available at https://github.com/BayIAnet/NetworkInferenceFromPopulationLevelData. To estimate , we draw samples from using the Metropolis–Hastings algorithm, making use of Eqs. 3 and 4.
2.2.2 Prediction and uncertainty
To obtain predictions one needs to integrate Eq. 1 with the parameters obtained from the posterior . We generate and evaluate two different types of predictions. The first variant incorporates information on prediction uncertainty as encoded in and . The second variant is based on a point estimate of . The Python routine for generating the predictions will be made available at https://github.com/tzerenner/EpidemicPredictionsFromPopulationLevelData.
For brevity, in the following, we neglect the dependence on and instead consider some fixed point in time. We denote by the pushforward measure of under the forward solution operator defined by Eq. 1. The density of is then related to that of through . The conditional mean of is given by
(5) 
When additionally integrating over all network classes , we can further obtain the conditional mean of , the pushforward measure of , as
(6) 
We estimate from samples which we draw from the posterior distributions , using the Metropolis–Hastings algorithm. Since integrating the Master equation with a large number of parameter combinations is computationally demanding, we thin the samples by including only every th draw, such that the autocorrelation between subsequent draws is . To choose an appropriate size of the (thinned) sample, we consider its multivariate effective sample size (mESS), which is estimated by
where denotes the sample covariance and
denotes the multivariate batch mean estimator of the covariance matrix in the Markov chain central limit theorem
(roy2020convergence; vats2020analyzing). The sample size is chosen to be the minimum such that which ensures a confidence level of and a tolerance level of for the expectation in the three parameter space (vats2019multivariate). To compute the mESS and the threshold for the desired confidence and tolerance levels, we used the Python implementation available at https://github.com/Gabrielp/multiESS.We then proceed to integrate the Master equation with each triplet to obtain and finally approximate the conditional mean by
(7) 
as well as the respective cumulative density over by
(8) 
From the latter we obtain equaltailed credible intervals for the predicted number of infected nodes. Equaltailed intervals are defined such that the probability of being below the interval is as high as being above the interval and thus can be directly obtained from the quantiles of the cumulative density as
(9) 
The interval for example corresponds to the 90% equaltailed interval. When obtained from (Eq. 6), such intervals incorporate both prediction uncertainty arising from uncertainty about parameters and network class as encoded in and , respectively, as well as prediction uncertainty arising from the intrinsic stochasticity of the epidemic spreading.
The second prediction variant is based on a point estimate, i.e., a single inferred from the posterior . We first identify the most likely network class, i.e, the mode of ,
(10) 
where MAP stands for maximum aposteriori, and then estimate the mode of ,
(11) 
using a combination of global and local optimisation routines (di2020network). The predictions are obtained by integrating the Master equation with , i.e.,
(12) 
Again, we compute the respective cumulative density over as
(13) 
from which we can obtain quantiles and equaltailed prediction intervals. Such intervals are not credible intervals in the Bayesian sense, but solely represent the intrinsic stochasticity of the epidemic spreading. They are thus systematically narrower than the credible intervals discussed above. An illustration of the two prediction variants for one single point in time is provided in Fig. 2.
To compare uncertainty in the space for the two prediction variants we further consider the covariance of the pushforward , . We denote the mean of the pushforward of under the forward solution operator by (Eq. 6). Its covariance is given by the outer product
We estimate from the discrete samples (Eq. 7) as
We can then evaluate
(14) 
where denotes the Euclidean norm in . To accordingly evaluate the uncertainty of the predictions based on the point estimator from Eq. 11, we further evaluate
(15) 
where are the predictions from Eq. 12.
2.3 Performance assessment
We generate a reference for evaluating the BD model by carrying out a set of Gillespie simulations which provides us with an i.i.d. sample of the stochastic spreading of an SIS epidemic on a given network. We chose a sample size of 1000 and denote the set of 1000 epidemic trajectories obtained from the sample as . To empirically validate the BD model, we compare the obtained from the numerical integration of Eq. 1 against the reference. We evaluate the difference in expectation, i.e.,
(16) 
where denotes the mean over the reference at time and denotes the mean number of infected nodes at time predicted by the BD model. We further evaluate the cumulative densities from the BD model in comparison to our reference using the integrated quadratic distance (IQD) which is given by
(17) 
Here denotes the cumulative density at time predicted by the BD model,
and denotes the reference empirical cumulative density,
where is an indicator function equal to if condition is true and 0 otherwise. A small IQD is obtained when not only the mean but also the intrinsic stochasticity of the epidemic spreading is adequately represented by the BD model. The goodness of fit between the cumulative densities is further illustrated by quantilequantile plots. The quantile function is the inverse of the cumulative density, i.e., , and hence given by
(18) 
for the BD model (s), and the reference (r), respectively.
3 Data
We consider nine different network and epidemic parameter combinations. The network parameters, network class and mean degree , and the epidemic parameters, per link infection rate and recovery rate , are summarised in Table 1. Gillespie simulations were performed on a network of nodes and initialised with five infected nodes selected at random. The time is an approximate value of the time span between initialising the simulation and reaching quasisteady state and in the following serves as a universal time scale which allows to plot the different cases onto the same time axis. Time is unitfree here. The simulated data can be rescaled to physicallymeaningful time scales by applying an appropriate multiplicative factor to the simulation time . The parameters for the nine cases were chosen such that we obtained one large epidemic with of the population infected at quasisteady state, one medium epidemic with to of the population infected at quasisteady state, and one small epidemic with of the population infected at quasisteady state for each network class.
In this study, we consider the epidemics at populationlevel, that is, we aim to predict the future course of the number of infected nodes. Network class and parameters of the BD model are inferred from populationlevel observations of the number of infected nodes at a set of discrete points in time.
case  

Reg l  5  4.251  2.969  0.75 
Reg m  10  1.265  5.773  1.5 
Reg s  7  0.762  3.356  8 
ER l  8.124  1.251  0.969  1.25 
ER m  15.868  0.859  6.338  1.25 
ER s  12.042  1.143  9.579  2 
BA l  13.902  3.123  6.969  0.25 
BA m  9.95  2.19  8.948  0.5 
BA s  7.968  0.612  3.803  2.5 
4 Results
4.1 Validation of the BD model
We carry out a set Gillespie simulations during which we keep track of the number of infected nodes , the number of SI links over time and the time spent in the observed states. For each case, we carry out in total 200 simulations half of which are initialised with five infected nodes and half with 1000 infected nodes. With this choice of initial conditions, we obtain realisations of the random variable ( #SI links) for each , from which we compute the expectations following di2020network as
(19) 
where denotes the total lifetime of all network states with infected nodes and SI links.
case  RMSE  

Reg l  0.469  0.475  0.915  85.10 
Reg m  0.110  0.182  1.007  6.90 
Reg s  0.039  0.224  1.019  6.41 
ER l  0.116  0.085  0.977  36.48 
ER m  0.162  0.020  0.984  18.07 
ER s  0.169  0.016  0.983  20.01 
BA l  4.872  0.726  0.799  208.09 
BA m  3.229  0.551  0.778  241.24 
BA s  0.765  0.494  0.776  56.07 
We then proceed to fit the parameters of the parametric model from Eq. 2 to the curves by a leastsquares fit using a particle swarm algorithm. Table 2 lists the resulting triples for each case along with the root mean square error (RMSE) between parametric curves and .
Figure 1 shows the empirical curves as well as the fitted curves for the nine cases. The top panels illustrate the good agreement between the parametric model and . The bottom panels of Fig. 1 show relative errors. It is not surprising that the relative error is largest for close to zero or close to , i.e., when the number of SI links is small either because only very few nodes are infected or because almost the entire population is infected. The relative errors are lowest for the ER network class followed by the Regular networks. For the BA network class, we obtain larger errors.
We simulate each case by integrating the Master equation (Eq. 1) with the empirical as well as the parametric . We initialise simulations with and 80 infected nodes at time , the latter two values corresponding to two and four cycles of doubling from the initially five infected nodes. A set of 1000 Gillespie simulations of each case serves as a reference. Figure 3 shows the difference in the expected number of infected nodes between BD model and reference. Figure 4 shows the integrated quadratic distance (IQD) between the cumulative densities from BD model and reference. The errors remain small throughout the simulations with the empirical (top panels), which confirms the suitability of the BD model to describe populationlevel infection rates in SIS epidemics on Reg, ER and BA networks. Not only is the expectation well captured by simulations with , but also the intrinsic stochasticity of the epidemic despite the meanfield approximation. We observe the largest errors for the BA network class and during the growth phase when simulations are initialised with (Fig. 4
a). The BA network class exhibits a higher degree of heterogeneity than Regular and ER networks. Hence, a larger variance in the number of SI links at a given
is expected. Therefore, the meanfield approximation might be less well suited for that type of network than for Regular and ER networks.The simulations with the BD model with the parametric (bottom panels) exhibit larger errors compared to the simulations with . Ranking the different cases studied, the BD model achieves the lowest errors for the ER network class, followed by Regular networks and the BA networks. Again, errors are largest when the simulations are initialised with which appears to be caused by the larger relative errors of the parametric model for small . The overestimation of at small by the parametric model for the BA network class (Fig. 1c) causes the number of infected nodes to increase too fast in the BD simulations, which leads to an overestimation of the number of infected nodes during the growth phase (Figs. 3d, 5h). Conversely, the underestimation of for small for the Regular networks (Fig. 1b) causes the number of infected nodes to increase too slowly in the BD model simulations, and hence an underestimation of the number of infected nodes during the growth phase (Figs. 3d, 5a). The errors peak during the growth phase and then decay until reaching an approximately constant value in the quasisteady state.
For the majority of the cases, the quasisteady state is well captured in both mean and variation around the mean, with the only exception being the small epidemics on Regular and BA networks. When the BD model is initialised at , it starts from with a state from which some of the small epidemics will eventually die out and only some will eventually converge to the quasisteady state. Due to the overestimation of for small for the BA networks in the parametric model, the probability of an epidemic to proceed to the quasisteady state from is overestimated in the BD model. Hence, the expected number of infected nodes in the BD model is too large. For Regular networks, the opposite holds and the expected number of infected nodes is too small. When initialised with the errors are smaller, but the temporal pattern of the errors persists, i.e., errors peak during the growth phase and then decay until the quasisteady state is reached (Figs. 3e, 4e). When initialised with , we find that both the growth phase as well as the quasisteady state is well captured by the BD model (Figs. 3f, 4f, 5c,g,j).
4.2 Predictions
4.2.1 Network class inference
di2020network demonstrated that one can reliably recover the class of the underlying network from populationlevel observations, when observations of the full epidemic trajectory from an early stage up to quasisteady state are available. When aiming to predict the future evolution of an ongoing epidemic, the inference of network class and parameters can only utilise observations of the epidemic up to its current state. Therefore, the question arises as to when one has sufficient information during an epidemic to reliably predict its further course. We therefore carry out a sensitivity analysis by inferring the posterior distribution from observation data sets covering different time windows during the evolution of the epidemic and incorporating different numbers of observations. For this analysis, we consider the medium epidemics (Table 1).
The results are summarised in Fig. 6
. As expected, in general, the longer the observational period, the higher the (average) posterior probability of the true underlying network class is. When the observational period ranges from an early stage of the epidemic up to quasisteady state (50 to quasisteady state), ten out of ten realisations on the BA network and seven out of ten realisations on Regular and ER networks are classified correctly. While BA networks can be clearly separated when sufficient data is available, distinguishing between Regular and ER networks appears challenging. For some realisations, the respective trajectories largely overlap (Fig.
7a).When the observation time span is shortened, the rate of correct classifications decreases (Fig. 6). As demonstrated in Fig. 7b, epidemics spreading on networks from the different classes may exhibit a similar shape during the earlier stage and only diverge later on. Thus, inferring the underlying network classes from populationlevel observations of a single realisation requires a sufficient observational time span. Increasing the number of observations from 10 to 100 does not have any visible effect on the classification. Ten observations provide a sufficient description of the epidemic trajectory.
We note that for the BA networks, classification accuracy increases when the very early stage of the epidemic up to is excluded from the observational data set. In Fig. 6e,f, this is most obvious when comparing the posterior probabilities obtained for the observation intervals , and with those obtained for , and , respectively. We believe this to be caused by the relatively large error of the parametric model for small for the BA network class (Fig. 1c). For small , the average number of SI links and hence are overestimated by the model. Hence, the initial spreading of the epidemic on the network is expected to proceed significantly slower than the BD model with optimal triplet would suggest. Further, we note that because Regular and ER networks are comparably close in the space, confusion between Regular and ER networks is comparably likely, but predictions are also expected to be comparably robust to confusion between Regular and ER network classes.
4.2.2 Epidemic trajectories
Figure 8 shows the predictions incorporating the uncertainty encoded in the posterior distribution(s). Shown are three realisations of the medium epidemics on Regular, ER and BA networks. Additional Figures for all ten realisations of the nine cases from Table 1 are provided in the Supplementary Material. The dots indicate the observations . The greyshaded areas indicate the predictions. Trajectories of 100 realisations of Gillespie simulations initialised at the network state associated with the last observational data point serve as reference.
For the majority of cases and realisations, the 90%credible interval (CI) contains the reference. For some cases/realisations, the reference lies just outside 90%CI (e.g., Fig. 8a realisation 9). For predictions of the medium epidemics initialised after five cycles of doubling (), we find the reference to lie just outside the 90%CI for one out of ten realisations for the ER network, while for the Reg and BA networks, it lies partly outside of the 90%CI for 3 out of 10 realisations each. The 90%CI spans a range of up to and tends to be larger for the small and medium epidemics than for the large epidemics.
When parameters and network class are inferred from observations up to five cycles of doubling (), prediction uncertainty is dominated by uncertainty about model parameters and network class. When inference is based on observations up to and including the quasisteady state, uncertainty on parameters and network class is negligible and the uncertainty on the future course of the epidemic is dominated by the intrinsic stochasticity of the process (dotted brown lines in Fig. 8). Hence, the magnitude of the prediction uncertainty is sensitive to the observational time span available for inference. In any realistic setting, observations are of course not available beyond the point from which one aims to predict the future course of an epidemic. As illustrated in Fig. 8b,d and f, parameter and network class uncertainty is, as expected, reduced when a longer observational time span is available. For predictions initialised at 6 cycles of doubling (), prediction uncertainty due to intrinsic stochasticity and parameter/network class uncertainty is similar in magnitude and it depends on the particular case and realisation if prediction uncertainty is dominated by either one.
For the predictions of epidemics on BA networks, the reference tends to lie in the lower half of the 90%CI whereas for Regular networks, it tends to lie in the upper half of the 90%CI. Relatively large credible intervals are associated with relatively large uncertainty about the network class. As illustrated in Fig. 7b, epidemics on different networks with a similar trajectory during the early stage eventually diverge, with the epidemics on the BA networks converging to the lowest level of infection during quasisteady state followed by ER and Regular networks. Thus, credible intervals obtained from observations of the beginning of one of the BA trajectories from Fig. 7b are expected to contain the true BA trajectories at their lower end. This behaviour can also be understood from the corresponding curves. Curves for networks from different classes that are similar for small will diverge for larger , with the curve corresponding to the BA network having the smallest peak, typically followed by ER and finally the Regular network with the highest peak.
Figure 9 shows the point estimatebased predictions. The prediction intervals here do not account for network class and parameter uncertainty, but only represent the intrinsic stochasticity of the epidemic spreading. Accordingly, the prediction intervals are systematically narrower than the credible intervals. The width of the prediction intervals is consistent with the spread of the trajectories from the Gillespie simulations. For some cases and realisations, the point estimatebased predictions provide a near perfect fit to the reference (e.g., Fig. 9a realisation 1, e realisation 1). However for some cases/realisations prediction and reference differ by up to (e.g., Fig. 9c realisation 6). For epidemics on BA networks, the number of infected nodes is overestimated in the point estimatebased predictions if the network is falsely identified as ER or Regular network. For epidemics on Regular networks, the number of infected nodes is underestimated if the network is falsely identified as ER or BA. The reason for this is the same as for the tendencies of the reference to occur in different parts of the credible intervals for the different network classes discussed in the above paragraph. When inference is based on observations up to six cycles of doubling (), the errors of the point estimatebased prediction are visibly reduced (see also the Supplementary Material). Hence, when longer observation time spans are available also point estimatebased predictions are potentially useful.
Finally, in Fig. 10 we consider the uncertainty in the space as described by the covariance of the pushforward measure around the two different predictions and . Shown is the medium epidemic on the ER network. As the predictions based on the conditional mean incorporate the uncertainty about the predicted that stems from uncertainty about network class and model parameters it is systematically wider (Fig. 2). This width reflects the width of the pushforward and thus leads to lower values of compared to the pointestimate based predictions. Further, the predictions incorporating uncertainty exhibit less variation of among the different realisations than the point estimatebased predictions. As already discussed alongside Figs. 8 and 9, we find the uncertainty to be systematically lower when predictions are based on longer observational periods. The longer the available observation period, the narrower the posterior and the smaller the difference between the two types of predictions and their respective uncertainty in the space.
5 Discussion
We have explored a modelling and inference framework for forecasting SIS epidemics spreading on networks. The surrogate model is based on a BD process. The effect of the contact structure has been condensed into a birthrate parameter, which is proportional to the average number of SIlinks for a given number of infected nodes. Our empirical validation has confirmed that the BD model is well suited to describe the evolution of an SIS epidemic on a network (Figs. 3 df, 4 df). Both the expectation and the intrinsic stochasticity of the epidemic trajectories are well reproduced even though our model formulation contains a meanfield approximation. The parametric model for the number of SIlinks, which has been introduced to enable the inference of network class and model parameters, is suitable for the range (Fig. 1). Hence, simulations with the BD model with the parametric should not be initialised with fewer than infected nodes (Fig. 3 ac, 4 ac).
Network class and epidemic parameters can be reliably inferred when observations are available from an early stage of the epidemic up to the quasisteady state. However, in realistic prediction scenarios, observations are only available up to the current state of the epidemic. The accuracy of the network class inference is sensitive to the observational time span (Fig. 6). Uncertainty increases as observational time span is reduced. This is because epidemics, though spreading on networks from distinct classes, can exhibit very similar trajectories through their earlier stages and only diverge when approaching the quasisteady state (Fig. 7). As discussed in allen2021predicting for instance, the uncertainty of the future course of an emerging epidemic during its early stages is dominated by the intrinsic stochasticity of disease transmission. It is thus no surprise that observations from an early stage of the epidemic appear not to contain sufficient information about the network class.
In predictions based on observations up to and initialised at , the prediction uncertainty is dominated by the uncertainty of the parameters/network classes. In predictions based on observations up to and initialised at , the prediction uncertainty stems in about equal proportions from parameter/network class uncertainty and intrinsic stochasticity of the epidemic spreading (Fig. 8). Thus, and especially for shorter observational periods and hence predictions initialised early during the epidemic, considering parameter uncertainty is crucial for providing meaningful information about prediction uncertainty (see also castro2020turning; wilke2020predicting). The results suggest that for most cases the credible intervals obtained provide reliable uncertainty information for the epidemic forecasts (see also the Supplementary Material). If longer observational time spans are available, point estimatebased predictions are potentially useful as well (Fig. 9).
Our study differs from other approaches in network inference in so far as our aim here is not to infer the existence, or otherwise, of links but rather to infer the most likely network class that led to the observed populationlevel data resulting from an epidemic spreading on it. As a result, the data needed for inference does not contain node or linklevel information. There are both advantages and disadvantages to such an approach. On the one hand, the computation of the likelihood in our case is more straightforward and the data needed for inference is modest. On the other hand, if more detailed data is available, the proposed model will not be able to capture it nor benefit from it. However, more complex models will need large quantities of detailed data (i.e., in the case of cascades, the data needs to contain cascades starting from, or involving, as many nodes as possible, gomez2012inferring) to produce acceptable results with large computational burden. The choice of model and inference will depend on the context.
There are many directions in which the current model and inference scheme can be developed. First, we only explored three network classes where the key difference was degree heterogeneity. However, networks displaying degreedegree correlations, clustering, spatial structure or some type of mesoscale structure, such as communities, may be of interest as they are more representative of realworld scenarios. Equally, from a theoretical viewpoint, lattices could be considered. This is a nontrivial task and depending on which network property or combination of properties we choose to model, it may turn out that the birthrates of the BD process will no longer be parabolalike and the proposed parametric model may no longer provide a satisfactory fit. However, we expect that more complex models will be able to capture the birthrates in the BD process resulting from such more exotic networks. Another natural extension would be to consider more complex epidemic models, such as SIR, where the corresponding BD model will now have equations and the birthrates of the BD process will define a surface rather than a curve. However, and perhaps more interestingly, the excellent agreement between the exact and surrogate model, leads us to believe that a rigorous proof that quantifies the error between the exact and BD models may be possible. For example, it is clear that as a Regular network becomes more densely connected, and in the limit of number of links going to , the BD model becomes exact.
Acknowledgements
All authors acknowledge support from the Leverhulme Trust through the Research Project Grant RPG2017370.