Asymptotic-Preserving Neural Networks for multiscale hyperbolic models of epidemic spread

06/25/2022
by   Giulia Bertaglia, et al.

When investigating epidemic dynamics through differential models, the parameters needed to understand the phenomenon and to simulate forecast scenarios require a delicate calibration phase, often made even more challenging by the scarcity and uncertainty of the observed data reported by official sources. In this context, Physics-Informed Neural Networks (PINNs), by embedding the knowledge of the differential model that governs the physical phenomenon in the learning process, can effectively address both the inverse problem of data-driven learning and the forward problem of solving the corresponding epidemic model. In many circumstances, however, the spatial propagation of an infectious disease is characterized by movements of individuals at different scales governed by multiscale PDEs. This reflects the heterogeneity of a region or territory in relation to the dynamics within cities and in neighboring zones. In the presence of multiple scales, a direct application of PINNs generally leads to poor results due to the multiscale nature of the differential model in the loss function of the neural network. To allow the neural network to operate uniformly with respect to the small scales, it is desirable that the neural network satisfies an Asymptotic-Preservation (AP) property in the learning process. To this end, we consider a new class of AP Neural Networks (APNNs) for multiscale hyperbolic transport models of epidemic spread that, thanks to an appropriate AP formulation of the loss function, is capable of working uniformly at the different scales of the system. A series of numerical tests for different epidemic scenarios confirms the validity of the proposed approach, highlighting the importance of the AP property in the neural network when dealing with multiscale problems, especially in the presence of sparse and partially observed systems.

1 Introduction

In recent decades, enormous progress has been made in the understanding of complex systems described by multiscale PDEs with applications ranging from classical physics and engineering to biology and social sciences [1, 36, 4, 5, 11, 25, 14].

Despite continuing progress, modeling and predicting the evolution of nonlinear multiscale systems using classical analytical or computational tools inevitably faces severe challenges. Firstly, numerically solving a multiscale problem requires complex and sophisticated computational codes and can introduce prohibitive costs (due to the well-known curse of dimensionality). Moreover, we are always facing the difficulties related to the scarcity of data and multiple sources of uncertainty, especially when concerning social sciences. Above all, solving real physical problems with missing or incomplete initial or boundary conditions through traditional approaches is currently impractical. This is where and why data-driven models began to play a crucial role [17].

Machine Learning (ML) is an incredibly powerful tool, which has proven to have an enormous impact in many fields of our society. This has led to great interest in using ML techniques also to study challenging problems in science, engineering, and medicine concerning complex multiscale dynamics. However, it is clear that the problems we are dealing with are very different from the classical problems in which ML has proved to be so successful. So, we cannot simply take the available ML methods as a “black box” and use them uncritically [2]. Purely data-driven models may fit observations very well, but their predictions may be physically inconsistent and, consequently, lead to erroneous generalizations. Therefore, there is an urgent need to integrate fundamental physical laws and related mathematical models into the learning process of the neural networks [28, 18, 33]. The main motivation for developing this new class of physics-informed machine learning algorithms is that such prior physical knowledge or constraints can ensure that ML methods remain robust even in the presence of imperfect data (such as missing, incomplete or noisy data) and provide accurate predictions that adhere to the physics of the phenomenon under study.

A recent example of this new learning paradigm is represented by Physics-Informed Neural Networks (PINNs) [15, 38, 28, 44]. PINNs are a new class of deep neural networks (DNNs) that are trained to solve supervised learning tasks while respecting any given physical laws described through general nonlinear ordinary differential equations (ODEs) or partial differential equations (PDEs). The physical knowledge of the underlying phenomenon is incorporated into the PINN mainly in two ways: either it is introduced directly through the data embodying the underlying physics of the phenomenon of interest (observational bias) or it is introduced by an appropriate choice of the loss function that the PINN must minimize, forcing the training phase of the neural network to converge to solutions that adhere to the underlying physics (learning bias).

Nevertheless, the adoption of a standard formulation of PINNs in the context of multiscale problems may still lead to incorrect inferences and predictions [26]. This is mainly due to the presence of small scales leading to reduced or simplified models in the system that need to be enforced consistently during the learning process. In these cases, a standard PINN formulation allows an accurate description of the process only at the leading order, thus losing accuracy in the asymptotic limit regimes. One remedy for this, as recently proposed in [26], is to modify the loss function to include asymptotic-preserving (AP) properties during the training process. The realization of such an AP-loss function will therefore depend on the particular problem under study and will be based on an appropriate asymptotic analysis of the model.

One particularly interesting area where the use of machine learning techniques can play a key role concerns epidemiological dynamics. In this context, a number of mathematical models have recently been proposed that require the estimation of several parameters from data to provide predictive scenarios and to test their reliability [1, 10, 7, 11, 12, 19, 20, 37, 41]. In this paper we will focus on a new class of epidemic models described by multiscale PDEs capable of describing both hyperbolic-type phenomena, characteristic of epidemic propagation over long distances and along the main lines of communication between cities, and parabolic-type phenomena, in which classical diffusion prevails at the urban level [1, 10, 7, 11]. The multiscale nature of the problem poses a challenge to the construction of PINNs, and preservation of the AP property is therefore essential in order to obtain reliable results. Following the approach recently introduced in [26], we will show how to construct AP neural networks (APNNs) that are capable of solving both inverse and forward problems of interest in epidemic dynamics.

The rest of the paper is organized as follows. The next section is devoted to the description of the model under study and a formal analysis of the different multiscale behaviors. In Section 3 we introduce the notion of APNN and describe how to construct such a neural network in the case of a simplified multiscale hyperbolic model and then how to extend it to the epidemic case under study. A series of numerical results for both inverse and forward problems, using synthetic data produced by the numerical solution of the mathematical model, illustrates the validity of the present approach. In particular, the case of partially observed systems, as commonly found in epidemics, will be considered in order to emphasize the relevance of the AP property. Some final considerations and future developments are reported in a concluding section.

2 Hyperbolic models of epidemic spread

For simplicity, we illustrate the space-dependent epidemiological modeling in the case of classic SIR compartmental dynamics, in which we consider a population subdivided into susceptible (individuals who may be infected by the disease), infectious (individuals who may transmit the disease) and removed (individuals who have healed and are immune, or who have died due to the disease). We assume a population of subjects with no prior immunity and neglect the vital dynamics represented by births and deaths, given the time scale considered. Nevertheless, it is straightforward to extend our arguments to richer compartmentalizations designed to take into account specific features of the infectious disease of interest, such as those proposed recently in [10, 7, 11, 12, 19, 20, 37, 41] to study the spread of COVID-19.

2.1 The hyperbolic SIR model

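By analogy with discrete-velocity kinetic theory [36, 4], we consider individuals moving in a one-dimensional domain $\Omega \subseteq \mathbb{R}$ in two opposite directions, with velocities $\pm\lambda_S$, $\pm\lambda_I$, $\pm\lambda_R$, distinguished for each epidemic compartment. Notice that the characteristic velocities reflect the heterogeneity of geographical areas and are therefore chosen to depend on the spatial location $x \in \Omega$. Hence, we can describe the space-time dynamics of the population for $t > 0$ through the following two-velocity SIR epidemic transport model [1, 9, 8]:

$$
\begin{aligned}
\frac{\partial S^{\pm}}{\partial t} \pm \lambda_S \frac{\partial S^{\pm}}{\partial x} &= -F(S^{\pm}, I) \mp \frac{1}{2\tau_S}\left(S^{+} - S^{-}\right), \\
\frac{\partial I^{\pm}}{\partial t} \pm \lambda_I \frac{\partial I^{\pm}}{\partial x} &= F(S^{\pm}, I) - \gamma I^{\pm} \mp \frac{1}{2\tau_I}\left(I^{+} - I^{-}\right), \\
\frac{\partial R^{\pm}}{\partial t} \pm \lambda_R \frac{\partial R^{\pm}}{\partial x} &= \gamma I^{\pm} \mp \frac{1}{2\tau_R}\left(R^{+} - R^{-}\right),
\end{aligned}
\tag{1}
$$

with the total densities of each compartment, $S(x,t)$, $I(x,t)$, and $R(x,t)$, given by

$$
S = S^{+} + S^{-}, \qquad I = I^{+} + I^{-}, \qquad R = R^{+} + R^{-}.
$$

The transport dynamics of the population is governed by the scaling parameters $\lambda_S$, $\lambda_I$, $\lambda_R$ as well as the relaxation times $\tau_S$, $\tau_I$, $\tau_R$. The quantity $\gamma = \gamma(x,t)$ is the recovery rate of infected, which corresponds to the inverse of the infectious period. This rate may vary in space and time depending on the treatment therapies used, although it can generally be assumed constant, especially for short-term analysis. The transmission of the infection is defined by an incidence function $F(\cdot, I)$ modeling the transmission of the disease [23, 13, 30]. The transmission rate $\beta = \beta(x,t)$ characterizes the average number of contacts per person per time, multiplied by the probability of disease transmission in a contact between a susceptible and an infectious subject. Notice that this rate may vary in space and time as a consequence of the intensification of governmental control actions (such as mandatory wearing of masks, closing of specific activities or full lockdowns) or their relaxation (lifting mask mandates, reopening schools, restaurants, leisure and cultural centers) in specific locations.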

It is worth highlighting that, when investigating real epidemic scenarios, the above-mentioned parameters are, in general, unknown. While the recovery rate might be fixed based on clinical data, the transmission rate must always be estimated through a delicate calibration process in order to match available data. It is also well known that this process is highly heterogeneous, which makes the inverse problem even more challenging [16].

The standard threshold of epidemic models is the so-called basic reproduction number $R_0$, which defines the average number of secondary infections produced by one infected individual in a totally susceptible population [23]. The effective reproduction number $R_t$, instead, describes the variation in time of this quantity, giving information on the progress of the infectious spread. Indeed, this number determines when an infection can invade and persist in a new host population ($R_t > 1$), or tend to fade away ($R_t < 1$). The endemic state corresponds to the case $R_t = 1$.

Assuming no inflow/outflow boundary conditions in $\Omega$, integrating in space and summing up the second equation in (1) we are able to define the effective reproduction number of the SIR transport model

$$
R_t = \frac{\int_\Omega F(S, I)\, dx}{\int_\Omega \gamma\, I\, dx},
\tag{2}
$$

where $F(S, I)$ is the incidence function and $\gamma$ the recovery rate.
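As an illustration of how the ratio of integrated incidence to integrated recovery in (2) can be evaluated on gridded data, the following minimal sketch uses the trapezoidal rule. The mass-action incidence `beta * S * I`, the function names, and the Gaussian snapshot are assumptions made only for this example and are not taken from the paper.

```python
import math

def trapezoid(values, dx):
    """Composite trapezoidal rule on a uniform grid."""
    return dx * (sum(values) - 0.5 * (values[0] + values[-1]))

def effective_reproduction_number(S, I, beta, gamma, dx):
    """Integrated incidence divided by integrated recovery.

    Assumes the mass-action incidence F(S, I) = beta * S * I, an
    illustrative choice that may differ from the one used in the paper.
    """
    incidence = [beta * s * i for s, i in zip(S, I)]
    recovery = [gamma * i for i in I]
    return trapezoid(incidence, dx) / trapezoid(recovery, dx)

# Hypothetical snapshot: a small Gaussian outbreak on [0, 20].
n = 201
dx = 20.0 / (n - 1)
x = [k * dx for k in range(n)]
I = [0.01 * math.exp(-((xi - 10.0) ** 2)) for xi in x]
S = [1.0 - i for i in I]
R_t = effective_reproduction_number(S, I, beta=0.5, gamma=0.1, dx=dx)
```

With an almost fully susceptible population, the value stays close to `beta / gamma`, as expected from the definition.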

Notice that this definition naturally extends locally by integrating over any subset of the computational domain if one ignores the boundary flows. Under the same no inflow/outflow boundary conditions, if we integrate equations (1) in $\Omega$, we can finally observe that the model fulfills the conservation of the total population, being

$$
\frac{\partial}{\partial t} \int_\Omega \left( S + I + R \right) dx = 0,
\tag{3}
$$

with $\int_\Omega (S + I + R)\, dx = N$, and $N$ the total population reference size, constant over time.

2.2 Multiscale behavior and diffusion limit

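Introducing the fluxes, defined by

$$
J_S = \lambda_S \left(S^{+} - S^{-}\right), \qquad J_I = \lambda_I \left(I^{+} - I^{-}\right), \qquad J_R = \lambda_R \left(R^{+} - R^{-}\right),
\tag{4}
$$

where $\lambda_{S,I,R}$ denote the characteristic velocities, we obtain a hyperbolic model equivalent to (1), but presenting a macroscopic description of the propagation of the epidemic at finite speeds

$$
\begin{aligned}
\frac{\partial S}{\partial t} + \frac{\partial J_S}{\partial x} &= -F(S, I), \\
\frac{\partial I}{\partial t} + \frac{\partial J_I}{\partial x} &= F(S, I) - \gamma I, \\
\frac{\partial R}{\partial t} + \frac{\partial J_R}{\partial x} &= \gamma I, \\
\frac{\partial J_S}{\partial t} + \lambda_S^2 \frac{\partial S}{\partial x} &= -F(J_S, I) - \frac{J_S}{\tau_S}, \\
\frac{\partial J_I}{\partial t} + \lambda_I^2 \frac{\partial I}{\partial x} &= \frac{\lambda_I}{\lambda_S} F(J_S, I) - \gamma J_I - \frac{J_I}{\tau_I}, \\
\frac{\partial J_R}{\partial t} + \lambda_R^2 \frac{\partial R}{\partial x} &= \frac{\lambda_R}{\lambda_I} \gamma J_I - \frac{J_R}{\tau_R},
\end{aligned}
\tag{5}
$$

where $\tau_{S,I,R}$ denote the relaxation times. Let us now consider the behavior of this model in diffusive regimes [32]. To this aim, we introduce the space-dependent diffusion coefficients

$$
D_S = \lambda_S^2 \tau_S, \qquad D_I = \lambda_I^2 \tau_I, \qquad D_R = \lambda_R^2 \tau_R,
\tag{6}
$$

which characterize the diffusive transport mechanism of susceptible, infectious and removed, respectively. Keeping the above quantities fixed while letting the relaxation times $\tau_{S,I,R} \to 0$ (and so the characteristic velocities $\lambda_{S,I,R} \to \infty$), from the last three equations in (5) we obtain, for each epidemic compartment, a proportionality relation between the flux and the spatial derivative of the corresponding density (Fick’s law)

$$
J_S = -D_S \frac{\partial S}{\partial x}, \qquad J_I = -D_I \frac{\partial I}{\partial x}, \qquad J_R = -D_R \frac{\partial R}{\partial x}.
\tag{7}
$$

Substituting (7) into the first three equations in (5), we recover the following parabolic reaction-diffusion model, widely used in the literature to study the spread of infectious diseases [35, 40, 43, 24, 6]

$$
\begin{aligned}
\frac{\partial S}{\partial t} &= -F(S, I) + \frac{\partial}{\partial x}\left(D_S \frac{\partial S}{\partial x}\right), \\
\frac{\partial I}{\partial t} &= F(S, I) - \gamma I + \frac{\partial}{\partial x}\left(D_I \frac{\partial I}{\partial x}\right), \\
\frac{\partial R}{\partial t} &= \gamma I + \frac{\partial}{\partial x}\left(D_R \frac{\partial R}{\partial x}\right).
\end{aligned}
\tag{8}
$$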

The model’s capability to account for different regimes, ranging from hyperbolic to parabolic, according to the space-dependent values of the transport parameters, makes it suitable for describing the dynamics of human beings. Our daily routine is, indeed, a complex mixing of individuals moving at the scale of a city center and individuals traveling among different municipalities. In this situation, it is more appropriate to describe the human dynamics in city centers with a high density of individuals through a diffusion operator, while characterizing the mobility of subjects in extra-urban areas through a hyperbolic, advective mechanism, thus avoiding in this case a propagation of the information at infinite speeds [10, 7, 11].

3 Asymptotic-Preserving Neural Networks (APNNs)

In this section, we provide a brief overview of the general framework of PINNs [28, 38] and then we shall discuss the relevant concepts of Asymptotic-Preserving Neural Networks (APNNs) for the problems of interest.

3.1 Basics of PINNs

The design of a standard deep neural network (DNN) by supervised learning can be summarized in three main steps [18]:

  1. The choice of the neural network structure.

  2. The choice of the loss function, which measures the classical empirical risk, typically characterized by the difference between model and data.

  3. The choice of a method to minimize the loss over the parameter space. The most popular choices are stochastic gradient descent (SGD) and advanced optimizers such as Adam [31].

In practice, the performance of the neural network is estimated on a finite data set that is unrelated to any data used to train the model, and the corresponding error is called the test error, whereas the error in the loss function (which is used for training purposes) is called the training error.

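Compared to the above classical deep learning methodology, the major difference of PINNs is the integration of physical laws, usually in the form of PDEs

$$
\mathcal{D}[u(z); \eta] = 0, \quad z \in \Omega, \qquad \mathcal{B}[u(z)] = 0, \quad z \in \partial\Omega.
\tag{9}
$$

Here $\Omega$ is the spatio-temporal domain of the system, $\partial\Omega$ represents its boundary, $\mathcal{D}$ is the differential operator, $u$ represents the solution to the system, and $\eta$ is the parameter related to the physics. Since the initial condition is mathematically equivalent to the boundary condition in the spatio-temporal domain, we use $\mathcal{B}$ as a general operator for arbitrary initial and boundary conditions of the system.

PINN models usually include a neural network representation of the solution, $u_\theta(z) \approx u(z)$, parameterized by network parameters $\theta$ and having $z = (x, t)$ as input data. In the PINN literature, the most widely used neural network architecture is the feed-forward neural network (FNN). An $L$-layered FNN consists of an input layer, an output layer, and $L-1$ hidden layers, which can be defined as follows

$$
\begin{aligned}
\mathcal{N}^{0}(z) &= z, \\
\mathcal{N}^{l}(z) &= \varphi \circ \left( W^{l} \mathcal{N}^{l-1}(z) + b^{l} \right), \quad 1 \le l \le L-1, \\
\mathcal{N}^{L}(z) &= W^{L} \mathcal{N}^{L-1}(z) + b^{L},
\end{aligned}
$$

where $W^{l} \in \mathbb{R}^{m_l \times m_{l-1}}$ are the weights, $b^{l} \in \mathbb{R}^{m_l}$ the bias, $m_l$ is the width of the $l$-th hidden layer with $m_0$ the input dimension and $m_L$ the output dimension, $\varphi$ is a scalar activation function (such as ReLU [21]), and “$\circ$” denotes entry-wise operation. Thus, we denote the set of network parameters $\theta = \{W^{l}, b^{l}\}_{l=1}^{L}$.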
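The layer-by-layer definition of the FNN can be sketched in a few lines of NumPy. This is only a minimal illustration: the layer widths, the Glorot-style initialization, and the helper names `init_params` and `fnn_forward` are hypothetical choices for this example and do not reproduce the architecture used in the paper.

```python
import numpy as np

def init_params(widths, rng):
    """Weights and biases for layer widths [m_0, ..., m_L] (Glorot-style scaling)."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / (m_in + m_out)), size=(m_out, m_in)),
         np.zeros(m_out))
        for m_in, m_out in zip(widths[:-1], widths[1:])
    ]

def fnn_forward(params, z, activation=np.tanh):
    """Hidden layers apply the activation entry-wise; the output layer is affine."""
    h = z
    for W, b in params[:-1]:
        h = activation(W @ h + b)
    W, b = params[-1]
    return W @ h + b

rng = np.random.default_rng(0)
params = init_params([2, 32, 32, 3], rng)        # hypothetical widths
u = fnn_forward(params, np.array([0.3, 0.1]))    # input z = (x, t)
```

Here the two inputs play the role of a space-time point and the three outputs could stand for three unknown fields; the set of all `(W, b)` pairs is the collection of trainable network parameters.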

To find the optimal values for the network parameters $\theta$, the neural network is trained by minimizing the following type of loss (also called cost or risk) function

$$
\mathcal{L}(\theta) = w_r^{T} \mathcal{L}_r(\theta) + w_b^{T} \mathcal{L}_b(\theta) + w_d^{T} \mathcal{L}_d(\theta).
\tag{10}
$$

Here $\mathcal{L}_r$ and $\mathcal{L}_b$ quantify the discrepancy of the neural network surrogate with the underlying PDE and its initial or boundary conditions in (9), respectively. The data mismatch loss $\mathcal{L}_d$ is applied when additional measurement data are available, e.g., when solving inverse problems, and $w_r$, $w_b$, $w_d$ are the corresponding weight vectors. The most popular methods chosen to solve this optimization problem remain stochastic gradient descent (SGD) and Adam [31]. After finding the optimal set of parameter values $\theta^{*}$ by minimizing the PINN loss (10), i.e.,

$$
\theta^{*} = \arg\min_{\theta} \mathcal{L}(\theta),
\tag{11}
$$

the neural network surrogate can be evaluated at any given spatio-temporal point to get the solution.
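The weighted structure of the loss (10) can be illustrated with a short sketch. It assumes scalar weights and mean squared errors for each contribution; the function names and the sample error values are hypothetical and serve only to show how the three terms are combined.

```python
def mse(values):
    """Mean squared error of a list of pointwise errors."""
    return sum(v * v for v in values) / len(values)

def pinn_loss(residuals, boundary_errors, data_errors=(),
              w_r=1.0, w_b=1.0, w_d=1.0):
    """Weighted sum of PDE-residual, initial/boundary and data-mismatch terms.

    The data term is included only when measurements are available,
    as in inverse problems.
    """
    loss = w_r * mse(residuals) + w_b * mse(boundary_errors)
    if data_errors:
        loss += w_d * mse(data_errors)
    return loss

# Forward problem: no measurements, only residual and boundary terms.
forward_loss = pinn_loss([0.1, -0.1], [0.0, 0.2])
# Inverse problem: the same terms plus a weighted data mismatch.
inverse_loss = pinn_loss([0.1, -0.1], [0.0, 0.2], data_errors=[0.3], w_d=2.0)
```

In an actual PINN the pointwise errors would come from evaluating the network and its derivatives by automatic differentiation, and the weights would be tuned per test case.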

In the context of inverse problems, the structure of the network is almost the same as in the forward problem setting, except that the unknown physical parameters $\eta$ are treated as learnable parameters. As a result, the training process involves optimizing the network parameters $\theta$ and $\eta$ jointly

$$
(\theta^{*}, \eta^{*}) = \arg\min_{\theta, \eta} \mathcal{L}(\theta, \eta).
\tag{12}
$$

In summary, a PINN can be regarded as an unsupervised learning approach when used to solve forward problems, with only the equation residuals and boundary conditions in the loss function, and as a semi-supervised learning approach for inverse problems, when some measurements are available. In the last part of this section, we shall further discuss in detail each component of this learning framework through several examples.

3.2 Extension to APNNs

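Figure 1: AP diagram for neural networks. $\mathcal{F}^{\varepsilon}$ is the multiscale problem that depends on the scaling parameter $\varepsilon$, while $\mathcal{F}^{0}$ is the corresponding formulation in the reduced order limit, which no longer depends on $\varepsilon$. The solution of the system $\mathcal{F}^{\varepsilon}$ is approximated by the neural network through the imposition of the residual term $\mathcal{R}(\mathcal{F}^{\varepsilon})$. The asymptotic limit of $\mathcal{R}(\mathcal{F}^{\varepsilon})$ as $\varepsilon \to 0$ is denoted by $\mathcal{R}(\mathcal{F}^{0})$. The neural network is called AP if $\mathcal{R}(\mathcal{F}^{0})$ is consistent with the residual of the reduced system $\mathcal{F}^{0}$.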

Since we aim at analyzing multiscale hyperbolic dynamics regardless of the propagation scaling, in order to obtain physically-based predictions, it is important that the PINN can preserve the correct equilibrium solution (8) in the diffusive regime, which means that the PINN should fulfill the AP property [25, 26, 34, 22]. We remark that in the context of the epidemic modeling of this work, the AP property is of particular importance, allowing the same neural network to efficiently and robustly simulate population dynamics characterized by both diffusive and hyperbolic transport behaviors (the former in urban centers and the latter for mobility along connecting routes).

The neural networks satisfying this property are called Asymptotic-Preserving Neural Networks (APNNs), and have been recently introduced in [25, 26] to efficiently solve multiscale kinetic problems with scaling parameters that can have several orders of magnitude of difference. The definition of an APNN reported in [26] for the case of multiscale kinetic models with continuous velocity fields is generalized in the following (see Figure 1).

Definition 1 (Asymptotic-Preserving Neural Network).

Assume the solution is parameterized by a PINN trained by using an optimization method to minimize a loss function which includes a residual term enforcing the physics of the phenomenon. Then we say it is an Asymptotic-Preserving Neural Network (APNN) if, as the physical scaling parameter of the multiscale model tends to zero, the loss function of the full model-constraint converges to the loss function of the corresponding reduced order model.

In other words, the loss function, viewed as a numerical approximation of the original equation, benefits from the AP property.

3.3 A simple example: APNN for the Goldstein–Taylor model

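To illustrate the relevance of the AP property in the construction of the neural network, let us work through a detailed example by considering a simplified case in which there are no epidemic source terms that allow individuals to move to a different compartment and the entire population behaves as a single compartment. Such a case corresponds to the so-called Goldstein–Taylor model in discrete velocity kinetic theory [36, 27]. This model describes the space-time evolution of the two particle densities $f^{+}(x,t)$ and $f^{-}(x,t)$, at time $t > 0$, traveling in a one-dimensional domain, $x \in \Omega \subseteq \mathbb{R}$, with velocity $\pm 1/\varepsilon$, respectively. At the same time, particles can randomly switch to the opposite velocity. The dynamics of this system of particles is governed by the following system of PDEs

$$
\begin{aligned}
\frac{\partial f^{+}}{\partial t} + \frac{1}{\varepsilon} \frac{\partial f^{+}}{\partial x} &= \frac{\sigma}{2\varepsilon^2} \left( f^{-} - f^{+} \right), \\
\frac{\partial f^{-}}{\partial t} - \frac{1}{\varepsilon} \frac{\partial f^{-}}{\partial x} &= \frac{\sigma}{2\varepsilon^2} \left( f^{+} - f^{-} \right),
\end{aligned}
\tag{13}
$$

with $\varepsilon$ the scaling parameter of the kinetic dynamics and $\sigma$ the scattering coefficient. The total particle density is given by $\rho = f^{+} + f^{-}$.

We consider $f^{\pm}_{\theta}(x,t)$ to be a DNN with inputs $x$ and $t$ and trainable parameters $\theta$, to approximate the solution of our system: $f^{\pm}_{\theta}(x,t) \approx f^{\pm}(x,t)$. Then, we define the PDE residual

$$
\mathcal{R}^{\varepsilon}_{\mathrm{PINN}} = \varepsilon^2 \frac{\partial f^{\pm}_{\theta}}{\partial t} \pm \varepsilon \frac{\partial f^{\pm}_{\theta}}{\partial x} - \frac{\sigma}{2} \left( f^{\mp}_{\theta} - f^{\pm}_{\theta} \right)
\tag{14}
$$

and incorporate it into the loss function term of the neural network by taking the weighted mean square error of the residual to obtain a standard PINN.

To understand the asymptotic behavior of the model we resort to a suitable macroscopic formulation of the system, which is achieved through the introduction of the scaled flux $j = (f^{+} - f^{-})/\varepsilon$. Adding and subtracting the two equations in (13), this permits us to write the system in the equivalent form

$$
\begin{aligned}
\frac{\partial \rho}{\partial t} + \frac{\partial j}{\partial x} &= 0, \\
\frac{\partial j}{\partial t} + \frac{1}{\varepsilon^2} \frac{\partial \rho}{\partial x} &= -\frac{\sigma}{\varepsilon^2}\, j.
\end{aligned}
\tag{15}
$$

In the diffusion limit, i.e. letting $\varepsilon \to 0$, the second equation in (15) gives

$$
j = -\frac{1}{\sigma} \frac{\partial \rho}{\partial x},
\tag{16}
$$

which, inserted into the first equation, leads to the reduced diffusive model (which recalls the standard heat equation)

$$
\frac{\partial \rho}{\partial t} = \frac{\partial}{\partial x} \left( \frac{1}{\sigma} \frac{\partial \rho}{\partial x} \right).
\tag{17}
$$

It is clear that the standard PINN residual (14) is not consistent with the above analysis, since in the limit $\varepsilon \to 0$ it reduces to

$$
\mathcal{R}^{0}_{\mathrm{PINN}} = \frac{\sigma}{2} \left( f^{\pm}_{\theta} - f^{\mp}_{\theta} \right),
$$

which corresponds to forcing $f^{+}_{\theta} = f^{-}_{\theta}$ and does not suffice to achieve the correct diffusive behavior (17).

In contrast, using the macroscopic formulation (15), we can construct an APNN with outputs $\rho_{\theta}$ and $j_{\theta}$, incorporating in the loss function the mean square error of the PDE residuals

$$
\mathcal{R}^{\varepsilon}_{\mathrm{APNN}} =
\begin{pmatrix}
\dfrac{\partial \rho_{\theta}}{\partial t} + \dfrac{\partial j_{\theta}}{\partial x} \\[2mm]
\varepsilon^2 \dfrac{\partial j_{\theta}}{\partial t} + \dfrac{\partial \rho_{\theta}}{\partial x} + \sigma j_{\theta}
\end{pmatrix}.
\tag{18}
$$

Now, in the limit $\varepsilon \to 0$, we obtain

$$
\mathcal{R}^{0}_{\mathrm{APNN}} =
\begin{pmatrix}
\dfrac{\partial \rho_{\theta}}{\partial t} + \dfrac{\partial j_{\theta}}{\partial x} \\[2mm]
\dfrac{\partial \rho_{\theta}}{\partial x} + \sigma j_{\theta}
\end{pmatrix},
\tag{19}
$$

which is consistent with the residual of the limiting diffusive model (17). We refer to Appendix A for a detailed description of the loss function for the Goldstein-Taylor model, including data and boundary conditions loss terms.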
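The difference between the two residuals can be checked numerically on a hand-picked candidate. The sketch below assumes the kinetic residual in the form eps^2 f_t ± eps f_x - sigma/2 (f^∓ - f^±) and the macroscopic residual in the form (rho_t + j_x, eps^2 j_t + rho_x + sigma j); the candidate rho = sin(x), the value of sigma, and all function names are hypothetical. The candidate does not solve the heat equation, yet the kinetic residual vanishes as eps tends to zero, while the macroscopic residual stays of order one and therefore penalizes it.

```python
import math

SIGMA = 1.0  # hypothetical scattering coefficient

def candidate(x):
    """Time-independent candidate rho = sin(x), j = -cos(x)/sigma.

    It does NOT solve rho_t = (rho_x / sigma)_x, since rho_t = 0
    while the right-hand side equals -sin(x)/sigma.
    """
    rho_x = math.cos(x)
    j, j_x = -math.cos(x) / SIGMA, math.sin(x) / SIGMA
    return rho_x, j, j_x

def standard_residual_mse(eps, xs):
    """MSE of the kinetic residual with f^± = (rho ± eps * j) / 2."""
    total = 0.0
    for x in xs:
        rho_x, j, j_x = candidate(x)
        f_plus_x = 0.5 * (rho_x + eps * j_x)
        f_minus_x = 0.5 * (rho_x - eps * j_x)
        f_diff = eps * j  # f^+ - f^-; time derivatives are zero here
        r_plus = eps * f_plus_x + 0.5 * SIGMA * f_diff
        r_minus = -eps * f_minus_x - 0.5 * SIGMA * f_diff
        total += r_plus ** 2 + r_minus ** 2
    return total / (2 * len(xs))

def ap_residual_mse(eps, xs):
    """MSE of the macroscopic residual (time derivatives are zero here)."""
    total = 0.0
    for x in xs:
        rho_x, j, j_x = candidate(x)
        r_density = j_x                # rho_t + j_x
        r_flux = rho_x + SIGMA * j     # eps^2 j_t + rho_x + sigma j
        total += r_density ** 2 + r_flux ** 2
    return total / (2 * len(xs))

xs = [2 * math.pi * k / 200 for k in range(200)]
std_small = standard_residual_mse(1e-4, xs)
ap_small = ap_residual_mse(1e-4, xs)
```

For small eps the first quantity is negligible while the second remains of order one, which is the numerical counterpart of the consistency argument above.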

3.4 APNN for the hyperbolic SIR model

To achieve the AP property in the neural network for the hyperbolic SIR model, we follow the same approach as in the previous section. Thus, we consider the system written in macroscopic form defined by equations (5). Multiplying both sides of each equation by the corresponding scaling parameter, we can rewrite the system in the following compact form

(20)

where

We consider to be a deep neural network (NN) with inputs and and trainable parameters , to approximate the solution of our system: . Then, we define the residual term

(21)

and embed it into the loss function of the neural network to obtain an APNN. We omit for brevity the detailed analysis of the AP property. In the diffusion limit, under conditions (6), such analysis follows the same steps as in the previous section, and is in agreement with the diffusion limit computed in Section 2.2.

Figure 2: APNN schematic workflow. The NN architecture is integrated with the physical knowledge of the dynamics of interest through the inclusion of the PDE system and the enforcement of initial and boundary conditions (and possibly conservation properties), when known, becoming a PINN. The AP property, which is a fundamental feature when dealing with multiscale hyperbolic systems, is guaranteed through the correct design of an AP-loss function.

We restrict the neural network approximation to satisfy the physics imposed by the residual (21) on a finite set of user-specified scattered points inside the domain (referred to as residual points), and we also enforce the initial and space-boundary conditions of the system on scattered points of the space-time boundary [29]. In the context of inverse problems, we also assume access to measured data, available at a finite set of fixed training points. Thus, in the training process of the PINN, we minimize the following AP-loss function, composed of four mean squared error terms

$$
\mathcal{L}(\theta) = w_d^{T} \mathcal{L}_d(\theta) + w_b^{T} \mathcal{L}_b(\theta) + w_r^{T} \mathcal{L}_r(\theta) + w_c^{T} \mathcal{L}_c(\theta),
\tag{22}
$$

where $w_d$, $w_b$, $w_r$, $w_c$ characterize the weights associated with each contribution. Notice that $\mathcal{L}_d$ quantifies the mismatch of the approximated solution with respect to known data samples, while $\mathcal{L}_b$, $\mathcal{L}_r$ and $\mathcal{L}_c$ represent the discrepancy in initial/boundary conditions of (20), in the residual (21) and with respect to the conservation of the total density in the domain (3), respectively, all three contributing to enforce the physical structure of the problem. We present the detailed expression of each term in (22) in Appendix B. A schematic representation of the APNN architecture is given in Figure 2.
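The scattered residual points and initial-condition points mentioned above can be generated, for instance, by uniform random sampling. The following sketch is only one possible choice: the domain bounds, the point counts, and the function names are hypothetical and are not the settings used in the paper.

```python
import random

def sample_residual_points(n, x_range, t_range, rng):
    """Uniformly scattered (x, t) points inside the space-time domain."""
    return [(rng.uniform(*x_range), rng.uniform(*t_range)) for _ in range(n)]

def sample_initial_points(n, x_range, t0, rng):
    """Scattered points on the initial-time slice of the boundary."""
    return [(rng.uniform(*x_range), t0) for _ in range(n)]

rng = random.Random(0)  # fixed seed for reproducibility
residual_pts = sample_residual_points(1000, (0.0, 20.0), (0.0, 10.0), rng)
initial_pts = sample_initial_points(100, (0.0, 20.0), 0.0, rng)
```

The residual term of the loss would then be evaluated on `residual_pts`, and the initial-condition term on `initial_pts`.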

4 Numerical examples and applications

In this section, various numerical tests are presented to assess the performance of the proposed APNNs. The first two examples concern the use of an APNN for the solution of inverse and forward problems, taking as prototype multiscale hyperbolic system either the standard Goldstein–Taylor model or a slightly modified version of it. Even though this model is a simpler system of equations than (5), it represents the dynamics of interest well, as discussed in Section 3.3. These tests are designed to further highlight how the choice of the APNN formulation proposed in this work is fundamental for the treatment of multiscale problems, especially when only partial information is available. We shall also demonstrate numerically with this prototype model (we refer to Section 3.3 for the analytical argument) that a standard PINN formulation leads to the loss of the AP property and, consequently, to non-physical reconstructions of the sought dynamics.

Following that, various tests concerning the solution of epidemic problems are discussed, examining the APNN performance in inferring the unknown epidemic parameters, solving the forward problem, and forecasting the spread of the infectious disease, also when spatially heterogeneous parameters are considered.

The numerical solution obtained with a second-order AP-IMEX Runge-Kutta Finite Volume method [9, 10] is considered as synthetic data for the ground truth and used in the APNN to build up the training dataset. With regard to the epidemic test cases, we remark here, as also discussed in Appendix B, that since data on fluxes are not accessible in real-world applications, we only enforce the measurements of the densities in the data loss term. Nevertheless, unless otherwise specified, we impose initial conditions of the fluxes in the initial/boundary loss term. In all the examples, periodic boundary conditions are considered. To strictly impose them (they are accounted for again in the initial/boundary loss term), we employ the periodic mapping technique taken from [45] in the input layer

(23)

which involves a hyperparameter controlling the frequency of the solution. For the tests concerning the Goldstein–Taylor model, the activation function sin is chosen, adopting the SIREN framework [39]; for the epidemic tests, the function tanh is used. Finally, the Adam method [31] is used for the optimization process and derivatives in the NN are computed applying automatic differentiation [3].
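To make the idea of a periodic input mapping concrete, the sketch below replaces the spatial coordinate with a sine/cosine pair before it enters the network. This particular form is an assumption (a common way to hard-code periodicity) and is not claimed to coincide with the mapping (23) of the paper; `omega` is a hypothetical name for the frequency hyperparameter.

```python
import math

def periodic_features(x, t, omega):
    """Map (x, t) to (sin(omega x), cos(omega x), t).

    Any network fed with these features is periodic in x
    with period 2 * pi / omega by construction.
    """
    return (math.sin(omega * x), math.cos(omega * x), t)

omega = 2.0                    # hypothetical frequency
period = 2 * math.pi / omega   # resulting spatial period
left = periodic_features(0.3, 1.0, omega)
right = periodic_features(0.3 + period, 1.0, omega)
```

Since `left` and `right` agree up to rounding, the network output at the two ends of a period coincides, so no additional penalty term is needed for the periodic boundary condition.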

For all the numerical examples, we adopt a single feed-forward neural network with fixed depth and width. The model structure is deliberately kept fixed across the numerical experiments in both parabolic and hyperbolic regimes, to highlight the main advantage of AP schemes, namely that the macroscopic behavior can be captured without numerically resolving the small physical parameters (i.e. the architectural parameters of the neural network are independent of the physical scaling parameters). The chosen model and training hyperparameters are given in Tables 7 and 8 of Appendix C for each test case.

4.1 Test 1: Goldstein-Taylor model in diffusive regimes

In the following, we seek to emphasize numerically the importance of choosing the correct formulation to preserve the AP property and correctly approximate population dynamics even in diffusive regimes, particularly when dealing with partial information available. To this aim, we set up for problem (13) a test with initial conditions

with and . We consider periodic boundary conditions, choosing in the periodic mapping (23), and only the diffusive, parabolic regime of the model, choosing , with final time of the simulation .

Figure 3: Test 1: Inverse problem for the Goldstein-Taylor model in the diffusive regime. Convergence of the target parameter (the scattering coefficient) with respect to epochs using the APNN (left) and the standard PINN (right).

Figure 4: Test 1: Forward problem for the Goldstein-Taylor model with standard PINN in the diffusive regime. Solution of the forward problem by PINN (left) and ground truth (right) of the two kinetic densities (top and bottom).

Figure 5: Test 1: Forward problem for the Goldstein-Taylor model with APNN in the diffusive regime. Solution of the forward problem by APNN (left) and ground truth (right) of the density (top) and flux (bottom).

Inverse Problem

Initially, we consider an inverse problem, inferring the scattering coefficient from the available measurement data using the APNN formulation presented in Appendix A, with loss function (26) and the corresponding term given in (28). For comparison, we also solve the inverse problem applying the standard PINN residual (14) in the loss function. For both APNN and standard PINN formulations, we train the network on measurements composed of equally spaced samples in the space-time domain, of which 20% (4800 points) are randomly selected for validation purposes. For the APNN model we consider measurements only for the density, hence assuming no information on the flux, whereas for the standard formulation we employ data samples for both kinetic densities (therefore, in the latter case we assume more information on the system (13)). In addition, residual points are employed with the same data split for the validation set. Regarding the loss function and training hyperparameters, the setting of the APNN given in Table 7 has also been used for the standard PINN, with the only difference, just stated, that the training dataset is given for both variables, considering equal weights.
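The random 80/20 split between training and validation measurements can be sketched as follows. The total of 24000 samples is inferred here from the statement that 4800 points correspond to 20% of the measurements; the function name and the seed are hypothetical.

```python
import random

def train_validation_split(n_samples, validation_fraction, rng):
    """Return disjoint index lists (train, validation) after a random shuffle."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_val = round(validation_fraction * n_samples)
    return indices[n_val:], indices[:n_val]

# 4800 validation points at 20% imply 24000 measurements in total.
train_idx, val_idx = train_validation_split(24000, 0.2, random.Random(0))
```

The validation indices are never used to update the network and can drive an early-stopping criterion during training.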

We show the convergence of the target parameter in Figure 3 for both PINN formulations. A very fast convergence can be observed in the APNN, with the initial guess and a final relative error . However, it can be observed that the standard PINN failed to recover the correct value of the scattering parameter (at epoch 4000, early-stopping prevents further training of the PINN).

Forward Problem

To further highlight the importance of the AP property, we consider a forward problem for the Goldstein-Taylor model, where the scattering coefficient is given and the goal is now to solve the equations on the spatio-temporal domain with corresponding initial conditions. For the APNN formulation, initial conditions are enforced on scattered points for both the density and the flux, with the equations enforced on residual points in the space-time domain. The standard PINN formulation based on the kinetic equations (13) shares the same point sets as the APNN, but initial conditions are given for the two kinetic densities.

We plot the solutions obtained with the standard PINN in Figure 4 and with APNN in Figure 5. Standard PINN based on the kinetic equations (13) shows its weakness and converges to a trivial solution on the space-time domain, failing to approximate the forward solution of both density and flux. On the contrary, the adoption of the APNN ensures the convergence towards the correct diffusive limit, which is also beneficial for the inverse problem we considered before.

4.2 Test 2: Goldstein–Taylor model with source term

To examine the performance of the APNN with a more challenging setting closely related to epidemic scenarios that we shall discuss later on, we introduce a source term that creates an oscillatory effect in the density in the Goldstein–Taylor model. The resulting system reads

(24)

where . For this problem, we reformulate the AP-loss function according to the model, simply adding the source term to the formulation discussed in Appendix A. In the source term, we set , with a baseline value perturbed by sinusoidal oscillations having amplitude and frequency . We again consider a spatial domain and . The final goal of this test is to infer the parameters , and and to evaluate the spatio-temporal reconstruction given by the APNN for a partially observed system, with information available only on , considering a scattering coefficient and the following initial conditions:

Parameter Ground Truth Initial Guess Estimation Relative Error
0 0.5 0.0011 N/A
3 2 2.9263
4 3 4.0003
Table 1: Test 2 (a): Goldstein-Taylor model with source in diffusive regime () with density data only. Inference results for the source term coefficients using the APNN.
Figure 6: Test 2 (a): Goldstein-Taylor model with source in diffusive regime () with density data only. Approximated forward solution (left column), ground truth (middle column) and relative error (right column) of density (first row) and flux (second row) obtained with the APNN.

Test 2 (a): Diffusive regime with density data only

We initially consider a diffusive, parabolic regime defined by , with . We employ for , not considering any dataset for , while still imposing initial and boundary conditions for both variables. For the residual term, we use points on the domain . We use 20% of and for validation and the rest for training. The results of the parameter inference are shown in Table 1, which also lists the initial guesses of the target variables, although we observed that the neural network is not very sensitive to the choice of these values. These results indicate that, in general, the most difficult coefficient to calibrate with the NN is the amplitude of the perturbation of the source term, .
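The relative errors reported in the tables can be recomputed from the listed estimates. The sketch below assumes the usual definition |estimate - truth| / |truth|, which is undefined when the ground truth is zero (hence "N/A"); the function name is illustrative, and the recomputed values are limited by the rounding of the tabulated estimates.

```python
def relative_error(estimate, ground_truth):
    """Return |estimate - truth| / |truth|, or None when the truth is zero."""
    if ground_truth == 0:
        return None  # undefined; reported as N/A in the tables
    return abs(estimate - ground_truth) / abs(ground_truth)

# (ground truth, estimate) pairs as listed in Table 1.
table_1 = [(0.0, 0.0011), (3.0, 2.9263), (4.0, 4.0003)]
errors = [relative_error(est, truth) for truth, est in table_1]
for (truth, est), err in zip(table_1, errors):
    print(truth, est, "N/A" if err is None else f"{err:.2e}")
```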

The APNN forward approximations of and are presented in Figure 6: the forward solutions capture the correct dynamics of well and accurately recover without any measurement of the latter. Nonetheless, we acknowledge that in diffusive regimes, as in Eq. (17), the problem is fully described by the density alone, so the absence of information on does not constitute a real gap in the data.

Parameter Ground Truth Initial Guess Estimation Relative Error
0 0.5 N/A
3 2 3.0005
4 3 4.0002
Table 2: Test 2 (b): Goldstein-Taylor model with source in hyperbolic regime () with density data only. Inference results for the source term coefficients using the APNN.
Figure 7: Test 2 (b): Goldstein-Taylor model with source in hyperbolic regime () with density data only. Approximated forward solution (left column), ground truth (middle column) and relative error (right column) of density (first row) and flux (second row) obtained with the APNN.

Test 2 (b): Hyperbolic regime with density data only

In the second case, we consider a hyperbolic regime with and . We employ for , again not considering any dataset for , and fix on the domain , with 20% of each dataset for validation. Coefficients inferred by the APNN are listed in Table 2, while forward solutions are shown in Figure 7. As in the diffusive regime, the APNN correctly infers all the unknown parameters and approximates the solution of densities and well, but here for a much more demanding problem. Indeed, in hyperbolic regimes the problem is not completely defined by the density of the system alone, so the dataset is genuinely incomplete without any information on the flux ; even so, the APNN is still capable of approximating the correct solution of the whole dynamics.

4.3 Test 3: SIR transport model with constant epidemic parameters

In the following, we evaluate the performance of the APNN with respect to the dynamics governed by the SIR multiscale transport model (5). We first design a numerical test with an initial condition that simulates the presence of two epidemic hot-spots, aligned in the spatial domain , presenting a different number of infected individuals, distributed following a Gaussian function,

where and are the coordinates of the hot-spots, while and define the different initial epidemic concentrations in the two cities, with a much higher density of infected individuals in the first city. Assuming that there are no immune individuals at the initial time and that the total population is uniformly distributed in the domain, we have

We impose initial fluxes in equilibrium, following (7), and periodic boundary conditions, so that the two cities are connected in both directions. We initially consider a simple setting with epidemic parameters constant in space and time, with and , which corresponds to an infectious disease characterized by an initial reproduction number .
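The initial state described above (two Gaussian hot-spots of infected individuals, no removed individuals, and a uniform total population) can be sketched as follows. All numerical values here are hypothetical placeholders: the hot-spot centres, amplitudes, width, and the normalization of the total population to 1 are assumptions made for illustration, not the values used in the test.

```python
import math

# Hypothetical placeholders, NOT the values used in the paper.
CENTRES = (5.0, 15.0)      # hot-spot coordinates
AMPLITUDES = (1e-2, 1e-4)  # much higher concentration in the first city
WIDTH = 1.0                # common Gaussian width
TOTAL = 1.0                # uniform total population (normalization assumed)

def gaussian(x, centre, amplitude, width):
    """Gaussian bump centred at `centre`."""
    return amplitude * math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))

def initial_state(x):
    """Return (S, I, R) at position x at the initial time."""
    infected = sum(gaussian(x, c, a, WIDTH) for c, a in zip(CENTRES, AMPLITUDES))
    removed = 0.0  # no immune individuals initially
    susceptible = TOTAL - infected - removed  # S + I + R is uniform in x
    return susceptible, infected, removed
```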

The APNN is used both to infer the epidemic parameters and to approximate the solutions for a parabolic and a hyperbolic scenario. To mimic realistic data availability, we use a sparse dataset for the training process, sampling the spatio-temporal points from the available dataset with probability proportional to the magnitude of . Indeed, in real-world epidemic scenarios, data on the evolution of the infectious disease are only available in the regions in which the virus has already started to spread. Specifically, the probability of each spatio-temporal location being chosen for the training dataset is given by

(25)
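A minimal sketch of this importance sampling follows, assuming that the probability in (25) is the magnitude of the infected density normalized over all grid points. Whether samples are drawn with or without replacement is not stated; this sketch draws distinct points using Efraimidis-Spirakis weighted keys, which is a choice made here and not taken from the text. The function names are illustrative.

```python
import random

def sampling_probabilities(density):
    """Normalize |I| over a flat list of space-time grid values."""
    weights = [abs(v) for v in density]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("density is identically zero")
    return [w / total for w in weights]

def sample_sparse_indices(density, n_samples, seed=0):
    """Draw up to n_samples distinct grid indices, favouring large |I|."""
    rng = random.Random(seed)
    probs = sampling_probabilities(density)
    keyed = []
    for idx, p in enumerate(probs):
        if p > 0.0:  # points with zero density are never selected
            # Efraimidis-Spirakis key: a larger weight pushes the key towards 1.
            keyed.append((rng.random() ** (1.0 / p), idx))
    keyed.sort(reverse=True)
    return sorted(idx for _, idx in keyed[:n_samples])
```

With this rule, regions where the infected density is negligible contribute almost no training samples, consistent with the white crosses clustering around the outbreaks in the figures.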
Figure 8: Test 3.1 (a): SIR transport model with constant epidemic parameters and partially observed dynamics in diffusive regime (, ). Selected sparse samples () marked with white crosses (left column), approximation obtained in the inverse problem (middle column), and ground truth (right column) of the densities of infected .
Figure 9: Test 3.2 (a): SIR transport model with constant epidemic parameters and partially observed dynamics in diffusive regime (, ). Approximation and forecast with measurements on a short time denoted by the dashed line (left column), and ground truth (middle column) of infected (first row) and removed (second row). Temporal evolution of the cumulative density of infected individuals in the whole domain (first row, right) and of the reproduction number (second row, right) obtained with the APNN, trained based on a short time period (marked by the dotted line).

Test 3 (a): Partially observed dynamics in diffusive regime

In the first case, a parabolic configuration of speeds and relaxation parameters is considered, with and . We examine the performance of the APNN in the two following different problems.

  • Test 3.1 (a): Parameter inference test. We consider a sparse dataset where only measurements are selected from the entire space-time domain , according to the density of , as described in (25); the selected samples are shown in Figure 8 (left).

  • Test 3.2 (a): Forecasting test. As a second problem, we intend to investigate the forecasting capability of the APNN. In contrast with sampling measurements available across the entire spatio-temporal domain in the parameter inference test, we generate a training dataset of size on a shorter time domain and we assess the correctness of APNN approximations in and forecasting performance in .

In both cases, the equation residuals are enforced on residual points on the spatio-temporal domain and 20% of each dataset is used for validation. In addition, we assume that initial conditions for are unknown in both problems, which makes the task even more demanding for the APNN.
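The difference between the two tests lies in which measurements reach the network. For the forecasting test, the dataset can be partitioned by time as in the sketch below; the tuple layout `(x, t, value)` and the function name are assumptions made for illustration.

```python
def split_by_time(samples, t_train_end):
    """Split (x, t, value) samples into a training window and a forecast window."""
    train = [s for s in samples if s[1] <= t_train_end]
    forecast = [s for s in samples if s[1] > t_train_end]
    return train, forecast

# Toy samples: (x, t, value). Only those with t <= 1.0 are used for training.
toy = [(0.0, 0.5, 0.10), (1.0, 1.0, 0.20), (0.5, 1.5, 0.30), (1.0, 2.0, 0.25)]
train, forecast = split_by_time(toy, t_train_end=1.0)
```

Measurements in the forecast window are held back from training and serve only to assess the predictions.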

Parameter Ground Truth Initial Guess Estimation Relative Error
12 8 11.9428
6 3 5.9772
Table 3: Test 3.1 (a): SIR transport model with constant epidemic parameters and partially observed dynamics in diffusive regime (, ). Inferred results for transmission rate and recovery rate from a sparse measurement dataset of samples, and the relative error with respect to the ground truth values.

Results of the parameter inference task based on the sparse measurement dataset are reported in Table 3, where an excellent estimation of both and can be observed with respect to the ground truth. Figure 8 shows that the reconstructed forward approximations for the density of the epidemic compartment are in excellent agreement with the true solution in the entire domain. In the forecasting test as well, the approximated and predicted dynamics (based on the measurements from the time period ) match the ground truth in the entire domain , as shown in Figure 9, even though in this demanding setting initial conditions of densities are assumed to be unknown. These results further highlight the capability of the APNN to forecast the spread of an infectious disease in diffusive regimes, thanks to the physical knowledge of the PDE system embedded in the NN together with the preservation of the AP property. In the same figure, we also present the temporal evolution of the cumulative density of infected individuals in the whole domain as well as the effective reproduction number predicted by the APNN. The excellent agreement between predictions () and the ground truth outside the training domain further confirms the forecasting capability of APNNs.

Parameter Ground Truth Initial Guess Estimation Relative Error
12 8 12.0126
6 3 6.0447
Table 4: Test 3.1 (b): SIR transport model with constant epidemic parameters and partially observed dynamics in hyperbolic regime (, ). Inferred results for transmission rate and recovery rate from a sparse measurement dataset of samples, and relative error with respect to the ground truth values.
Figure 10: Test 3.1 (b): SIR transport model with constant epidemic parameters and partially observed dynamics in hyperbolic regime (, ). Selected sparse samples () marked with white crosses (left column), approximation obtained in the inverse problem (middle column), and ground truth (right column) of the densities of infected .
Figure 11: Test 3.2 (b): SIR transport model with constant epidemic parameters and partially observed dynamics in hyperbolic regime (, ). Approximation with measurements from a shorter time period (first column) or (middle column), stopped at the dashed line, and ground truth (last column), of the densities of infected (first row) and removed (second row).
Figure 12: Test 3.2 (b): SIR transport model with constant epidemic parameters and partially observed dynamics in hyperbolic regime (, ). Temporal evolution of the cumulative density of infected individuals in the whole domain (left) and of the reproduction number (right) obtained with the APNN using measurements from a shorter period of or (stopped at the dotted lines) compared with ground truth.

Test 3 (b): Partially observed dynamics in hyperbolic regime

In the second case, we consider a hyperbolic regime with and . As previously done, we consider two different contexts.

  • Test 3.1 (b): Parameter inference test. We first consider a sparse measurement setting, where measurements over the spatio-temporal domain are available. The chosen samples are shown in Figure 10 (left) and have again been selected according to the density of , as described in (25).

  • Test 3.2 (b): Forecasting test. Secondly, we consider a forecasting task, training the APNN with the measurements generated from a limited time domain . In this example, we choose and with and measurements of densities employed respectively, and then evaluate the network performance over the time domain .

In both scenarios, residual points are employed on the spatio-temporal domain to enforce the underlying equations, still assuming that initial conditions of densities are unknown, as in the previous test case.

The parameters and estimated by the APNN from the sparse measurements are presented in Table 4, where we again observe very good agreement with the true values. At the same time, the APNN is capable of reconstructing the correct dynamics of the phenomenon of interest in the whole domain despite the sparsity and incompleteness of the data, as shown in Figure 10. Results obtained when training the APNN with measurements taken from a shorter time period are shown in Figures 11 and 12. In Figure 12, we plot the temporal evolution of the cumulative density of infected individuals in the whole domain as well as the effective reproduction number predicted by the APNN when the measurement data are restricted to the shorter time periods . When , APNN predictions deviate from the ground truth almost immediately after the training period. In contrast, when the measurement data are extended to , the APNN produces good reconstructions and predictions in the forecasting region () with respect to the ground truth. Similar observations can be made for the approximations of densities and presented in Figure 11. This behavior arises because a dataset restricted to does not contain enough information on the main dynamics. We remark indeed that no data on fluxes are given to the APNN, which, in a hyperbolic regime, implies a substantial lack of information. The predictions obtained, in fact, show that the APNN tends to smooth out the actual epidemic propagation pattern, failing to describe the correct transport/hyperbolic mechanism in the regions connecting the two urban areas. This is clear from Figure 11 (first column) and Figure 12, where the dynamics predicted by the APNN spread the virus faster and in a more diffusive way, giving rise to a spurious epidemic hot-spot around .

     
Figure 13: Test 4: SIR transport model with spatially variable transmission rate. Left: schematic representation of the spatial setting considered, with 3 initial hot-spots presenting different initial concentrations of infectious people, proportional to the light red circles. Individuals move from one location to another following the two opposite directions defined in the one-dimensional space with periodic boundary conditions. Due to the heterogeneous environment 3 additional hot spots will form along the main connection lines. Right: initial conditions for susceptible (top), infectious (middle) and removed (bottom).

4.4 Test 4: SIR transport model with heterogeneous environment

Next, we consider a much more challenging scenario, taking into account a spatially variable transmission rate that follows a hypothetical heterogeneous environment. An initial condition of the SIR multiscale transport model is designed to simulate the presence of 3 epidemic hot-spots aligned in the spatial domain , each one having a different initial density of infected individuals, again distributed in space following a Gaussian:

Here , , are the coordinates of the epidemic centers and , , define the different initial epidemic concentration in each spot. Assuming again that there are no immune individuals at the initial time and that the total population is uniformly distributed in the spatial domain, we set and . As before, we impose initial fluxes in equilibrium, following (7), and periodic boundary conditions, to allow a connection also between hot-spots 1 and 3, so that the domain connecting the positions of these regions forms the closed shape presented in Figure 13 (left). In the same figure (right), initial conditions of the 3 epidemic compartments are shown. We set the following spatially variable transmission rate [9, 42]:

with and perturbing this baseline value with oscillations of amplitude and frequency . The recovery rate is set to be . This choice of parameters simulates an infectious disease characterized by an initial reproduction number . With the APNN, the goal is to infer and , to approximate the dynamics of the densities from partially available measurements, and to assess the forecasting performance.

Parameter Ground Truth Initial Guess Estimation Relative Error
9 5 9.0170
2.5 1.5 2.4512
0.55 0.5 0.5508
Table 5: Test 4.1 (a): SIR transport model with spatially variable transmission rate and partially observed dynamics in diffusive regime (, ). Inferred results for the three different coefficients in the incidence function, , and relative error with respect to the correct solution.
Figure 14: Test 4.1 (a): SIR transport model with spatially variable transmission rate and partially observed dynamics in diffusive regime (, ). Selected sparse samples for the dataset marked with white crosses (left column), approximation obtained in the inverse problem (middle column), and ground truth (right column) of the densities of infected .
Figure 15: Test 4.2 (a): SIR transport model with spatially variable transmission rate and partially observed dynamics in diffusive regime (, ). Approximation and forecast with measurements taken from a shorter time period, stopped at the dashed line (left column), and ground truth (middle column) of the densities of infected (first row) and removed (second row). Temporal evolution of the cumulative density of infected individuals in the whole domain (first row, right) and of the effective reproduction number (second row, right) obtained with the APNN, trained based on measurements from a shorter time period (marked by the dotted line).

Test 4 (a): Partially observed dynamics in diffusive regime

In the first scenario, a parabolic configuration of speeds and relaxation parameters is considered with and .

As in Test 3, we investigate the capabilities of the proposed APNN in heterogeneous epidemic environments through the following two scenarios.

  • Test 4.1 (a): Parameter inference test. First, we consider a relatively sparse availability of measurements, with samples over the spatio-temporal domain selected according to (25), as previously described, with the main task to infer unknown physical parameters and . The selected samples are indicated in Figure 14 (left).

  • Test 4.2 (a): Forecasting test. Secondly, the forecasting performance in predicting the spread of the infectious disease until with measurements generated from a shorter time period is investigated.

In both scenarios, the equation residual is enforced on residual points in the domain , and initial conditions are enforced on equally spaced points. Furthermore, we enforce the conservation (3) on equally spaced temporal points, and we randomly select 20% of each dataset for validation.
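One way to turn a conservation constraint into a loss term is sketched below. It assumes that (3) states that the total population, i.e. the spatial integral of S + I + R, is constant in time; the trapezoidal quadrature, the reference to the initial total, and the function names are choices made for this sketch and may differ from the actual implementation.

```python
def trapezoid(values, dx):
    """Composite trapezoidal rule on a uniform grid."""
    return dx * (sum(values) - 0.5 * (values[0] + values[-1]))

def conservation_penalty(S, I, R, dx):
    """Mean squared drift of the total population from its initial value.

    S, I, R are nested lists indexed as [time][space], evaluated at the
    equally spaced temporal points where conservation is enforced.
    """
    totals = []
    for s_row, i_row, r_row in zip(S, I, R):
        population = [s + i + r for s, i, r in zip(s_row, i_row, r_row)]
        totals.append(trapezoid(population, dx))
    reference = totals[0]
    return sum((m - reference) ** 2 for m in totals) / len(totals)
```

The penalty vanishes when the predicted compartments redistribute individuals without creating or destroying them, and grows with any drift of the total.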

In Table 5, we present the results of parameter inference based on the sparse measurements. The APNN accurately recovers the correct values for parameters and characterizing the epidemic incidence function, even when the initial guesses are far from the corresponding ground truth values. As illustrated in Figure 14, the reconstruction of the density is also in very good agreement with the ground truth. Notice that the three initial epidemic concentrations give rise to six different epidemic outbreaks in time due to the spatial heterogeneity assigned to the transmission rate. In Figure 15, we also present the approximated forward solutions for the forecasting task. A good match in the forecasting region () is observed, demonstrating once more the capability of the APNN to capture the underlying physics and deliver reasonably accurate predictions in the forecasting regions, even when spatially heterogeneous environments are considered in the context of partially observed systems.

Parameter Ground Truth Initial Guess Estimation Relative Error
9 5 9.0205
2.5 1.5 2.4691
0.55 0.5 0.5502
Table 6: Test 4.1 (b): SIR transport model with spatially variable transmission rate and partially observed dynamics in hyperbolic regime (, ). Inferred results from sparse measurements for the three different coefficients in the incidence function, , and relative error with respect to the ground truth.
Figure 16: Test 4.1 (b): SIR transport model with spatially variable transmission rate and partially observed dynamics in hyperbolic regime (, ). Selected sparse samples for the dataset marked with white crosses (left column), approximation obtained in the inverse problem (middle column), and ground truth (right column) of the densities of infected .
Figure 17: Test 4.2 (b): SIR transport model with spatially variable transmission rate and partially observed dynamics in hyperbolic regime (, ). Approximation and forecast with measurements taken from a shorter time period, stopped at the dashed line (left column), and ground truth (middle column) of the densities of infected (first row) and removed (second row). Temporal evolution of the cumulative density of infected individuals in the whole domain (first row, right) and of the effective reproduction number (second row, right) obtained with the APNN, trained based on measurements from a shorter time period (marked by the dotted line).

Test 4 (b): Partially observed dynamics in hyperbolic regime

In the second scenario, we consider a hyperbolic regime setting and . As in the previous test case, we consider two distinct tasks for the APNN.

  • Test 4.1 (b): Parameter inference test. Initially, a sparse measurement dataset of training samples over the spatio-temporal domain is considered, based on the importance sampling previously described, and marked in Figure 16 (left), to solve the inverse problem and also evaluate the resulting forward reconstruction.

  • Test 4.2 (b): Forecasting test. Then, the APNN is trained with data samples selected from the spatio-temporal domain , and the reconstruction of the dynamics is evaluated until , to also examine the performance on the forecasting of the virus spread.

The equation residual is enforced on residual points on the domain in both setups, while points are applied to enforce initial conditions for densities and fluxes , and the conservation (3) is enforced on equally spaced temporal points. As usual, 20% of each dataset is randomly selected for validation during the training process.

As in the parabolic setting, the APNN is able to estimate the correct parameters of the spatially variable transmission rate from sparse measurements in the hyperbolic regime, as shown in Table 6. In Figures 16 and 17, the approximated forward solutions and the ground truth of and over the space-time domain are shown. A good match between the APNN approximation and the ground truth is observed, both for the sparse measurements and for the measurements from a reduced training time domain, where the latter case also includes predictions of the space-time dynamics. Notice that, as expected, due to the hyperbolic setting of the scaling parameters of this test, the six epidemic outbreaks that arise at different times due to the spatial movement of individuals are more contained in terms of spatial spread than in the results obtained in the diffusive regime.

5 Conclusions

The recent Covid-19 pandemic has led to a significant development of mathematical models for describing epidemiological phenomena, which has also raised the challenge of identifying the parameters involved from partial information. In this direction, recent developments in machine learning offer promising tools for addressing such problems, with the hope of identifying robust procedures for solving the corresponding inverse problems and for formulating predictive scenarios. This paper has addressed these problems in the context of spatially dependent epidemic models, which, in addition to the lack of information about the spread of the epidemic, face additional difficulties induced by the different scales at which the dynamics take place. These scales are representative of the different interactions that occur in densely populated areas, such as urban areas, or in suburban areas where the movement of individuals over long distances prevails. The construction of neural networks that can accurately describe the various scales is thus essential. In particular, we have shown how physics-informed neural networks (PINNs) that benefit from the asymptotic-preserving (AP) property provide considerably better results across the different scales of the problem when compared with standard PINNs. Several numerical tests have been presented to illustrate the performance of this new class of neural networks, referred to as asymptotic-preserving neural networks (APNNs), for both inverse and forward problems. Finally, we emphasize that even though, for simplicity of presentation, we focused on a single-population hyperbolic SIR model, the results extend naturally to multi-population transport models that include additional epidemic compartments [1, 7, 11].

Acknowledgments

G.B. and L.P. were partially supported by MIUR-PRIN Project 2017, No. 2017KKJP4X Innovative numerical methods for evolutionary partial differential equations and applications. G.B. also acknowledges the support by INdAM–GNCS. X.Z. was supported by the Simons Foundation (504054).

Appendix A AP-loss function for the Goldstein–Taylor model

Fixing a finite set of residual points , , and considering the available dataset , we define the loss function for the Goldstein–Taylor model as follows

(26)

The expressions of and