1 Introduction
In recent decades, enormous progress has been made in the understanding of complex systems described by multiscale PDEs with applications ranging from classical physics and engineering to biology and social sciences [1, 36, 4, 5, 11, 25, 14].
Despite continuing progress, modeling and predicting the evolution of nonlinear multiscale systems with classical analytical or computational tools inevitably faces severe challenges. First, numerically solving a multiscale problem requires complex and sophisticated computational codes and can incur prohibitive costs (due to the well-known curse of dimensionality). Moreover, we often face difficulties related to the scarcity of data and to multiple sources of uncertainty, especially in the social sciences. Above all, solving real physical problems with missing or incomplete initial or boundary conditions through traditional approaches is currently impractical. This is where, and why, data-driven models began to play a crucial role [17].
Machine Learning (ML) is an incredibly powerful tool that has had an enormous impact on many fields of our society. This has led to great interest in using ML techniques to study challenging problems in science, engineering and medicine involving complex multiscale dynamics. However, these problems are very different from the classical ones in which ML has proved so successful, so we cannot simply take the available ML methods as a “black box” and use them uncritically [2]. Purely data-driven models may fit observations very well, but their predictions may be physically inconsistent and, consequently, lead to erroneous generalizations. There is therefore an urgent need to integrate fundamental physical laws and related mathematical models into the learning process of neural networks [28, 18, 33]. The main motivation for developing this new class of physics-informed machine learning algorithms is that such prior physical knowledge or constraints can keep ML methods robust even in the presence of imperfect data (missing, incomplete or noisy) and provide accurate predictions that adhere to the physics of the phenomenon under study.
A recent example of this new learning paradigm is represented by Physics-Informed Neural Networks (PINNs) [15, 38, 28, 44]. PINNs are a class of deep neural networks (DNNs) trained to solve supervised learning tasks while respecting given physical laws described by general nonlinear ordinary differential equations (ODEs) or partial differential equations (PDEs). The physical knowledge of the underlying phenomenon is incorporated into the PINN mainly in two ways: either directly through data embodying the underlying physics of the phenomenon of interest (observational bias), or through an appropriate choice of the loss function that the PINN must minimize, forcing the training phase of the neural network to converge to solutions that adhere to the underlying physics (learning bias).
Nevertheless, adopting a standard formulation of PINNs in the context of multiscale problems may still lead to incorrect inferences and predictions [26]. This is mainly due to the presence of small scales leading to reduced or simplified models that need to be enforced consistently during the learning process. In these cases, a standard PINN formulation allows an accurate description of the process only at the leading order, thus losing accuracy in the asymptotic limit regimes. One remedy, as recently proposed in [26], is to modify the loss function to include asymptotic-preserving (AP) properties during the training process. The realization of such an AP loss function will therefore depend on the particular problem under study and will be based on an appropriate asymptotic analysis of the model.
One particularly interesting area where the use of machine learning techniques can play a key role concerns epidemiological dynamics. In this context, a number of mathematical models have recently been proposed that require the estimation of several parameters from data to provide predictive scenarios and to test their reliability
[1, 10, 7, 11, 12, 19, 20, 37, 41]. In this paper we focus on a new class of epidemic models described by multiscale PDEs, capable of describing both hyperbolic-type phenomena, characteristic of epidemic propagation over long distances and along the main lines of communication between cities, and parabolic-type phenomena, in which classical diffusion prevails at the urban level [1, 10, 7, 11]. The multiscale nature of the problem poses a challenge to the construction of PINNs, and preservation of the AP property is therefore essential in order to obtain reliable results. Following the approach recently introduced in [26], we show how to construct AP neural networks (APNNs) capable of solving both inverse and forward problems of interest in epidemic dynamics.

The rest of the paper is organized as follows. The next section is devoted to the description of the model under study and a formal analysis of its different multiscale behaviors. In Section 3 we introduce the notion of APNN and describe how to construct such a neural network, first for a simplified multiscale hyperbolic model and then for the epidemic case under study. A series of numerical results for both inverse and forward problems, using synthetic data produced by the numerical solution of the mathematical model, illustrates the validity of the present approach. In particular, the case of partially observed systems, as commonly found in epidemics, is considered and serves to emphasize the relevance of the AP property. Some final considerations and future developments are reported in a concluding section.
2 Hyperbolic models of epidemic spread
For simplicity, we illustrate the space-dependent epidemiological modeling in the case of classic SIR compartmental dynamics, in which the population is subdivided into susceptible (individuals who may be infected by the disease), infectious (individuals who may transmit the disease) and removed (individuals who have recovered and are immune, or who have died from the disease). We assume a population with no prior immunity and neglect the vital dynamics of births and deaths because of the time scale considered. Nevertheless, it is straightforward to extend our arguments to richer compartmentalizations, designed to take into account specific features of the infectious disease of interest, such as those proposed recently in [10, 7, 11, 12, 19, 20, 37, 41] to study the spread of COVID-19.
2.1 The hyperbolic SIR model
By analogy with discrete-velocity kinetic theory [36, 4], we consider individuals moving in a one-dimensional domain in two opposite directions, with velocities $\pm\lambda_S$, $\pm\lambda_I$ and $\pm\lambda_R$, distinguished for each epidemic compartment. Notice that the characteristic velocities reflect the heterogeneity of geographical areas and are therefore chosen dependent on the spatial location $x$. Hence, we can describe the space-time dynamics of the population for $x \in \Omega \subset \mathbb{R}$, $t > 0$, through the following two-velocity SIR epidemic transport model [1, 9, 8]:
$$
\begin{aligned}
\frac{\partial S_\pm}{\partial t} \pm \lambda_S \frac{\partial S_\pm}{\partial x} &= -F(S_\pm, I) \mp \frac{1}{2\tau_S}\left(S_+ - S_-\right),\\
\frac{\partial I_\pm}{\partial t} \pm \lambda_I \frac{\partial I_\pm}{\partial x} &= F(S_\pm, I) - \gamma I_\pm \mp \frac{1}{2\tau_I}\left(I_+ - I_-\right),\\
\frac{\partial R_\pm}{\partial t} \pm \lambda_R \frac{\partial R_\pm}{\partial x} &= \gamma I_\pm \mp \frac{1}{2\tau_R}\left(R_+ - R_-\right),
\end{aligned} \qquad (1)
$$
with the total densities of each compartment, $S = S_+ + S_-$, $I = I_+ + I_-$, and $R = R_+ + R_-$.
The transport dynamics of the population is governed by the scaling parameters $\lambda_S, \lambda_I, \lambda_R$ as well as the relaxation times $\tau_S, \tau_I, \tau_R$. The quantity $\gamma = \gamma(x,t)$ is the recovery rate of the infected, which corresponds to the inverse of the infectious period. This rate may vary in space and time depending on the treatment therapies used, even though it can generally be assumed constant, especially for short-term analyses. The transmission of the infection is defined by an incidence function $F = F(S, I)$ modeling the transmission of the disease [23, 13, 30]. The transmission rate $\beta = \beta(x,t)$ characterizes the average number of contacts per person per unit time, multiplied by the probability of disease transmission in a contact between a susceptible and an infectious subject. Notice that this rate may vary in space and time as a consequence of the intensification of governmental control actions (such as mandatory wearing of masks, closing of specific activities or full lockdowns) or their relaxation (lifting mask mandates, reopening schools, restaurants, leisure and cultural centers) in specific locations.
It is worth highlighting that, when investigating real epidemic scenarios, the above-mentioned parameters are, in general, unknown. While the recovery rate might be fixed based on clinical data, the transmission rate must always be estimated through a delicate calibration process in order to match the available data. It is also well known that this process is highly heterogeneous, which makes the inverse problem even more challenging [16].
The standard threshold quantity of epidemic models is the so-called basic reproduction number $R_0$, which defines the average number of secondary infections produced by one infected individual in a fully susceptible population [23]. The effective reproduction number $R_t$, instead, describes the variation in time of this quantity, giving information on the progress of the infectious spread. Indeed, this number determines whether an infection can invade and persist in a new host population ($R_t > 1$) or tends to fade away ($R_t < 1$). The endemic state corresponds to the case $R_t = 1$.
Assuming no-inflow/outflow boundary conditions in $\Omega$, integrating in space and summing the equations for $I_\pm$ in (1), we can define the effective reproduction number of the SIR transport model as
$$
R_t(t) = \frac{\displaystyle\int_\Omega F(S, I)\, dx}{\displaystyle\int_\Omega \gamma\, I\, dx}. \qquad (2)
$$
Notice that this definition extends naturally to any subset of the computational domain by local integration, if one ignores the boundary flows. Under the same no-inflow/outflow boundary conditions, if we integrate equations (1) over $\Omega$, we can finally observe that the model fulfills the conservation of the total population, being
$$
\frac{\partial}{\partial t}\int_\Omega \left(S(x,t) + I(x,t) + R(x,t)\right) dx = 0, \qquad (3)
$$
with $N = \int_\Omega (S + I + R)\, dx$ the total population reference size, constant over time.
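As a lightweight numerical illustration, the effective reproduction number can be evaluated from gridded compartment data by approximating the ratio of total new infections to total removals over the domain. The mass-action incidence $F = \beta S I$ and all parameter values below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

# Illustrative evaluation of the effective reproduction number as the
# ratio of total new infections to total removals over the domain.
x = np.linspace(0.0, 1.0, 201)
beta, gamma = 0.3, 0.1

S = np.ones_like(x)                          # uniform, fully susceptible population
I = 0.01 * np.exp(-100.0 * (x - 0.5) ** 2)   # localized infected hotspot

F = beta * S * I                             # simple mass-action incidence (assumption)
R_t = F.sum() / (gamma * I).sum()            # ratio of the two domain integrals
```

With a uniform, fully susceptible population the ratio collapses to $\beta/\gamma$, the classical SIR value, which provides a quick sanity check of the computation.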
2.2 Multiscale behavior and diffusion limit
Introducing the fluxes, defined by
$$
J_S = \lambda_S (S_+ - S_-), \qquad J_I = \lambda_I (I_+ - I_-), \qquad J_R = \lambda_R (R_+ - R_-), \qquad (4)
$$
we obtain a hyperbolic model equivalent to (1), but presenting a macroscopic description of the propagation of the epidemic at finite speeds
$$
\begin{aligned}
\frac{\partial S}{\partial t} + \frac{\partial J_S}{\partial x} &= -F(S, I),\\
\frac{\partial I}{\partial t} + \frac{\partial J_I}{\partial x} &= F(S, I) - \gamma I,\\
\frac{\partial R}{\partial t} + \frac{\partial J_R}{\partial x} &= \gamma I,\\
\frac{\partial J_S}{\partial t} + \lambda_S^2 \frac{\partial S}{\partial x} &= -F(S, I)\,\frac{J_S}{S} - \frac{J_S}{\tau_S},\\
\frac{\partial J_I}{\partial t} + \lambda_I^2 \frac{\partial I}{\partial x} &= \frac{\lambda_I}{\lambda_S}\,F(S, I)\,\frac{J_S}{S} - \gamma J_I - \frac{J_I}{\tau_I},\\
\frac{\partial J_R}{\partial t} + \lambda_R^2 \frac{\partial R}{\partial x} &= \frac{\lambda_R}{\lambda_I}\,\gamma J_I - \frac{J_R}{\tau_R}.
\end{aligned} \qquad (5)
$$
Let us now consider the behavior of this model in diffusive regimes [32]. To this aim, we introduce the space dependent diffusion coefficients
$$
D_S = \lambda_S^2 \tau_S, \qquad D_I = \lambda_I^2 \tau_I, \qquad D_R = \lambda_R^2 \tau_R, \qquad (6)
$$
which characterize the diffusive transport mechanisms of susceptible, infectious and removed individuals, respectively. Keeping the above quantities fixed while letting the relaxation times $\tau_S, \tau_I, \tau_R \to 0$ (and thus the characteristic velocities $\lambda_S, \lambda_I, \lambda_R \to \infty$), from the last three equations in (5) we obtain, for each epidemic compartment, a proportionality relation between the flux and the spatial derivative of the corresponding density (Fick's law)
$$
J_S = -D_S \frac{\partial S}{\partial x}, \qquad J_I = -D_I \frac{\partial I}{\partial x}, \qquad J_R = -D_R \frac{\partial R}{\partial x}. \qquad (7)
$$
Substituting (7) into the first three equations in (5), we recover the following parabolic reaction-diffusion model, widely used in the literature to study the spread of infectious diseases [35, 40, 43, 24, 6]
$$
\begin{aligned}
\frac{\partial S}{\partial t} &= -F(S, I) + \frac{\partial}{\partial x}\left(D_S \frac{\partial S}{\partial x}\right),\\
\frac{\partial I}{\partial t} &= F(S, I) - \gamma I + \frac{\partial}{\partial x}\left(D_I \frac{\partial I}{\partial x}\right),\\
\frac{\partial R}{\partial t} &= \gamma I + \frac{\partial}{\partial x}\left(D_R \frac{\partial R}{\partial x}\right).
\end{aligned} \qquad (8)
$$
The model's capability to account for different regimes, ranging from hyperbolic to parabolic according to the space-dependent values of the velocities and relaxation times, makes it suitable for describing the dynamics of human beings. Our daily routine is, indeed, a complex mixture of individuals moving at the scale of a city center and individuals traveling among different municipalities. In this situation, it is more appropriate to describe the human dynamics in city centers, with a high density of individuals, through a diffusion operator, while characterizing the mobility of subjects in extra-urban areas through a hyperbolic, advective mechanism, which avoids propagation of information at infinite speed [10, 7, 11].
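To make the limiting parabolic model concrete, the following minimal sketch integrates a reaction-diffusion SIR system of the type (8) with an explicit finite-difference scheme on a periodic domain. The mass-action incidence and every parameter value are illustrative assumptions, not the paper's settings; with periodic (no net inflow/outflow) boundaries, the total population stays constant over time, mirroring the conservation property (3):

```python
import numpy as np

def lap(u, dx):
    """Second-order periodic 1-D Laplacian."""
    return (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2

# Illustrative parameters and an initial hotspot of infected individuals
nx, L = 128, 20.0
dx = L / nx
x = np.linspace(0.0, L, nx, endpoint=False)
beta, gamma = 0.3, 0.1
D_S, D_I, D_R = 0.5, 0.3, 0.5            # diffusion coefficients (assumed values)

S = np.ones(nx)
I = 0.1 * np.exp(-((x - L / 2) ** 2))
R = np.zeros(nx)
N0 = np.sum(S + I + R) * dx              # initial total population

dt = 0.2 * dx**2 / max(D_S, D_I, D_R)    # explicit diffusion stability bound
for _ in range(500):
    F = beta * S * I                     # simple mass-action incidence
    S, I, R = (S + dt * (D_S * lap(S, dx) - F),
               I + dt * (D_I * lap(I, dx) + F - gamma * I),
               R + dt * (D_R * lap(R, dx) + gamma * I))

N_end = np.sum(S + I + R) * dx           # should equal N0 up to roundoff
```

The reaction terms cancel exactly across compartments and the periodic Laplacian sums to zero over the grid, so the conservation check holds to machine precision; this mirrors how the AP loss term for (3) can be monitored during training.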
3 Asymptotic-Preserving Neural Networks (APNNs)
In this section, we provide a brief overview of the general framework of PINNs [28, 38] and then discuss the relevant concepts of Asymptotic-Preserving Neural Networks (APNNs) for the problems of interest.
3.1 Basics of PINNs
The design of a standard deep neural network (DNN) by supervised learning can be summarized in three main steps [18]:

1. The choice of the neural network structure.

2. The choice of the loss function, which minimizes the classical empirical risk, typically characterized by the difference between model and data.

3. A method to minimize the loss over the parameter space; the most popular choices are stochastic gradient descent (SGD) and advanced optimizers such as Adam [31].
In practice, the performance of the neural network is estimated on a finite data set unrelated to the data used to train the model, yielding the test error, whereas the error in the loss function used for training is called the training error.
Compared to the above classical deep learning methodology, the major difference of PINNs is the integration of physical laws, usually in the form of PDEs:
$$
\begin{aligned}
\mathcal{F}(u(x,t);\, \eta) &= 0, \qquad (x,t) \in \Omega,\\
\mathcal{B}(u(x,t)) &= 0, \qquad (x,t) \in \partial\Omega.
\end{aligned} \qquad (9)
$$
Here $\Omega$ is the spatiotemporal domain of the system, $\partial\Omega$ represents its boundary, $\mathcal{F}$ is the differential operator, $u$ represents the solution to the system, and $\eta$ is the parameter related to the physics. Since the initial condition is mathematically equivalent to a boundary condition in the spatiotemporal domain, we use $\mathcal{B}$ as a general operator for arbitrary initial and boundary conditions of the system.
PINN models usually include a neural network representation of the solution, $u_{NN}(x,t;\theta)$, parameterized by the network parameters $\theta$ and taking $(x,t)$ as input data. In the PINN literature, the most widely used neural network architecture is the feed-forward neural network (FNN). An $L$-layered FNN consists of an input layer, an output layer, and $L-1$ hidden layers, which can be defined as follows:
$$
\begin{aligned}
\mathcal{N}^0(x) &= x,\\
\mathcal{N}^l(x) &= \sigma\left(W^l \mathcal{N}^{l-1}(x) + b^l\right), \qquad 1 \le l \le L-1,\\
\mathcal{N}^L(x) &= W^L \mathcal{N}^{L-1}(x) + b^L,
\end{aligned}
$$
where $W^l$ are the weights, $b^l$ the biases, $n_l$ is the width of the $l$-th hidden layer, with $n_0$ the input dimension and $n_L$ the output dimension, and $\sigma$ is a scalar activation function (such as ReLU [21]) applied entrywise. Thus, we denote the set of network parameters $\theta = \{W^l, b^l\}_{1 \le l \le L}$. To find the optimal values of $\theta$, the neural network is trained by minimizing the following type of loss (also called cost or risk) function
$$
\mathcal{L}(\theta) = w_r \mathcal{L}_{r}(\theta) + w_b \mathcal{L}_{b}(\theta) + w_d \mathcal{L}_{d}(\theta). \qquad (10)
$$
Here $\mathcal{L}_r$ and $\mathcal{L}_b$ quantify the discrepancy of the neural network surrogate with the underlying PDE and with its initial or boundary conditions in (9), respectively. The data mismatch loss $\mathcal{L}_d$ is applied when additional measurement data are available, e.g., when solving inverse problems, and $w_r$, $w_b$, $w_d$ are the corresponding weight vectors. The most popular methods chosen to solve this optimization problem remain stochastic gradient descent (SGD) and Adam [31]. After finding the optimal set of parameter values $\theta^*$ by minimizing the PINN loss (10), i.e.,
$$
\theta^* = \arg\min_\theta \mathcal{L}(\theta), \qquad (11)
$$
the neural network surrogate can be evaluated at any given spatiotemporal point to get the solution.
In the context of inverse problems, the structure of the network is almost the same as in the forward problem setting, except that unknown physical parameters $\eta$ are treated as learnable parameters. As a result, the training process involves optimizing $\theta$ and $\eta$ jointly:
$$
(\theta^*, \eta^*) = \arg\min_{\theta,\, \eta} \mathcal{L}(\theta, \eta). \qquad (12)
$$
In summary, a PINN can be regarded as an unsupervised learning approach when used to solve forward problems, with only the equation residual and boundary conditions in the loss function, and as a semi-supervised learning approach for inverse problems, when some measurements are available. In the remainder of this section, we discuss each component of this learning framework in detail through several examples.
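The ingredients above can be sketched in a few lines of numpy. Here the network layout, the model problem $u_t = u_{xx}$, and the loss weights are all illustrative assumptions, and automatic differentiation is replaced by grid-based finite differences to keep the example dependency-free:

```python
import numpy as np

def fnn_forward(z, weights, biases, act=np.tanh):
    """L-layer feed-forward pass: hidden layers apply the activation
    entrywise, the output layer is linear."""
    h = z
    for W, b in zip(weights[:-1], biases[:-1]):
        h = act(h @ W + b)
    return h @ weights[-1] + biases[-1]

def pinn_style_loss(u, x, t, w_r=1.0, w_b=1.0, w_d=1.0, data=None):
    """Composite loss w_r*L_r + w_b*L_b + w_d*L_d for u_t = u_xx on a
    regular grid, with finite differences standing in for autodiff."""
    dx, dt = x[1] - x[0], t[1] - t[0]
    u_t = (u[1:, 1:-1] - u[:-1, 1:-1]) / dt
    u_xx = (u[:-1, 2:] - 2.0 * u[:-1, 1:-1] + u[:-1, :-2]) / dx**2
    L_r = np.mean((u_t - u_xx) ** 2)                  # PDE residual mismatch
    L_b = np.mean(u[:, 0] ** 2 + u[:, -1] ** 2)       # homogeneous Dirichlet boundaries
    L_d = 0.0 if data is None else np.mean((u - data) ** 2)  # data mismatch
    return w_r * L_r + w_b * L_b + w_d * L_d

# A small network taking (x, t) as input and returning a scalar surrogate.
rng = np.random.default_rng(0)
sizes = [2, 8, 8, 1]
Ws = [0.1 * rng.standard_normal((m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]

x = np.linspace(0.0, np.pi, 64)
t = np.linspace(0.0, 0.1, 32)
X, T = np.meshgrid(x, t)
grid = np.stack([X.ravel(), T.ravel()], axis=1)
u_net = fnn_forward(grid, Ws, bs).reshape(len(t), len(x))

exact = np.exp(-T) * np.sin(X)     # exact solution of u_t = u_xx with u(0)=u(pi)=0
loss_net = pinn_style_loss(u_net, x, t)
loss_exact = pinn_style_loss(exact, x, t)
```

The exact solution drives the composite loss close to zero (up to finite-difference truncation error), while a randomly initialized network typically does not; training would consist of lowering `loss_net` by adjusting `Ws` and `bs` with SGD or Adam.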
3.2 Extension to APNNs
Since we aim to analyze multiscale hyperbolic dynamics regardless of the propagation scaling, in order to obtain physically based predictions it is important that the PINN preserves the correct equilibrium solution (8) in the diffusive regime; in other words, the PINN should fulfill the AP property [25, 26, 34, 22]. We remark that in the context of the epidemic modeling of this work the AP property is of particular importance, allowing the same neural network to efficiently and robustly simulate population dynamics characterized by both diffusive and hyperbolic transport behaviors (the former in urban centers and the latter for mobility along connecting routes).
Neural networks satisfying this property are called Asymptotic-Preserving Neural Networks (APNNs); they were recently introduced in [25, 26] to efficiently solve multiscale kinetic problems with scaling parameters that can differ by several orders of magnitude. The definition of an APNN given in [26] for multiscale kinetic models with continuous velocity fields is generalized in the following (see Figure 1).
Definition 1 (Asymptotic-Preserving Neural Network).
Assume the solution is parameterized by a PINN trained using an optimization method to minimize a loss function that includes a residual term enforcing the physics of the phenomenon. Then we say it is an Asymptotic-Preserving Neural Network (APNN) if, as the physical scaling parameter of the multiscale model tends to zero, the loss function of the full-model constraint converges to the loss function of the corresponding reduced-order model.
In other words, the loss function, viewed as a numerical approximation of the original equation, benefits from the AP property.
3.3 A simple example: APNN for the Goldstein–Taylor model
To illustrate the relevance of the AP property in the construction of the neural network, let us carry out a detailed example, considering a simplified case in which there are no epidemic source terms allowing individuals to move to a different compartment, so that the entire population behaves as a single compartment. Such a case corresponds to the so-called Goldstein–Taylor model of discrete-velocity kinetic theory [36, 27]. This model describes the space-time evolution of two particle densities $u(x,t)$ and $v(x,t)$, traveling in a one-dimensional domain $\Omega$ with velocities $+c$ and $-c$, respectively. At the same time, particles can randomly switch to the opposite velocity. The dynamics of this system of particles is governed by the following system of PDEs
$$
\begin{aligned}
\frac{\partial u}{\partial t} + \frac{c}{\varepsilon} \frac{\partial u}{\partial x} &= \frac{\sigma}{2\varepsilon^2}\,(v - u),\\
\frac{\partial v}{\partial t} - \frac{c}{\varepsilon} \frac{\partial v}{\partial x} &= \frac{\sigma}{2\varepsilon^2}\,(u - v),
\end{aligned} \qquad (13)
$$
with $\varepsilon > 0$ the scaling parameter of the kinetic dynamics and $\sigma$ the scattering coefficient. The total particle density is given by $\rho = u + v$.
We consider a DNN with inputs $x$ and $t$ and trainable parameters $\theta$ to approximate the solution of our system: $(u_{NN}, v_{NN}) \approx (u, v)$. Then, we define the PDE residuals
$$
\begin{aligned}
\mathcal{R}_1 &= \frac{\partial u_{NN}}{\partial t} + \frac{c}{\varepsilon} \frac{\partial u_{NN}}{\partial x} - \frac{\sigma}{2\varepsilon^2}\,(v_{NN} - u_{NN}),\\
\mathcal{R}_2 &= \frac{\partial v_{NN}}{\partial t} - \frac{c}{\varepsilon} \frac{\partial v_{NN}}{\partial x} - \frac{\sigma}{2\varepsilon^2}\,(u_{NN} - v_{NN}),
\end{aligned} \qquad (14)
$$
and incorporate them into the loss function of the neural network by taking the weighted mean square error of the residuals, obtaining a standard PINN.
To understand the asymptotic behavior of the model, we resort to a suitable macroscopic formulation of the system, achieved through the introduction of the scaled flux $j = \frac{c}{\varepsilon}(u - v)$. This permits writing system (13) in the equivalent form
$$
\begin{aligned}
\frac{\partial \rho}{\partial t} + \frac{\partial j}{\partial x} &= 0,\\
\varepsilon^2 \frac{\partial j}{\partial t} + c^2 \frac{\partial \rho}{\partial x} &= -\sigma j.
\end{aligned} \qquad (15)
$$
In the diffusion limit, i.e., letting $\varepsilon \to 0$, we obtain
$$
j = -\frac{c^2}{\sigma} \frac{\partial \rho}{\partial x}, \qquad (16)
$$
which, inserted into the first equation, leads to the reduced diffusive model (which recalls the standard heat equation)
$$
\frac{\partial \rho}{\partial t} = \frac{\partial}{\partial x}\left(\frac{c^2}{\sigma} \frac{\partial \rho}{\partial x}\right). \qquad (17)
$$
It is clear that the standard PINN residual (14) is not consistent with the above analysis, since in the limit $\varepsilon \to 0$ it merely forces $u_{NN} = v_{NN}$, which does not suffice to achieve the correct diffusive behavior (17).
In contrast, using the macroscopic formulation (15), we can construct an APNN by incorporating in the loss function the mean square error of the PDE residuals
$$
\begin{aligned}
\mathcal{R}_1^\varepsilon &= \frac{\partial \rho_{NN}}{\partial t} + \frac{\partial j_{NN}}{\partial x},\\
\mathcal{R}_2^\varepsilon &= \varepsilon^2 \frac{\partial j_{NN}}{\partial t} + c^2 \frac{\partial \rho_{NN}}{\partial x} + \sigma j_{NN}.
\end{aligned} \qquad (18)
$$
Now, in the limit $\varepsilon \to 0$, we obtain
$$
\begin{aligned}
\mathcal{R}_1^0 &= \frac{\partial \rho_{NN}}{\partial t} + \frac{\partial j_{NN}}{\partial x},\\
\mathcal{R}_2^0 &= c^2 \frac{\partial \rho_{NN}}{\partial x} + \sigma j_{NN},
\end{aligned} \qquad (19)
$$
which is consistent with the residual of the limiting diffusive model (17). We refer to Appendix A for a detailed description of the loss function for the Goldstein–Taylor model, including the data and boundary condition loss terms.
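The consistency argument can be checked numerically: plugging a manufactured solution of the limiting heat equation, together with its Fick flux, into the residuals of the macroscopic formulation makes both of them vanish once $\varepsilon = 0$, while the flux residual stays finite for $\varepsilon = O(1)$. The constants $c$, $\sigma$ and the manufactured solution below are illustrative assumptions:

```python
import numpy as np

# Manufactured solution of the limiting heat equation rho_t = (c^2/sigma) rho_xx,
# with the Fick flux j = -(c^2/sigma) rho_x (illustrative constants).
c, sigma = 1.0, 2.0
D = c**2 / sigma

x = np.linspace(0.0, 2.0 * np.pi, 200)
t = 0.3

rho   = np.exp(-D * t) * np.sin(x)
rho_t = -D * rho
rho_x = np.exp(-D * t) * np.cos(x)
j     = -D * rho_x
j_x   = D * rho            # derivative of -D*exp(-D*t)*cos(x)
j_t   = -D * j

def ap_residuals(eps):
    """Max-norm of the two macroscopic residuals for a given epsilon."""
    r1 = rho_t + j_x                               # mass equation residual
    r2 = eps**2 * j_t + c**2 * rho_x + sigma * j   # flux equation residual
    return np.max(np.abs(r1)), np.max(np.abs(r2))

r1_0, r2_0 = ap_residuals(0.0)   # limiting (diffusive) residuals vanish
```

This is exactly the mechanism exploited by the APNN loss: a surrogate that solves the limiting model incurs no residual penalty as the scaling parameter vanishes.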
3.4 APNN for the hyperbolic SIR model
To achieve the AP property in the neural network for the hyperbolic SIR model, we follow the same approach as in the previous section. Thus, we consider the system written in the macroscopic form defined by equations (5). Multiplying both members of each flux equation by the corresponding relaxation time, we can rewrite the system in the following compact form
$$
\begin{aligned}
\frac{\partial \mathbf{u}}{\partial t} + \frac{\partial \mathbf{J}}{\partial x} &= \mathbf{f}(\mathbf{u}),\\
\mathbf{T}\,\frac{\partial \mathbf{J}}{\partial t} + \mathbf{D}\,\frac{\partial \mathbf{u}}{\partial x} &= \mathbf{T}\,\mathbf{g}(\mathbf{u}, \mathbf{J}) - \mathbf{J},
\end{aligned} \qquad (20)
$$
where $\mathbf{u} = (S, I, R)^T$, $\mathbf{J} = (J_S, J_I, J_R)^T$, $\mathbf{T} = \mathrm{diag}(\tau_S, \tau_I, \tau_R)$, $\mathbf{D} = \mathrm{diag}(D_S, D_I, D_R)$, and $\mathbf{f}$ and $\mathbf{g}$ collect the epidemic reaction terms of (5).
We consider a deep neural network (NN) with inputs $x$ and $t$ and trainable parameters $\theta$ to approximate the solution of our system: $(\mathbf{u}_{NN}, \mathbf{J}_{NN}) \approx (\mathbf{u}, \mathbf{J})$. Then, we define the residual term
$$
\mathcal{R}^\varepsilon = \left(\frac{\partial \mathbf{u}_{NN}}{\partial t} + \frac{\partial \mathbf{J}_{NN}}{\partial x} - \mathbf{f}(\mathbf{u}_{NN}),\;\; \mathbf{T}\,\frac{\partial \mathbf{J}_{NN}}{\partial t} + \mathbf{D}\,\frac{\partial \mathbf{u}_{NN}}{\partial x} - \mathbf{T}\,\mathbf{g}(\mathbf{u}_{NN}, \mathbf{J}_{NN}) + \mathbf{J}_{NN}\right), \qquad (21)
$$
and embed it into the loss function of the neural network to obtain an APNN. We omit for brevity the detailed analysis of the AP property: in the limit $\tau_S, \tau_I, \tau_R \to 0$ under conditions (6), the analysis follows the same steps as in the previous section, and the result agrees with the diffusion limit computed in Section 2.2.
We restrict the neural network approximation to satisfy the physics imposed by the residual (21) on a finite set of user-specified scattered points inside the domain (referred to as residual points), and we also enforce the initial and space-boundary conditions of the system on scattered points of the space-time boundary [29]. In the context of inverse problems, we also assume access to measured data, available at a finite set of fixed training points. Thus, in the training process of the PINN, we minimize the following AP loss function, composed of four mean squared error terms
$$
\mathcal{L}(\theta) = w_d \mathcal{L}_{d}(\theta) + w_b \mathcal{L}_{b}(\theta) + w_r \mathcal{L}_{r}(\theta) + w_N \mathcal{L}_{N}(\theta), \qquad (22)
$$
where $w_d$, $w_b$, $w_r$, $w_N$ characterize the weights associated with each contribution. Notice that $\mathcal{L}_d$ quantifies the mismatch of the approximated solution with respect to known data samples, while $\mathcal{L}_b$, $\mathcal{L}_r$ and $\mathcal{L}_N$ represent the discrepancy in the initial/boundary conditions of (20), in the residual (21) and with respect to the conservation of the total density in the domain (3), respectively, all three contributing to enforce the physical structure of the problem. We present the detailed expression of each term in (22) in Appendix B. A schematic representation of the APNN architecture is given in Figure 2.
4 Numerical examples and applications
In this section, various numerical tests are presented to assess the performance of the proposed APNNs. The first two examples concern the use of an APNN for the solution of inverse and forward problems, considering as prototype multiscale hyperbolic system either the standard Goldstein–Taylor model or a slightly modified version of it. Even if this model is a simpler system of equations than (5), it well represents the dynamics of interest, as discussed in Section 3.3. These tests are designed to further highlight how the APNN formulation proposed in this work is fundamental for the treatment of multiscale problems, especially when only partial information is available. We shall also demonstrate numerically with this prototype model (we refer to Section 3.3 for the analytical argument) that a standard PINN formulation leads to the loss of the AP property and, consequently, to non-physical reconstructions of the sought dynamics.
Following that, various tests concerning the solution of epidemic problems are discussed, examining the APNN performance in inferring the unknown epidemic parameters, solving the forward problem, and forecasting the spread of the infectious disease, also when spatially heterogeneous parameters are considered.
The numerical solution obtained with a second-order AP IMEX Runge–Kutta finite volume method [9, 10] is considered as synthetic ground-truth data and used to build the APNN training dataset. With regard to the epidemic test cases, we remark, as also discussed in Appendix B, that since flux data are not accessible in real-world applications, we only enforce measurements of the densities in $\mathcal{L}_d$. Nevertheless, unless otherwise specified, we impose initial conditions on the fluxes in $\mathcal{L}_b$. In all the examples, periodic boundary conditions are considered. To strictly impose them (accounted for again in $\mathcal{L}_b$), we employ the periodic mapping technique taken from [45] in the input layer
$$
x \mapsto \left(\sin(\omega x),\, \cos(\omega x)\right), \qquad (23)
$$
where $\omega$ is a hyperparameter controlling the frequency of the solution. For the tests concerning the Goldstein–Taylor model, the activation function $\sin$ is chosen, adopting the SIREN framework [39]; for the epidemic tests, the function $\tanh$ is used. Finally, the Adam method [31] is used for the optimization process, and derivatives in the NN are computed by automatic differentiation [3].

For all the numerical examples, we adopt a single feed-forward neural network of fixed depth and width. The model structure is deliberately kept the same across numerical experiments in both the parabolic and the hyperbolic regime, to highlight the main advantage of AP schemes: the macroscopic behavior can be captured without numerically resolving the small physical parameters (i.e., the architectural parameters of the neural network are independent of the physical scaling parameters). The chosen model and training hyperparameters are given in Tables 7 and 8 of Appendix C for each test case.
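The periodic input mapping can be sketched as a fixed Fourier-feature layer prepended to the network: because every feature is itself periodic, any function of the features inherits the period exactly. The function name and the number of modes `m` are illustrative choices, not the exact construction of [45]:

```python
import numpy as np

def periodic_features(x, period, m=1):
    """Map x to Fourier features so that any network built on them is
    exactly period-periodic.

    Returns [sin(k*omega*x), cos(k*omega*x)] for k = 1..m,
    with omega = 2*pi/period the frequency hyperparameter.
    """
    omega = 2.0 * np.pi / period
    k = np.arange(1, m + 1)
    ang = np.outer(np.atleast_1d(x), k * omega)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=1)

x = np.array([0.3, 1.7])
L = 2.0
f  = periodic_features(x, L, m=2)
fL = periodic_features(x + L, L, m=2)   # shifted by exactly one period
```

Feeding these features (instead of the raw coordinate $x$) to the input layer enforces the periodic boundary conditions by construction, so no boundary penalty is needed for the spatial direction.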
4.1 Test 1: Goldstein–Taylor model in diffusive regimes
In the following, we seek to emphasize numerically the importance of choosing the correct formulation to preserve the AP property and correctly approximate the population dynamics even in diffusive regimes, particularly when only partial information is available. To this aim, we set up for problem (13) a test with initial conditions
We consider periodic boundary conditions, imposed through the periodic mapping (23), and only the diffusive, parabolic regime of the model, obtained with a small value of the scaling parameter $\varepsilon$.
[Figure 3: convergence of the inferred scattering coefficient with respect to epochs using the APNN (left) and the standard PINN (right).]
Inverse Problem
Initially, we consider an inverse problem: inferring the scattering coefficient $\sigma$ from the available measurement data using the APNN formulation presented in Appendix A, with loss function (26) and residual term given in (28). For comparison, we also solve the inverse problem applying the standard PINN residual (14) in the loss function. For both the APNN and the standard PINN formulation, we train the network on measurements composed of equally spaced samples in the domain, from which 20% (4800 points) are randomly selected for validation purposes. For the APNN model we consider measurements only of the density $\rho$, hence assuming no information on the flux $j$, whereas for the standard formulation we employ data samples for both densities $u$ and $v$ (therefore, in the latter case we assume more information on system (13)). In addition, residual points are employed with the same data split for the validation set. The loss function and training hyperparameters of the APNN given in Table 7 are also used for the standard PINN, with the only difference, just stated, that the training dataset is given for both variables, considering equal weights.
We show the convergence of the target parameter $\sigma$ in Figure 3 for both PINN formulations. A very fast convergence with a small final relative error can be observed for the APNN. The standard PINN, however, fails to recover the correct value of the scattering parameter (at epoch 4000, early stopping prevents further training of the PINN).
Forward Problem
To further highlight the importance of the AP property, we consider a forward problem for the Goldstein–Taylor model, where the scattering coefficient $\sigma$ is given and the goal is now to solve the equations on the spatiotemporal domain with the corresponding initial conditions. For the APNN formulation, boundary points are employed to enforce the initial conditions of both $\rho$ and $j$, with the equations enforced on residual points inside the domain. The standard PINN formulation, based on the kinetic equations (13), shares the same setting with the APNN, but the initial conditions are given for $u$ and $v$.
We plot the solutions obtained with the standard PINN in Figure 4 and with the APNN in Figure 5. The standard PINN based on the kinetic equations (13) shows its weakness and converges to a trivial solution on the space-time domain, failing to approximate the forward solution of both the density and the flux. On the contrary, the adoption of the APNN ensures convergence towards the correct diffusive limit, which is also beneficial for the inverse problem considered before.
4.2 Test 2: Goldstein–Taylor model with source term
To examine the performance of the APNN in a more challenging setting, closely related to the epidemic scenarios that we shall discuss later on, we introduce a source term that creates an oscillatory effect in the density of the Goldstein–Taylor model. The resulting system reads
(24) 
For this problem, we reformulate the AP loss function according to the model, simply including the presence of the source term with respect to the formulation discussed in Appendix A. In the source term, we consider a baseline value perturbed by sinusoidal oscillations of given amplitude and frequency. We consider again a one-dimensional spatial domain with periodic boundaries. The final goal of this test is to infer the baseline, amplitude and frequency parameters of the source term and to evaluate the spatiotemporal reconstruction given by the APNN for a partially observed system, having only information on $\rho$, with a given scattering coefficient and the following initial conditions:
Table 1.

Parameter | Ground Truth | Initial Guess | Estimation | Relative Error
--------- | ------------ | ------------- | ---------- | --------------
          | 0            | 0.5           | 0.0011     | N/A
          | 3            | 2             | 2.9263     |
          | 4            | 3             | 4.0003     |
Test 2 (a): Diffusive regime with density data only
We initially consider a diffusive, parabolic regime. We employ a measurement dataset for $\rho$ only, not considering any dataset for $j$, while still imposing initial and boundary conditions for both variables. For the residual term, we use scattered points inside the domain. We use 20% of the measurement and residual datasets for validation purposes and the rest for training. Results of the parameter inference are shown in Table 1, where the initial guesses of the target variables are also listed, even though we observed that the neural network is not very sensitive to the choice of these values. From these results we can observe that, in general, the most difficult coefficient to calibrate with the NN is the amplitude of the perturbation of the source term.
The APNN forward approximations of $\rho$ and $j$ are presented in Figure 6, where we can observe that the forward solutions capture the correct dynamics of $\rho$ and accurately recover $j$ without any measurements of the latter. Nonetheless, we acknowledge that in diffusive regimes, as in Eq. (17), the problem is fully described by the density $\rho$ alone, so the absence of information on $j$ does not lead to an actual lack of data knowledge.
Table 2.

Parameter | Ground Truth | Initial Guess | Estimation | Relative Error
--------- | ------------ | ------------- | ---------- | --------------
          | 0            | 0.5           |            | N/A
          | 3            | 2             | 3.0005     |
          | 4            | 3             | 4.0002     |
Test 2 (b): Hyperbolic regime with density data only
In the second case, we consider a hyperbolic regime. We again employ a dataset for $\rho$ only, with no data for $j$, and fix the residual points inside the domain, with 20% of each dataset used for validation. The coefficients inferred by the APNN are listed in Table 2, while the forward solutions are shown in Figure 7. Similarly to the diffusive regime, the APNN correctly infers all the unknown parameters and approximates the solutions $\rho$ and $j$ well, but now in a much more demanding problem. Indeed, even though in hyperbolic regimes the problem is not completely defined by the density alone, the dataset being truly incomplete without any information on the flux $j$, the APNN is still capable of approximating the correct solution of the whole dynamics.
4.3 Test 3: SIR transport model with constant epidemic parameters
In the following, we evaluate the performance of the APNN with respect to the dynamics governed by the SIR multiscale transport model (5). We first design a numerical test with an initial condition that simulates the presence of two epidemic hotspots, aligned in the spatial domain and presenting a different number of infected individuals, distributed following a Gaussian function,
where the two centers are the coordinates of the hotspots, while the two scale factors define the different initial epidemic concentrations in the two cities, with a much higher density of infected individuals in the first city. Assuming that there are no immune individuals at the initial time and that the total population is uniformly distributed in the domain, we have
We impose initial fluxes in equilibrium, following (7), and periodic boundary conditions to allow both directions of connection between the two cities. We initially consider a simple setting defined by epidemic parameters that are constant in space and time, leading to the study of an infectious disease characterized by its initial reproduction number $R_0$.
The APNN is used both to infer the epidemic parameters and to approximate the solutions for a parabolic and a hyperbolic scenario. To mimic realistic data availability, we use a sparse dataset for the training process, sampling the spatiotemporal points from the available dataset with probability proportional to the magnitude of . Indeed, in real-world epidemic scenarios, data on the evolution of the infectious disease are available only in the regions in which the virus has already started to spread. Specifically, the probability of each spatiotemporal location being chosen for the training dataset is given by
p(x_j, t_j) = i(x_j, t_j) / Σ_k i(x_k, t_k).   (25)
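A minimal sketch of this sampling rule: training points are drawn without replacement, with probability proportional to a stand-in infected density on a space-time grid. The grid sizes, the sample count, and the synthetic density below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Importance sampling of training points with probability proportional
# to the magnitude of the infected density i(x_j, t_j), as in (25).
rng = np.random.default_rng(0)
nx, nt = 50, 40
i_grid = rng.random((nx, nt)) ** 4        # synthetic stand-in for |i(x, t)|
p = i_grid.ravel() / i_grid.sum()         # normalize to a probability distribution

n_train = 200
idx = rng.choice(i_grid.size, size=n_train, replace=False, p=p)
train_x, train_t = np.unravel_index(idx, i_grid.shape)
```

Raising the synthetic field to the fourth power concentrates probability mass where the density is large, so most selected points fall in the "already infected" regions, as intended by the sampling strategy described above.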
Test 3 (a): Partially observed dynamics in diffusive regime
In the first case, a parabolic configuration of speeds and relaxation parameters is considered, with and . We examine the performance of the APNN on the following two problems.

Test 3.2 (a): Forecasting test. As a second problem, we investigate the forecasting capability of the APNN. In contrast with the parameter inference test, where measurements are sampled across the entire spatiotemporal domain, we generate a training dataset of size on a shorter time domain and assess the correctness of the APNN approximations in and its forecasting performance in .
In both cases, equation residuals are enforced at residual points on the spatiotemporal domain, and 20% of each dataset is used for validation. In addition, we assume that the initial conditions for are unknown in both problems, thus demanding an even more challenging performance from the APNN.
Parameter | Ground Truth | Initial Guess | Estimation | Relative Error
— | 12 | 8 | 11.9428 | —
— | 6 | 3 | 5.9772 | —
Results of the parameter inference task based on the sparse measurement dataset are reported in Table 3, where an excellent estimation of both and can be observed with respect to the ground truth. Figure 8 shows that the reconstructed forward approximations for the density of the epidemic compartment are in excellent agreement with the true solution over the entire domain. Also in the forecasting test, the approximated and predicted dynamics (based on the measurements from the time period ) match the ground truth over the entire domain , as shown in Figure 9, even though in this demanding setting the initial conditions of the densities are assumed to be unknown. These results further highlight the capability of the APNN to forecast the spread of an infectious disease in diffusive regimes, thanks to the physical knowledge of the PDE system embedded in the NN together with the preservation of the AP property. In the same figure, we also present the temporal evolution of the cumulative density of infected individuals in the whole domain, as well as the effective reproduction number predicted by the APNN. The excellent agreement between predictions () and the ground truth outside the training domain further confirms the forecasting capability of APNNs.
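For reference, in a homogeneous SIR setting a common definition of the effective reproduction number is R_e(t) = (β/γ) S(t)/N. The sketch below computes it along a simple forward-Euler SIR trajectory; the parameter values (β = 12, γ = 6, giving R0 = 2 as in this test) and the initial data are illustrative, and the scalar ODE model is only a stand-in for the paper's spatial transport system.

```python
import numpy as np

# Forward-Euler integration of the classical SIR ODEs, used only to
# illustrate the effective reproduction number R_e(t) = (beta/gamma) * S(t)/N.
beta, gamma, N = 12.0, 6.0, 1.0       # illustrative values, R0 = beta/gamma = 2
dt, steps = 1e-3, 5000
s, i = 0.99, 0.01                     # hypothetical initial densities
S = []
for _ in range(steps):
    ds = -beta * s * i / N
    di = beta * s * i / N - gamma * i
    s, i = s + dt * ds, i + dt * di
    S.append(s)

R_e = (beta / gamma) * np.array(S) / N
```

Since the susceptible density only decreases, R_e(t) decays monotonically from near R0 toward the value reached once the outbreak burns out.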
Parameter | Ground Truth | Initial Guess | Estimation | Relative Error
— | 12 | 8 | 12.0126 | —
— | 6 | 3 | 6.0447 | —
Test 3 (b): Partially observed dynamics in hyperbolic regime
In the second case, we consider a hyperbolic regime with and . As previously done, we consider two different contexts.

Test 3.2 (b): Forecasting test. Secondly, we consider a forecasting task, training the APNN with measurements generated from a limited time domain . In this example, we choose and , employing and density measurements respectively, and then evaluate the network performance over the time domain .
In both scenarios, residual points are employed on the spatiotemporal domain to enforce the underlying equations, still assuming that initial conditions of densities are unknown, as in the previous test case.
Parameters and estimated by the APNN from the sparse measurements are presented in Table 4, where we again observe very good agreement with the true values. At the same time, the APNN is capable of reconstructing the correct dynamics of the phenomenon of interest in the whole domain despite the sparsity and incompleteness of the data, as shown in Figure 10. On the other hand, results obtained when training the APNN with measurements taken from a shorter time period are shown in Figures 11 and 12. In Figure 12, we plot the temporal evolution of the cumulative density of infected individuals in the whole domain, as well as the effective reproduction number predicted by the APNN when the measurement data are restricted to the shorter time periods . When , the APNN predictions deviate from the ground truth almost immediately after the training period. In contrast, when the measurement data are extended to , the APNN produces good reconstructions and predictions in the forecasting region () with respect to the ground truth. Similar observations can be made for the approximations of the densities and presented in Figure 11. This behavior occurs because a dataset restricted to does not capture enough information on the main dynamics. We remark, indeed, that no data on the fluxes are given to the APNN, which in a hyperbolic regime amounts to a substantial lack of information. In fact, the predictions obtained show that the APNN tends to smooth out the actual epidemic propagation pattern, failing to describe the correct transport/hyperbolic mechanism in the regions connecting the two urban areas. This is apparent in Figure 11 (first column) and Figure 12: the dynamics predicted by the APNN tend to spread the virus faster, in a more diffusive way, giving rise to a spurious epidemic hotspot around .
4.4 Test 4: SIR transport model with heterogeneous environment
Next, we consider a much more challenging scenario, taking into account a spatially varying transmission rate that follows a hypothetical heterogeneous environment. The initial condition of the SIR multiscale transport model is designed to simulate the presence of 3 epidemic hotspots aligned in the spatial domain , each one with a different initial density of infected individuals, again distributed in space following a Gaussian:
Here , , are the coordinates of the epidemic centers and , , define the different initial epidemic concentrations in each spot. Assuming again that there are no immune individuals at and that the total population is uniformly distributed in the spatial domain, we set and . As before, we impose initial fluxes in equilibrium, following (7), and periodic boundary conditions, to allow a connection also between hotspots 1 and 3, so that the domain connecting the positions of these regions forms the closed shape presented in Figure 13 (left). In the same figure (right), the initial conditions of the 3 epidemic compartments are shown. We set the following spatially varying transmission rate [9, 42]:
with and perturbing this baseline value with oscillations of amplitude and frequency . The recovery rate is set to . This choice of parameters simulates an infectious disease characterized by an initial reproduction number . With the APNN, the goal is to infer and , as well as to approximate the dynamics of the densities from the partially available measurements and to assess the forecasting performance.
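The spatially varying transmission rate described above, a baseline perturbed by sinusoidal oscillations, can be sketched as follows. The baseline beta0, the amplitude alpha, and the frequency k below are hypothetical stand-ins for the paper's values.

```python
import numpy as np

# Sketch of a spatially heterogeneous transmission rate: baseline beta0
# modulated by a sinusoidal perturbation of amplitude alpha and frequency k.
# All three parameters are illustrative placeholders.
def beta_x(x, beta0=9.0, alpha=0.05, k=4.0):
    return beta0 * (1.0 + alpha * np.sin(2.0 * np.pi * k * x))

x = np.linspace(0.0, 1.0, 101)
b = beta_x(x)
```

By construction the rate stays within the band beta0 * (1 ± alpha), so the heterogeneity perturbs, but never reverses, the baseline transmission intensity.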
Parameter | Ground Truth | Initial Guess | Estimation | Relative Error
— | 9 | 5 | 9.0170 | —
— | 2.5 | 1.5 | 2.4512 | —
— | 0.55 | 0.5 | 0.5508 | —
Test 4 (a): Partially observed dynamics in diffusive regime
In the first scenario, a parabolic configuration of speeds and relaxation parameters is considered with and .
Similar to the setting of Test 3, we investigate the capabilities of the proposed APNN in heterogeneous epidemic environments through the following two scenarios.

Test 4.1 (a): Parameter inference test. First, we consider a relatively sparse availability of measurements, with samples over the spatiotemporal domain selected according to (25), as previously described, with the main task of inferring the unknown physical parameters and . The selected samples are indicated in Figure 14 (left).

Test 4.2 (a): Forecasting test. Secondly, the forecasting performance in predicting the spread of the infectious disease until with measurements generated from a shorter time period is investigated.
In both scenarios, the equation residual is enforced on residual points in the domain , and initial conditions are enforced on equally spaced points. Furthermore, we enforce the conservation (3) on equally spaced temporal points, and we randomly split off 20% of each dataset for validation purposes.
In Table 5, we present the results of the parameter inference based on the sparse measurements. The APNN accurately recovers the correct values of the parameters and characterizing the epidemic incidence function, even when the initial guesses are far from the corresponding ground truth values. As illustrated in Figure 14, the reconstruction of the density is also in very good agreement with the ground truth. Notice that the three initial epidemic concentrations give rise to six different epidemic outbreaks in time, due to the spatial heterogeneity assigned to the transmission rate. In Figure 15, we also present the approximated forward solutions for the forecasting task. A good match in the forecasting region () is observed, demonstrating once more the capability of the APNN to capture the underlying physics and deliver reasonably accurate predictions in the forecasting regions, even when spatially heterogeneous environments are considered in the context of partially observed systems.
Parameter | Ground Truth | Initial Guess | Estimation | Relative Error
— | 9 | 5 | 9.0205 | —
— | 2.5 | 1.5 | 2.4691 | —
— | 0.55 | 0.5 | 0.5502 | —
Test 4 (b): Partially observed dynamics in hyperbolic regime
In the second scenario, we consider a hyperbolic regime, setting and . Similar to the previous test case, we consider two distinct tasks for the APNN.

Test 4.1 (b): Parameter inference test. Initially, a sparse measurement dataset of training samples over the spatiotemporal domain is considered, based on the importance sampling previously described and marked in Figure 16 (left), with the aim of solving the inverse problem and evaluating the subsequent forward reconstruction.

Test 4.2 (b): Forecasting test. Then, the APNN is trained with data samples selected from the spatiotemporal domain , and the reconstruction of the dynamics is evaluated up to , in order to examine also the forecasting performance on the spread of the virus.
Equation residuals are enforced at residual points on the domain in both setups, while points are used to enforce the initial conditions for the densities and fluxes , and the conservation (3) is enforced on equally spaced temporal points. As usual, 20% of each dataset is randomly selected for validation during the training process.
Similarly to the parabolic setting, the APNN is able to estimate the correct parameters of the spatially varying transmission rate from sparse measurements in the hyperbolic regime, as shown in Table 6. In Figures 16 and 17, the approximated forward solutions and the ground truth of and over the space-time domain are shown. A good match between the APNN approximation and the ground truth is observed both for the sparse measurements and for the measurements from a reduced training time domain, in the latter case also considering predictions of the space-time dynamics. Notice that, as expected, due to the hyperbolic setting of the scaling parameters in this test, the six epidemic outbreaks that arise at different times due to the spatial movement of individuals are more contained in terms of spatial spread than the results obtained in the diffusive regime.
5 Conclusions
The recent COVID-19 pandemic has led to a significant development of mathematical models for describing epidemiological phenomena, which has also introduced the challenge of identifying the parameters involved from partial information. In this direction, recent developments in machine learning represent a promising tool for addressing such problems, in the hope of identifying robust procedures for solving the corresponding inverse problems and for formulating predictive scenarios. This paper has addressed these problems in the context of spatially dependent epidemic models which, in addition to the lack of information about the spread of the epidemic, face additional difficulties induced by the different scales at which the dynamics take place. These scales are representative of the different interactions that occur in densely populated areas, such as urban areas, or in suburban areas where the movement of individuals over long distances prevails. The construction of neural networks that can accurately describe the various scales is thus essential. In particular, we have shown how physics-informed neural networks (PINNs) that benefit from the asymptotic-preserving (AP) property provide considerably better results across the different scales of the problem when compared with standard PINNs. Several numerical tests have been presented to illustrate the performance of this new class of neural networks, referred to as asymptotic-preserving neural networks (APNNs), on both inverse and forward problems. Finally, we emphasize that even if, for simplicity of presentation, we focused on a single-population hyperbolic SIR model, the results extend naturally to multi-population transport models which include additional epidemic compartments [1, 7, 11].
Acknowledgments
G.B. and L.P. were partially supported by MIUR-PRIN Project 2017, No. 2017KKJP4X, "Innovative numerical methods for evolutionary partial differential equations and applications". G.B. also acknowledges the support of INdAM-GNCS. X.Z. was supported by the Simons Foundation (504054).
Appendix A APloss function for the Goldstein–Taylor model
Fixing a finite set of residual points , , and considering the available dataset , we define the loss function for the Goldstein–Taylor model as follows
(26) 
The expressions of and
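Schematically, a loss of the composite form (26) sums a mean-squared data misfit on the observed densities and mean-squared equation residuals evaluated at the residual points. The sketch below shows only this composition, with placeholder residual arrays and hypothetical weights; the actual AP residual terms for the Goldstein–Taylor system are those derived in the text, not implemented here.

```python
import numpy as np

# Schematic composite loss: data-mismatch term plus PDE-residual terms.
# The residuals list stands in for the AP formulation's equation residuals
# evaluated at the residual points; w_d and w_r are hypothetical weights.
def ap_loss(rho_pred, rho_data, residuals, w_d=1.0, w_r=1.0):
    """Weighted sum of mean-squared data misfit and mean-squared residuals."""
    data_term = np.mean((rho_pred - rho_data) ** 2)
    res_term = sum(np.mean(r ** 2) for r in residuals)
    return w_d * data_term + w_r * res_term
```

For instance, with exactly satisfied equations (zero residuals) the loss reduces to the plain mean-squared error on the measured densities, which is the behavior one expects from any physics-informed loss of this kind.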