EINNs: Epidemiologically-Informed Neural Networks

We introduce a new class of physics-informed neural networks-EINN-crafted for epidemic forecasting. We investigate how to leverage both the theoretical flexibility provided by mechanistic models as well as the data-driven expressability afforded by AI models, to ingest heterogeneous information. Although neural forecasting models has been successful in multiple tasks, long-term predictions and anticipating trend changes remain open challenges. Epidemiological ODE models contain mechanisms that can guide us in these two tasks; however, they have limited capability of ingesting data sources and modeling composite signals. Thus we propose to supervise neural networks with epidemic mechanistic models while simultaneously learning their hidden dynamics. Our method EINN allows neural models have the flexibility to learn the disease spread dynamics and use auxiliary features in a general framework. In contrast with previous work, we not assume the observability of complete dynamics and do not need to numerically solve the ODE equations during training. Our thorough experiments showcase the clear benefits of our approach with other non-trivial alternatives.


page 1

page 2

page 3

page 4


Disease Informed Neural Networks

We introduce Disease Informed Neural Networks (DINNs) – neural networks ...

Asymptotic-Preserving Neural Networks for multiscale hyperbolic models of epidemic spread

When investigating epidemic dynamics through differential models, the pa...

A Recurrent Neural Network and Differential Equation Based Spatiotemporal Infectious Disease Model with Application to COVID-19

The outbreaks of Coronavirus Disease 2019 (COVID-19) have impacted the w...

Epidemic Dynamics via Wavelet Theory and Machine Learning, with Applications to Covid-19

We introduce the concept of epidemic-fitted wavelets which comprise, in ...

Leveraging the structure of dynamical systems for data-driven modeling

The reliable prediction of the temporal behavior of complex systems is r...

Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future

In real-time forecasting in public health, data collection is a non-triv...

Solving Inventory Management Problems with Inventory-dynamics-informed Neural Networks

A key challenge in inventory management is to identify policies that opt...

1 Introduction

The COVID-19 pandemic has led to a maturing of methods for epidemic modeling and forecasting with the CDC establishing the first Center for Forecasting and Outbreak Analytics in 2021. A variety of forecasting innovations in machine learning and deep learning were developed (e.g., 

[Rodríguez and others2021a, Kamarthi and others2021b, Arik and others2020]), with many lessons learned for COVID-19 and future pandemics. As the current experience has shown, predicting and preventing epidemics is one of the major challenges with far reaching impacts on health, economy and broad social well being111https://www.who.int/activities/preventing-epidemics-and-pandemics.

From this perspective, active participation by several academic and industrial teams (including by coauthors) in multiple CDC-led forecasting initiatives has led to two broad themes that are important for epidemic modeling. First, modern disease surveillance has grown by leaps and bounds yielding novel data sources that can shed light into happenings real-time. ML and data science techniques on these data sources provide dramatic improvements in short-term forecasting (usually 1-4 weeks ahead). At the same time, as these methods do not explicitly learn mechanistic dynamics, such methods do not provide understanding of how the epidemic will unfold at even longer time horizons, and do not support posing causal and counterfactual questions (e.g., design of countermeasures). Such longer-term forecasting remains the province of mechanistic models that can support scenario-based understanding of epidemic progression (e.g., ”what will happen if I closed schools?”). However, these methods present scalability issues, their calibration is prone to noise, and they have limited capability to ingest multimodal data sources. At the intersection of these two modeling approaches, we have hybrid models that make compartmental models more informed of these data sources 

[Shaman and others2013, Arik and others2020]. However, most of these approaches are still restrictive (very few tunable parameters), are not easily generalizable to new models/data sources or do not aim to incorporate ODE dynamics from first principles (e.g. predict ODE parameters instead of the ODE states).

In this paper, we develop a general framework for incorporating epidemic dynamics from a mechanistic model into a neural framework for forecasting, which enables seamlessly integration of multimodal data, greater representation power, and inclusion of composable neural modules of learned representations. Our focus is to leverage the selective superiorities of both approaches (see Figure 1) to have predictions that are accurate (low error) and well-correlated with even longer-term epidemic trends, than is usually studied in past literature.

Scientific AI [Karniadakis and others2021] is a recent line of work for bridging scientific models (usually represented as differential equations) and ML algorithms. Specifically, the rapidly growing literature in physics-informed neural networks (PINNs) have demonstrated robust solutions in spite of noise [Yang and others2021] and large scalability improvements thanks to the differentiable nature of ODEs and gradient-based learning [Lu and others2021]. We propose to leverage this body of work to incorporate the dynamics of an mechanistic epidemic model into deep neural models. Our goal requires technical innovations as many of the compartments in epi-models are latent (e.g. the actual number of people exposed to the disease) while most work in PINNs has only experimented with all states of the system dynamics being observable. In addition, PINNs are limited to working with the variables that are described in the mechanistic equations which limits their capabilities to ingest data sources.

(a) Method vs epidemic (b) Modeling spectrum
forecasting task and performance
Figure 1: EINN takes the best from both modeling approaches and is suitable for short- and long-term forecasting, and its predictions are well-correlated with the epidemic trends.

Our contributions are as follows: Push the boundaries of data-driven epidemic forecasting: We introduce Epidemiologically-Informed deep Neural Networks (EINN), a new framework to bridge the gap between mechanistic and neural models for epidemiology by allowing for general integration of auxiliary sources and ode models for better performance over longer horizons. We believe this opens new venues for exploring how AI can better complement domain knowledge in traditional epidemiology. Transfer learning via gradient matching of dynamics: We propose a novel method based on matching ODE gradients to transfer knowledge from one neural network dedicated to learning the dynamics from a mechanistic model to another neural network that ingests a variety of features, which avoids integrating the ODEs. Extensive empirical evaluation: We evaluate our method in the challenging task of weekly COVID-19 forecasting in 48 geographical regions and a period of 8 months. Our results showcase that our method can indeed leverage the ‘best of both worlds’ compared to other non-trivial ways of merging such approaches.

2 Related Work

Mechanistic epidemic models. Epidemic mechanistic models [Hethcote2000] like the popular SIR are designed using domain knowledge of the epidemic dynamics. They model causal underpinnings to explain empirically observed variables (e.g., mortality), and ODE-based models have been a workhorse of epidemiology since the late 18th century [Marathe and others2013].

ML epidemic models. There are several successful applications of machine learning to nowcasting and short-term forecasting [Rodríguez and others2021a, Osthus and others2019, Brooks and others2018, Adhikari and others2019] which led them to be often ranked among the top performing models in these tasks [Cramer and others2021, Reich and others2019]. Some of the recent deep learning innovations include advances in incorporting multi-view and multimodal data [Kamarthi and others2021a], spatial correlations [Deng and others2020, Jin and others2021], transfer learning for domain adaptation [Rodríguez and others2021b] and non-parametric approaches [Kamarthi and others2021b, Zimmer and others2020].

Hybrid epidemic models.

There has been recent work aimed at integrating the advantages of mechanistic models and ML approaches. Some lines of work use statistical techniques to estimate the mechanistic parameters (e.g. transmission rate) 

[Arik and others2020], learn from simulation-generated data [Wang and others2019], or use the ODEs as regularization [Gao and others2021]. The most prominent in this line of research is the work by [Shaman and others2013] that integrated search volume data into an SEIR ODE model via the principle of data assimilation [Wang and others2000]. However developing such data assimilation approaches requires a lot of domain knowledge. In summary, most of these hybrid models, in contrast to EINN, cannot be easily generalized to other (possibly more complex) models like for COVID and in presence of novel data sources.

Theory-informed machine learning. This line of work aims to enable ML exploit knowledge of the underlying mechanisms and/or causal relations [Karpatne and others2017]. PINNs have been used for forward and inverse problems with ODEs in a variety of domains including computational biology where data is noisy and samples are irregular [Yazdani and others2020]. However, incorporating exogeneous variables to this framework and working with partially observable systems are largely unexplored problems [Cai and others2021, Wang and others2021]. Our approach EINN is a new class of PINN that directly addresses these two limitations in context of epidemiology. A remotely related but popular line of work for learning dynamical systems is neural ODEs [Chen and others2018] but they cannot incorporate domain ODEs.

3 Buidling Blocks

In this section we describe the basic building blocks that we use in this paper.

3.1 SEIRM mechanistic model

The SEIRM model is a popular compartmental ODE model in epidemiology consisting of the Susceptible, Exposed, Infected, Recovered, and Mortality compartments. It is parameterized by four variables , where is the infectivity rate, is the mean latent period for the disease, is the mean infectious period, and is the mortality rate. Due to COVID-19’s prolonged incubation period, the SEIRM has been broadly used in modeling its progression [Wu and others2020, Morozova and others2021]. It has also been used by the CDC in modeling transmission of Ebola [Gaffey and others2018]. To capture the evolving nature of the dynamics and spread of COVID-19 (e.g. consider the multiple variant waves), we leverage the dynamic version of the SERIM model, where the parameters governing the disease progression themselves evolve over time. In such a setting, the dynamics is governed by the set of parameters at the given time-stamp . Let be the values taken by the states at time

. Then, the Ordinary Differential Equations (ODEs) describing the SEIRM model is given by



Typically ODE-based epidemiological models are calibrated using simulation optimization or Bayesian techniques [Venkatramanan and others2018]. The evolving nature of the models coupled with limited observation, noise and spikes make calibration harder. Recently, physics informed neural networks were shown to be helpful for calibration for ODEs in general [Yazdani and others2020].

3.2 RNN architecture

The Recurrent Neural Network (RNN) encoder has been extensively used in neural epidemic forecasting as a central building block  

[Adhikari and others2019, Kamarthi and others2021b, Rodríguez and others2021b, Wang and others2019]. Here, we introduce the base architecture of our model.

Informally, at prediction time we are given a multivariate time series of features/signals and we are tasked to predict steps ahead in the future. We encode the feature time series until

by passing it through a Gated Recurrent Unit (GRU) 

[Cho and others2014] to obtain a condensed representation for each time step:


where is the hidden state of the GRU for time step . To capture long-term relations and prevent over-emphasis on last terms of sequence we use self-attention layer:


where Self-Atten [Vaswani and others2017] involves passing the embeddings into linear layers to extract meaningful similarities before normalizing the similarities using Softmax. Then, we use the attention weights to combine the latent representations and obtain a single embedding representing the time series of features from to : and for the final predictions, we pass over a feedforward network: .

4 Our Approach

4.1 Problem formulation

As mentioned earlier, we aim on harnessing the strengths of both machine learning/deep learning approaches (which have been very successful in short-term forecasting) and mechanistic models (which are useful for long-term trend projections). Hence, our problem is one of merging neural models with mechanistic model dynamics while maintaining benefits from both the techniques. To capture this specific intention, we modify traditional forecasting problems [Adhikari and others2019, Rodríguez and others2021a, Rodríguez and others2021b] in the following manner:

Epi-Informed Neural Forecasting problem Given: A base epidemiological model mathematically represented as a set of ODEs (for example, see the SEIRM model in Section 3.1). A base RNN (See Section 3.2). Data: an observed multivariate time series of COVID-related signals and corresponding values for the forecasting target (new COVID-associated deaths) , where is the first day of the outbreak and is the current date. Predict: next values of the forecasting target, i.e. (here is the size of the forecasting window), such that predictions are accurate and well-correlated with the trends of the epidemic curve.

We want to focus on pushing the prediction horizon which has not been studied much: most prior work in this space focuses on short-term forecasting [Rodríguez and others2021b] where as opposed to in this paper. Note that just minimizing the prediction error as measured by root mean squared error (RMSE) is not enough to ensure the long term trends between forecasts and the actual values are similar. Hence, we focus on matching long term RMSE (accuracy) as well as trends (correlation).

4.2 Motivation and Overview

To tackle the Epi-Informed Neural Forecasting problem, one can easily conjure ‘naïve’ approaches. A simple approach is to calibrate the given mechanistic model with the observed target variables, deaths and incidence, and train the base RNN using the generated curve. Similarly, one could also use the ODEs to regularize the neural predictions and could even train an ensemble with nueral network’s and ODE-model’s predictions. However, as we show in the experiments, while these simple approaches can maintain the predictions of the base RNN model, they are not able to generate well-correlated predictions.

Our main idea to overcome the shortcomings of the simple models discussed above is to augment the base encoder-decoder RNN with the latent dynamics inferred by the mechanistic model. Recently, several works [Yazdani and others2020, Karniadakis and others2021] in the field of physics informed Neural Networks and Systems Biology have tried to tie neural networks and mechanistic ODEs for a variety of tasks including ODE parameter inference and forecasting. This line of work assumes that the neural network is a function of single variable and ODE system is in the form describing the rate of change of some function of . This allows for easy computation of the gradient (often implemented using Automatic Differentiation–autograd), which in turn, makes it possible to train the neural network while minimizing the difference between two gradients, e.g. using the term . However, note that our problem asks us to learn a mapping from a multivariate time series of COVID-related signals to the the targets . Hence, the strategies employed by existing works do not easily extend to our setting. Predicting ODEs parameters and forward integrating, as in [Arik and others2020], is also not a good fit here as it does not ensure the similarity between the predictions made by the ODEs and the nerual network.

4.3 Time module: learning latent time-varying dynamics

The first major component of EINN is the time module. It interfaces with the set of ODEs describing the epidemic progression to learn the lantent dynamics. We follow PINNs and use a neural network to solve ODE by jointly minimizing the observational error and the residual between gradient given by the ODE and the gradient of the neural network with respect to the time inputs (computed via automatic differentiation). Our time module is a feed-forward network that ingests time as input and outputs , but except for cumulative deaths , where we predict new deaths to match the reported data.

where are the learned ODE parameters for time . We minimize the ODE loss (unsupervised loss) while fitting the observed data (supervised loss).


where is new deaths predicted by the time module.

Constraining the optimization via epi domain knowledge. In contrast to the setting in most prior work in PINNs where most states of the system dynamics are observed, most states in our SEIRM model are latent. Learning in such a scenario is extremely challenging. We alleviate this by infusing additional epidemiological domain knowledge in the form of monotonocity constraints. We adapt monotonicity losses from [Muralidhar and others2018] to our setting. They proposed to penalize consecutive predictions if they are not monotonic in the required direction. Note that the difference between consecutive predictions are discrete approximation to the derivatives. Here, we generalize this loss to continuous derivatives (this can be achieved by taking the limit ). For our specific ODE described above, we know that the Susceptible state monotonically decreases and the Recovered state monotonically increases (these are easy to derive for other epidemiological models as well). Now, we add a penalty when is positive and when is negative.


where is the rectified linear function. Note that and are part of , which is the output of the time module; thus, and are computed via autograd.

Coping with spectral bias in neural networks. One of the central issue in fitting PINNs is the spectral bias of neural networks, which is the tendency of neural networks to fit low frequency signals  [Wang and others2021]. To overcome this spectral bias, usually the neural networks are given more flexibility to fit high frequency systems. Here, we adopted Gaussian Random Fourier feature mappings  [Tancik and others2020]: , where each entry in is sampled from , where

is a hyperparameter.

Handling time-varying ODE parameters. As mentioned earlier, our ODE model is time varying, therefore we have to learn mechanistic parameters for each time step, which increases the difficulty of the optimization. To make this more tractable, we propose a consistency loss between consecutive parameters.


4.4 Feature module: connecting features to epidemic dynamics via gradient matching

The second component of our model is the feature module, which maps the multivariate time-series of COVID-19 related signals to the forecasting targets . Here, we want to ensure that the predictions made by the feature module are consistent with the ones given by the ODE model. Hence, we want the feature module to capture the latent dynamics learned by time module. By succeeding on this, we will connect features to the epidemic dynamics of our SEIRM. We can formalize our goal as follows:

where are the ODE states predicted by the feature module and are the same ODE parameters used by in the time module.

Matching the ODE gradient. We cannot directly calculate via autograd from the inputs as we did for the time module because our base RNN ingests features. We propose to use internal representations (embeddings) so that we can approximate the gradient to an expression that can be computed via autograd directly. Let and

be embeddings for the time module and feature module. Then, by using the chain rule, we propose to approximate the gradient of

assuming and have our gradient trick:


where can be calculated in the feature module using autograd because is the only variable that is needed to compute . Similarly, is the only input needed for computing , thus, we can use autograd to compute . To make this approximation valid, we have to make these embeddings and similar. We do this with the following loss:


This derivation allow us to make the feature module to learn the gradients learned by the time module by minimizing and ODE loss for the feature module:


Aligning with data and time module outputs. Matching the ODE gradient is not enough to ensure the dynamics will be transferred. We have to make sure that the feature module outputs are aligned with data and with the ODE dynamics as found by the time module. For fitting the data, we define data loss in a manner similar to the time module:


where is the new deaths prediction of the feature module. To align the time and feature modules, we use knowledge distillation (KD) [Ba and others2014], a popular transfer learning method. We impose our KD loss on the outputs of these two modules:


Note that our time module is able to predict for any given time but our base RNN makes prediction for one target in the future. To align these two, we make our feature module to make joint prediction using a decoder GRU which takes as the initial hidden state and roll the GRU forward for every prediction step ahead taking time as input. Thus, our decoder equations will be and our final predictions .

4.5 Model training and inference

Training. Step 1: During the first training step, our goal is to make so that later we can use the gradient approximation stated in Equation (8) so that we can then match the gradient of the ODE. For this, we we can train all parameters jointly with all the losses except for the feature ODE loss . Step 2: Once is small, we can train all losses together, however, it might be unstable and may start to increasing which in turn invalidates our gradient matching trick and makes the minimization of misleading. We found the training is more stable when freezing all previous layers to in the time module and all previous layers to in the feature module. In this case, we only focus the learning in the last layers, therefore, they should have enough representation power for this task.

Inference. At inference, although we have predictions from both time and feature modules, we solely utilize the feature module predictions as it ingests features and we want to emphasize the utility of inserting dynamics in ML models.

5 Experiments

5.1 Setup

All the results are for forecasting COVID-19 mortality in the US up to 8-weeks ahead in the future. Our code is available for research purposes222https://github.com/AdityaLab/EINNs. We evaluate in 47 states excluding 3 states where the SEIRM mechanistic model had difficulties fitting due to sparsity of death counts (specifically Alaska, Montana, and Wyoming). Our evaluation period is 8 months from Sept. 2020 to March 2021 (specifically CDC epidemic weeks 202036 to 202109) which includes the complete Delta wave in the US, and we make 8-weeks ahead predictions for every two weeks in this period. We used June 2020 to Aug. 2020 to tune our models. For each forecasting week, all models are trained with historical data available until that week.

Short-term (1-4 wks) Long-term (5-8 wks) Trend correlation
RNN 1.09 0.50 0.86 1.19 0.53 0.96 0.08
SEIRM 2.35 1.13 1.36 7.14 2.99 3.11 0.53
Generation 0.79 0.35 0.60 0.93 0.40 0.74 -0.01
Regularization 1.05 0.48 0.81 1.19 0.53 0.97 0.09
Ensembling 0.91 0.41 0.68 0.93 0.40 0.69 -0.01
PINN (time module standalone) 0.84 0.38 0.64 0.93 0.40 0.72 0.24
EINN-NoGradMatching 0.82 0.36 0.61 0.89 0.38 0.68 0.04
EINN 0.54 0.24 0.38 0.85 0.37 0.66 0.46
Table 1: Results for 47 states + US National over a period of 8 months (5376 predictions per model), including the Delta wave. EINN is the only one consistently among the best performing for all (lower NRMSE1, NRMSE2 and ND is better; higher Pearson correlation is better).

Metrics. Our focus is in predictions that are accurate and well-correlated the epidemic trends, thus we measure two aspects of our predictive performance: error and trend correlation. Error metrics: As previous work in this area [Adhikari and others2019], we adopt metrics based on root mean squared error and absolute deviation. Because the number of deaths largely vary across regions, we use normalize versions of popular error metrics so that we can aggregate performance over multiple regions. We use two different versions of Normalized Root Mean Squared Error (NRMSE) and Normal Deviation (ND) following [Roy and others2021, Remy and others2021]. For all of these metrics, we calculate them at short-term forecasting (1-4 weeks) and long-term forecasting (5-8 weeks) and calculate their mean value. Correlation metric: Following [Deng and others2020], we use Pearson correlation and use their median across weeks.

Data. We collected important publicly available333Data links: apple.com/covid19/mobility; google.com/covid19/mobility; coronavirus.jhu.edu; healthdata.gov; delphi.cmu.edu; gis.cdc.gov/grasp/COVIDNet/COVID19_3.html signals from a variety of trusted sources that are relevant to COVID-19 forecasting. The features in this dataset are mobility from Google and Apple, social media surveys from Facebook, hospitalization data from the U.S. Department of Health & Human Services and CDC, and cases and mortality from Johns Hopkins University. In total, we collected 13 features.

Baselines. As we are the first to pose the Epi-Informed Neural Forecasting problem, we do not have off-the-shelf baselines. Instead, we adopt different ways to incorporate ODE dynamics into a neural model that have been explored in literature [Dash and others2022]. Generation: Similar to [Wang and others2019, Sanchez-Gonzalez and others2020], the NN learns directly from data generated by the analytical solution of SEIRM. Regularization: Similar to [Gao and others2021, Gaw and others2019], the NN predicts both the ODE states and the ODE parameters. Then uses the ODE parameters to regularize the states via a loss based on the SEIRM equations. Ensembling: As per [Adiga and others2021, Yamana and others2017], combines predictions from RNN and SEIRM via a NN that outputs final predictions. Note that, unlike Generation and Ensembling, our method does not need integrating the ODE.

5.2 Results

Q1: Leveraging advantages of both mechanistic models and neural models. Our RNN has a lower error in short- and long-term forecasting than the SEIRM, but its predictions are much less correlated with epidemic trends (see comprehensive results in Table 1). By integrating ODE and neural models, EINN is capable of taking advantage from both. Comparing with the ODE, our Pearson correlation is similar but our predictions are much more accurate (up to 77% less error). Indeed, EINN not only improves RNN correlation by 475% but also its accuracy up to 55% thanks to the incorporation of short and long-term dynamics.

Q2: Benefits over other ways to incorporate epidemic dynamics into neural models. EINN has the lowest error and best correlation in comparison with other existing ways to incorporate epidemic dynamics to neural networks. While we can see that all of them have better short- and long-term error than the RNN (especially Generation and Ensembling), their correlation is worse or the same. Instead, EINN across 48 different regions (5376 predictions per model) shows significant improvements in both accuracy and correlation by a large difference.

Q3: Ablation: time module PINN and gradient matching. We perform ablation studies to understand what are the contributions of the main components of our model. First, we analyze our time module trained standalone, i.e., being trained without the feature module with losses in Equations (8-12) (PINN in Table 1). We can see that, although our time module PINN directly interacts with the ODE and their behavior will be coupled during training, they have different behavior in test. In fact, this points to the need that we need features to be able to extract representations that generalize in test.

Second, we assess the contribution of our gradient matching trick (EINN-NoGradMatching), for which we train with all losses except for the ones in Equations (12) and (13). In this scenario where only helps to transfer the dynamics, we can see that it is a less effective way.

Q4: Case study in Delta wave: US National and New York state. In Figure 2, we exemplify the advantages of EINN (in red) in both accuracy and correlation. The RNN (blue) is accurate in short term forecasting, but we can see how the long-term predictions (see US National when the epidemic curve is declining). In addition, ODE predictions are well-correlated but may incur in large errors. EINN takes the best from these two.

Figure 2: Example of the advantages of our method in US National and NY predictions. EINN (red) has better accuracy than SEIRM (green) and RNN (blue) and has similar correlation as the SEIRM.

Q5: Sensitivity to hyperpameters. We found most hyperparameters are not sensitive. See Appendix for more details.

6 Conclusions

We introduced a novel approach EINN to incorporate mechanistic dynamics (via the SEIRM model) into neural models (using a RNN style base model). We show the effectiveness of a principled method to transfer relevant knowledge via gradient matching of the ODE equations, without integrating (forward pass) the model. Through extensive experiments over 48 regions we also show the usefulness of EINN in COVID-19 forecasting and also the importance of our various design choices. Overall, we believe this work opens up new avenues for leveraging epi domain knowledge into AI models for better decision making. Connecting complex mechanistic models to neural networks also enables us to have learned representations useful for other tasks downstream like what-if predictions, which would be worth exploring. In addition investigating more complex epidemiological models (like network based agent models) would be fruitful.

Acknowledgments: This work was supported in part by the NSF (Expeditions CCF-1918770, CAREER IIS-2028586, RAPID IIS-2027862, Medium IIS-1955883, Medium IIS-2106961, CCF-2115126), CDC MInD program, ORNL, faculty research award from Facebook and funds/computing resources from Georgia Tech. BA was supported by CDC-MIND U01CK000594 and start-up funds from University of Iowa. NR was supported by US NSF grants Expeditions CCF-1918770, NRT DGE-1545362, and OAC-1835660.


  • [Adhikari and others2019] Bijaya Adhikari et al. EpiDeep: Exploiting Embeddings for Epidemic Forecasting. In KDD, 2019.
  • [Adiga and others2021] Aniruddha Adiga et al. All Models Are Useful: Bayesian Ensembling for Robust High Resolution COVID-19 Forecasting. preprint, Epidemiology, 2021.
  • [Arik and others2020] Sercan O. Arik et al. Interpretable Sequence Learning for COVID-19 Forecasting. arXiv, 2020.
  • [Ba and others2014] Jimmy Ba et al. Do deep nets really need to be deep? ANIPS, 2014.
  • [Brooks and others2018] Logan C. Brooks et al. Nonmechanistic forecasts of seasonal influenza with iterative one-week-ahead distributions. PLOS Comp. Biology, 2018.
  • [Cai and others2021] Shengze Cai et al. Deepm&mnet: Inferring the electroconvection multiphysics fields based on operator approximation by neural networks. Journal of Comp. Physics, 2021.
  • [Chen and others2018] Ricky TQ Chen et al. Neural ordinary differential equations. In NeurIPS, 2018.
  • [Cho and others2014] Kyunghyun Cho et al. Learning phrase representations using rnn encoder-decoder for statistical machine translation. 2014.
  • [Cramer and others2021] Estee Y Cramer et al. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the US. medRxiv, 2021.
  • [Dash and others2022] Tirtharaj Dash et al. A review of some techniques for inclusion of domain-knowledge into deep neural networks. Scientific Reports, 2022.
  • [Deng and others2020] Songgaojun Deng et al. Cola-GNN: Cross-location Attention based Graph Neural Networks for Long-term ILI Prediction. In CIKM, 2020.
  • [Gaffey and others2018] Robert H Gaffey et al. Application of the cdc ebolaresponse modeling tool to disease predictions. Epidemics, 2018.
  • [Gao and others2021] Junyi Gao et al. STAN: spatio-temporal attention network for pandemic prediction using real-world evidence. JAMA, 2021.
  • [Gaw and others2019] Nathan Gaw et al. Integration of machine learning and mechanistic models accurately predicts variation in cell density of glioblastoma using multiparametric mri. Scientific reports, 2019.
  • [Hethcote2000] Herbert W. Hethcote. The Mathematics of Infectious Diseases. SIAM Review, 2000.
  • [Jin and others2021] Xiaoyong Jin et al.

    Inter-Series Attention Model for COVID-19 Forecasting.

    In SDM. SIAM, 2021.
  • [Kamarthi and others2021a] Harshavardhan Kamarthi et al. CAMul: Calibrated and Accurate Multi-view Time-Series Forecasting. arXiv, 2021.
  • [Kamarthi and others2021b] Harshavardhan Kamarthi et al. When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting. In NIPS, 2021.
  • [Kamarthi and others2022] Harshavardhan Kamarthi et al. Back2future: Leveraging backfill dynamics for improving real-time predictions in future. ICLR, 2022.
  • [Karniadakis and others2021] George Karniadakis et al. Physics-informed machine learning. Nature Reviews Physics, 2021.
  • [Karpatne and others2017] Anuj Karpatne et al. Theory-Guided Data Science: A New Paradigm for Scientific Discovery from Data. TKDE, 2017.
  • [Liu and others2021] Xiao Liu et al.

    Self-supervised learning: Generative or contrastive.

    TKDE, 2021.
  • [Lu and others2021] Lu Lu et al. Deepxde: A deep learning library for solving differential equations. SIAM Review, 2021.
  • [Marathe and others2013] Madhav Marathe et al. Comp. epidemiology. Comm. of the ACM, 2013.
  • [Morozova and others2021] Olga Morozova et al. One year of modeling and forecasting covid-19 transmission to support policymakers in connecticut. Scientific reports, 2021.
  • [Muralidhar and others2018] Nikhil Muralidhar et al. Incorporating Prior Domain Knowledge into Deep Neural Networks. In ICBD, 2018.
  • [Osthus and others2019] Dave Osthus et al. Dynamic Bayesian Influenza Forecasting in the United States with Hierarchical Discrepancy (with Discussion). Bayesian Analysis, 2019.
  • [Reich and others2019] Nicholas G. Reich et al. A collaborative multiyear, multimodel assessment of seasonal influenza forecasting in the United States. PNAS, 2019.
  • [Remy and others2021] Sekou L Remy et al. Overcoming digital gravity when using ai in public health decisions. arXiv, 2021.
  • [Rodríguez and others2021a] Alexander Rodríguez et al. DeepCOVID: An Operational Deep Learning-driven Framework for Explainable Real-time COVID-19 Forecasting. In AAAI, 2021.
  • [Rodríguez and others2021b] Alexander Rodríguez et al. Steering a Historical Disease Forecasting Model Under a Pandemic: Case of Flu and COVID-19. In AAAI, 2021.
  • [Roy and others2021] Padmaksha Roy et al. Deep diffusion-based forecasting of covid-19 by incorporating network-level mobility information. In CASSNAM, 2021.
  • [Sanchez-Gonzalez and others2020] Sanchez-Gonzalez et al. Learning to simulate complex physics with graph networks. In ICLR. PMLR, 2020.
  • [Shaman and others2013] Jeffrey Shaman et al. Real-time influenza forecasts during the 2012-2013 season. Nature Comm., 2013.
  • [Tancik and others2020] Matthew Tancik et al. Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. In NIPS, 2020.
  • [Vaswani and others2017] Ashish Vaswani et al. Attention is all you need. NIPS, 2017.
  • [Venkatramanan and others2018] Srinivasan Venkatramanan et al. Using data-driven agent-based models for forecasting emerging infectious diseases. Epidemics, 2018.
  • [Wang and others2000] Bin Wang et al. Data assimilation and its applications. PNAS, 2000.
  • [Wang and others2019] Lijing Wang et al. DEFSI: Deep Learning Based Epidemic Forecasting with Synthetic Information. AAAI, 2019.
  • [Wang and others2021] Sifan Wang et al. Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM, 2021.
  • [Wu and others2020] Joseph T Wu et al. Nowcasting and forecasting the potential domestic and international spread of the 2019-ncov outbreak originating in wuhan, china: a modelling study. The Lancet, 2020.
  • [Yamana and others2017] Teresa K Yamana et al. Individual versus superensemble forecasts of seasonal influenza outbreaks in the united states. PLoS Comp. Bio, 2017.
  • [Yang and others2021] Liu Yang et al. B-pinns: Bayesian physics-informed neural networks for forward and inverse pde problems with noisy data. Journal of Comp. Physics, 2021.
  • [Yazdani and others2020] Alireza Yazdani et al. Systems biology informed deep learning for inferring parameters and hidden dynamics. PLOS Comp. Bio, 2020.
  • [Zimmer and others2020] Christoph Zimmer et al. Influenza Forecasting Framework based on Gaussian Processes. In ICML, 2020.


Code and data are publicly available444https://github.com/AdityaLab/EINNs. Please refer to the README in code repository for more details.

Short-term (1-4 wks) Long-term (5-8 wks) Trend correlation
Hyperparameter Value NRMSE1 NRMSE2 ND NRMSE1 NRMSE2 ND Pearson
1 0.41 0.50 0.35 0.52 0.73 0.48 0.68
10 0.38 0.46 0.32 0.51 0.70 0.47 0.72
100 0.34 0.39 0.29 0.57 0.73 0.50 0.55
1 0.38 0.46 0.32 0.51 0.70 0.47 0.72
10 0.39 0.46 0.33 0.53 0.70 0.49 0.75
100 0.39 0.46 0.34 0.53 0.71 0.49 0.76
Table 2: Hyperparameter sensitivity for EINN over states California, Georgia, Illinois, Texas, and US National.

Appendix A Experimental setup (extra details)

Computational setup.

All experiments were conducted using a 4 Xeon E7-4850 CPU with 512GB of 1066 Mhz main memory and 4 GPUs Tesla V100 DGXS 32GB. Our method implemented in PyTorch (implementation details in the appendix) trains on GPU in about 30 mins for one predictive task. Inference is takes only a few seconds.

Real-time forecasting. We follow the literature on evaluating epidemic forecasting methodologies [Shaman and others2013, Kamarthi and others2021a, Adhikari and others2019] and use the real-time forecasting setup. We simulate real-time forecasting by making models train only using data available until each of the prediction weeks and make predictions for 1 to 8 weeks ahead in the future. Data revisions in public health data are large and may affect evaluation and conclusions [Kamarthi and others2022, Cramer and others2021], therefore, we utilize fully revised data following previous papers on methodological advances [Adhikari and others2019, Rodríguez and others2021b].

Evaluation details. Some models may make daily predictions while others weekly predictions. We follow CDC evaluation papers [Cramer and others2021, Reich and others2019] and convert all forecasts to weekly.

Appendix B Implementation details

b.1 Data Preprocessing

Feature scaling. Time-series of exogenous features can have wide range of values (e.g., number of confirmed cases vs percentage of people with COVID-like symptoms in social media surveys) Therefore, we scale all signals per each region for which we use standard scaling (normalization).

Time series for training. Although we may have long time series, we found our RNN works better with no chunking. As during training we have variable-length input sequences, we use a mask that is utilized when calculating the attention scores of Equation (3). As we follow the real-time forecasting setup, at inference time we use the complete input sequence thus we do not need a mask.

b.2 Architecture and hyperparameters

We describe in detail the hyperparameters for EINN used for our experiments. As mentioned in Section 5, we used data from June 2020 to Aug. 2020 for model design and hyperparameter tuning.

Time module:

It is a feedforward network with input layer layers 40x40x40x20 followed by output layers 20x40x40x5, and activation function

. Note that the input to this module is time, which is of dimension 1, but this is immediately transformed to 40 different signals via Gaussian Random Fourier feature mapping which then enter to the neural network. Between the input and output layers, we have our embeddings , which are of smaller size (20) than the other hidden layers. We make this selection because we want to make embeddings and to be as close as possible, and this is hard to achieve when we deal with high-dimensional embeddings as it has been noted in the Contrastive Learning literature [Liu and others2021]. Passing to the output layer gives us . With respect to the learnable ODE parameters, we a transformation following [Yazdani and others2020]. This ensures that the actual parameter values will be within their corresponding domain (0-1). In our experience, we found that initializing the ODE parameters with the ones found by the analytical solution works best.

Feature module: As encoder, we used a bi-directional GRU with 2 layers and hidden states of dimension 32. As decoder, we use another bi-directional GRU with 1 layer and hidden states of dimension 32 and an feedforward output layer of 32x20 to obtain . To encourage transfer learning, we utilize shared layers between the time and feature modules. Therefore, the embedding is passed to the output layer of the time module to obtain .

Loss weights: All our results use loss weight of 1 except for the following losses. For our ODE loss we use weight loss of 10, and we weight the same in our monotonicity loss . Our parameter consistency loss is weighted with . Finally, we have a helper loss for the time module that ingests data from the analytical solution of the ODE, to which we put a weight of

b.3 Hyperparameter sensitivity

Overall, we found that most hyperparameters are not sensitive. The most sensitive ones are the weights for ODE loss for time module and feature module , and embedding and output losses . To illustrate this, we vary the loss weight which is applied to both and and another loss weight which is applied to both and .

We analyze hyperparameters sensitivity on five geographical regions in the uptrend of the Delta wave (over 2 months, specifically epidemic weeks 202048 to 202101) which is one of the most difficult to predict due to the unprecedented infectiousness of the Delta variant and shift in human behavior. In Table 2, we can see that EINN performance is stable across different values of hyperparameters It is worth noting that it tuning is important to have a correct balance on these losses as increasing one weight on may improve error or correlation but degrade the one facet. As noted in the literature [Wang and others2021], this is common when working with theory-based constrains and how to best proceed remains an open problem.

Appendix C Evaluation metrics

As noted in Section 5, we used two different versions of Normalized Root Mean Squared Error (NRMSE) and Normal Deviation (ND) following [Roy and others2021, Remy and others2021]. Given prediction at prediction week for -weeks ahead in the future, and the corresponding ground truth value , our error metrics are the following:

where is the number of predictions weeks and is number of weeks ahead in the future; for our setup we have and , which makes 112 different predictions for model for a single region and makes 5376 predictions per model over all regions.

Regarding to correlation, we use Pearson correlation over the sequence of 8 predictions in the future as per [Deng and others2020]:

where and are the mean values of the ground truth and the model’s predictions, respectively.