# Group Sparse Bayesian Learning for Active Surveillance on Epidemic Dynamics

Predicting epidemic dynamics is of great value in understanding and controlling diffusion processes, such as infectious disease spread and information propagation. This task is intractable, especially when surveillance resources are very limited. To address the challenge, we study the problem of active surveillance, i.e., how to identify a small portion of system components as sentinels to effect monitoring, such that the epidemic dynamics of an entire system can be readily predicted from the partial data collected by such sentinels. We propose a novel measure, the gamma value, to identify the sentinels by modeling a sentinel network with row sparsity structure. We design a flexible group sparse Bayesian learning algorithm to mine the sentinel network suitable for handling both linear and non-linear dynamical systems by using the expectation maximization method and variational approximation. The efficacy of the proposed algorithm is theoretically analyzed and empirically validated using both synthetic and real-world data.

## Authors

• 2 publications
• 38 publications
• 6 publications
• 1 publication
• ### Nonparametric Bayesian Sparse Graph Linear Dynamical Systems

A nonparametric Bayesian sparse graph linear dynamical system (SGLDS) is...
02/21/2018 ∙ by Rahi Kalantari, et al. ∙ 0

• ### Sparse learning of stochastic dynamic equations

With the rapid increase of available data for complex systems, there is ...
12/06/2017 ∙ by Lorenzo Boninsegna, et al. ∙ 0

• ### Tree-Structured Recurrent Switching Linear Dynamical Systems for Multi-Scale Modeling

Many real-world systems studied are governed by complex, nonlinear dynam...
11/29/2018 ∙ by Josue Nassar, et al. ∙ 8

• ### A Hierarchical Bayesian Linear Regression Model with Local Features for Stochastic Dynamics Approximation

One of the challenges in model-based control of stochastic dynamical sys...
07/11/2018 ∙ by Behnoosh Parsa, et al. ∙ 0

• ### Efficient Amortised Bayesian Inference for Hierarchical and Nonlinear Dynamical Systems

We introduce a flexible, scalable Bayesian inference framework for nonli...
05/28/2019 ∙ by Geoffrey Roeder, et al. ∙ 0

• ### Crowdsourcing real-time viral disease and pest information. A case of nation-wide cassava disease surveillance in a developing country

In most developing countries, a huge proportion of the national food bas...
08/09/2019 ∙ by Daniel Mutembesa, et al. ∙ 0

• ### An Improved Bayesian Framework for Quadrature of Constrained Integrands

Quadrature is the problem of estimating intractable integrals, a problem...
02/13/2018 ∙ by Henry Chai, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1   Introduction

Predicting epidemic dynamics is of great value in understanding and controlling diffusion processes. Diffusion phenomena, such as infectious disease spread and information propagation, exist widely in the real world. By the notion of a dynamical system, which is a powerful tool for characterizing dynamics [Brunton, Proctor, and Kutz2016]

, the task of epidemic dynamics prediction is to estimate ensuing system states using given current states. Therefore, the foundation of this task is surveillance, which is to monitor and report the current system states in a timely manner.

However, in practice, it is often very challenging to monitor the overall components in a system because diffusion phenomena often cross over very large spatiotemporal ranges, e.g., spreading of infectious diseases in a country [Polgreen et al.2009], air contaminant diffusion in a large city [Zheng, Liu, and Hsieh2013], and hot topics/meme forwarding on social media [Chen et al.2013]. For large-scale diffusion phenomena, timely and all-around monitoring is hard and infeasible, especially when available surveillance resources are very limited.

Owing to a lack of systematic deployment of already limited surveillance resources, disease surveillance has long suffered from low reporting rates, biased sampling, and lengthy reporting time-lags [Gerardo-Giorda et al.2013]. For instance, Tengchong city, a malaria endemic region in China, has 18 towns (consisting of 221 villages), 167,964 households, and 658,207 residents that are distributed in a mountainous area of 5,845 square kilometers. From 2005 to 2011 in Tengchong, 7,835 confirmed malaria cases were reported, but the Tengchong Centers for Disease Control (CDC), the local disease surveillance team, could only afford to have a few staff members conducting the time-consuming case surveys.

Active surveillance is a promising strategy to address the challenge of limited surveillance resources, in which epidemic dynamics is predicted by proactively monitoring a relatively small number of sentinel components, whose state data are collected to achieve a good trade-off between prediction accuracy and surveillance cost. The key to implementing active surveillance is to determine, in a dynamical system, which components are important for dynamics prediction and how to identify them. This is a non-trivial task because the interaction structure among components, which characterizes the dynamics mechanism, is usually hidden and heterogenous, such as the social contact network in charge of disease spread [Yang et al.2017].

Here, we address this challenge by proposing a novel importance measure, the value, to determine how important a component is to predicting epidemic dynamics. Based on the measure, we develop a backward-selection algorithm designated the sentinel network mining algorithm (SNMA for short) to mine a sentinel network. The sentinel network encodes the influence relationship from sentinel components to the overall system components, which is a row sparse network that contains only the influential links emitted from the sentinels. With the discovered sentinel network, one can predict the future overall system dynamics by only monitoring and feeding the sentinels’ states into a system function.

Different from existing works, we model the task of sentinel network mining as a group sparse learning problem and propose an effective and flexible Bayesian learning algorithm for various dynamical systems, including the most widely used linear continuous system and logistic discrete system. The expectation-maximization (EM) and variational approximation methods are employed to infer the posterior distribution of the sentinel network. Particularly, we solved the scalability problem by means of proposing more efficient multiplication and inverse operations on diagonal-block metrics. Validations and comparisons were performed on both synthetic and real-world data, which show that the proposed method outperforms existing methods.

We summarize the main contributions of this paper as follows: (1) We propose a novel measure, the value, to identify sentinel components by modeling the sentinel network with row sparsity structure; (2) We design an effective and flexible group sparse Bayesian learning algorithm to discover the sentinel network; (3) We solve the scalability and generality problem of this algorithm to a certain degree; (4)We develop a comprehensive framework for active surveillance and validate it by performing extensive experiments on both synthetic and real-world data.

## 2   Related works

In the literature, most of the optimal sensor placement studies identify the sentinels’ location via the framework of the budgeted maximum coverage problem (BMCP) [Khuller, Moss, and Naor1999], which aims to maximize a specific objective by finding a set of components with the minimum budget (the budget is often defined as the set size). Examples include the timely detection of contaminated water [Krause et al.2008] and early detection of the outbreak of Weblogs [Leskovec et al.2007]

. As BMCP is NP-hard, heuristic algorithms like sub-modular maximization are often employed. Note that designing the objective functions of BMCP relies heavily on the known interaction structure among components. When the underlying interaction structure cannot be observed directly, such as the social media network in charge of information diffusion and the social contact network for disease spread, the current methods proposed within the BMCP framework cannot be used

[Gomez Rodriguez, Leskovec, and Krause2010].

In spatial statistics, Gaussian processes (GPs) can effective represent the spatial correlation and uncertainty of the sensed field. Based on GPs, one can adopt general information criteria, typically such as mutual information (GPs-MI), to greedily select sentinels [Krause, Singh, and Guestrin2008, Hoang et al.2014]. In this framework, both the GPs and the information criteria are model-free. They neglect the mechanism of data generation. As a result, the prior knowledge of the phenomena [e.g., the susceptible-infectious-recovered (SIR) model for infectious disease [Dimitrov and Meyers2010]] is difficult to integrate into the method. If such available prior knowledge can be adequately incorporated, the performance of learning and prediction will be significantly improved, as shown in our experiments.

In addition to the epidemic dynamics data, some works turn to other types of available data to help identify the sentinels. For instance, socio-economic data is leveraged to estimate the malaria infection risk caused by imported cases, and further to implement an effective active surveillance plan [Yang et al.2014]. Traffic data and environmental data are used to infer real-time air quality, and further determine the best deployment locations of new air contaminant monitoring stations [Hsieh, Lin, and Zheng2015]. It is difficult to reuse these customized methods on other types of active surveillance applications, if the domain data or domain knowledge they require are not available.

## 3   Active surveillance framework

For epidemic dynamics, we now propose the framework of active surveillance. It consists of three main steps.

Step 1: collect epidemic dynamics data in components of interest.

Step 2: mine the sentinel network from the data. In the network, the number of sentinel components (i.e., sentinel nodes) is according to a budget.

Step 3: with the sentinel network, predict future epidemic dynamics of the components based on the data collected from the sentinel components.

The last two steps constitute the foundation of the framework, and we will elaborate them in following sections.

### 3.1   Problem formulation

Consider a diffusion among components in a dynamical system. Let matrix = be the epidemic dynamics during a time window . Specifically, =, where each entry denotes the state of component at time , and it may be a real number (e.g., the number of newly infected cases in the city ) or a Boolean value (e.g., whether a news is posted in the blog ). Let denote the surveillance data collected by sentinel components. Specifically, is equal to when component is a sentinel, and empty otherwise.

Let be the dynamical system function achieving the dynamics prediction. Let matrix denote a sentinel network, a set of key parameters in the system function. It depicts the influential relationship from the sentinels to all components, where each link encodes the effect of sentinel on component by its weight. Thus, is a row sparse matrix only containing links emitted from the sentinels. Now, the active surveillance can be formulated as to predict the future components’ states based on the surveillance data and the sentinel network :

 \footnotesizeDt+1≈^Dt+1=f(Dst;S). (1)

From Eq. 1, two computational issues need to be addressed for the goal of active surveillance:

I) Sentinel identification: How to identify the sentinels from all components and mine the sentinel network according to a given budget from the dynamics ?

II) Sentinel prediction: How to predict the future dynamics from the current surveillance data based on the discovered ?

### 3.2   Sentinel identification

Our basic idea is intuitive: In a dynamical system, the components having little influence on others are unimportant for predicting others’ states, while those exerting a heavy influence on others dominate the system dynamics and should be selected as sentinels. In terms of the sentinel network , one can determine whether a component is important or not by inferring row sparsity. That is, unimportant components are associated with sparse rows in , in which zeros are much more than non-zeros; on the other hand, important ones are associated with non-sparse rows. Figure1 shows an illustration by taking a linear dynamical system as an example.

Based on this idea, we propose a novel index, the value, to measure components’ importance in predicting the epidemic dynamics: a component is important if it is important in both prior and posterior structures of a sentinel network. Specifically, the value is defined as the data-dependent hyper-parameter of the prior of the sentinel network, and also that reflecting the profiles of the posterior of the sentinel network:

 \footnotesize\footnotesizeγi=((μis)Tμis+Tr[Σis])N−1, (2)

where is the value of component . In the following, we elaborate this importance measure from the perspectives of prior and posterior.

#### 3.2.1   Prior perspective

From the basic idea, a sentinel network is desired to have a row sparse structure. Thus, we adopt a zero-mean multivariate Gaussian prior for each row:

 \footnotesizep(Si,⋅|γi)∼N(0,γiIN), i=1,…,N (3)

where vector

denotes the th row of the sentinel network, and

is an identity matrix. By doing so,

controls the diversity of row from a zero vector.

For conciseness, we vectorize matrix , i.e., let vec(), where operator vec() denotes the vectorization of the input matrix by stacking its columns into a column vector. By doing so, vector consists of groups of length , where each group is associated with a row in . Now, the row sparse structure of is equal to the group sparse structure of . In terms of the prior on (Eq. 3), the prior over is

 \footnotesizep(s|γ)∼N(0,Σ0), (4)

where vector = and the covariance matrix is a diagonal matrix:

 \footnotesizeΣ0=[γ1IN⋱γNIN]. (5)

As mentioned before, the links sent from a component reflect its effect on the system dynamics. Now, the links sent from in (i.e., the entries of group in ) are tied together and controlled by a common data-dependent hyper-parameter . This hyper-parameter is a type of automatic relevance determination (ARD) mechanism [MacKay1995]: when is small, the group in is sparse and vice versa; when the group is sparse, the links sent from are very weak in . That is, component is unimportant and can be pruned out without losing much in prediction accuracy.

#### 3.2.2   Posterior perspective

The value also reflects the profile of the posterior of the sentinel network. We model the sentinel network for two kinds of dynamical systems widely used to characterize diffusion phenomena in the real world: a linear continuous system and a logistical discrete system.

Likelihood of linear system. Starting from the linear continuous system, we first give the likelihood function and illustrate the pre-processing of dynamics data. The system function of a linear continuous system is

 \footnotesizeY=XS+V, (6)

where = and = are both extracted from the dynamics data . is the epidemic dynamics later than one time-unit. Specifically, = = and is a Gaussian noise matrix.

For convenience, we further transform Eq. 6 into a vector form, , where vector = vec() , = vec() , and = vec() . The matrix Kron(, ) , where the operator Kron(,) represents the Kronecker product of two input matrices. Now, based on the Gaussian noise assumption, the likelihood of the linear continuous system can be given as

 \footnotesizep(y|Φ,s,λ)∼N(Φs,λI), (7)

where denotes the noise level and is an identity matrix.

Likelihood of logistical system. For the logistical discrete system, the entry in the dynamics data is represented by a Boolean value, or , indicating whether component is “infected” at time

. After the same data pre-processing, we adopt a Bernoulli distribution over each entry of

, i.e., :

 \footnotesizep(yn=1|Φn,s)=σ[Φns], n=1,…,TN, (8)

where

denotes the sigmoid function, and

is the th row of . Then, the likelihood of the dynamics can be written as

 \footnotesizep(y|Φ,s)=∏TNn=1σ[Φns]yn(1−σ[Φns])1−yn. (9)

Now, based on the aforementioned prior and likelihood we have the following conclusion about the posterior:

Theorem 1. For both the linear system and logistical system, the posteriors of the sentinel network are a Gaussian or an approximate Gaussian: :

(1) Linear system: hyper-parameters set ,

 μs=λ−1ΣsΦTy, Σ−1s=Σ−10+λ−1ΦTΦ;

(2) Logistical system: hyper-parameters set ,

 μs=2−1ΣsΦT(2y−1), Σ−1s=Σ−10+ΦTπ(ξ)Φ,

where = denotes variational parameters, and is a diagonal matrix. Specifically, = .

###### Proof.

See the Supporting Information. ∎

Estimation of hyper-parameters. Theorem 1 gives the posteriors of the two kinds of systems (i.e., the Gaussian mean and covariance matrix ). In the following, we study how to iteratively update the hyper-parameters in the posteriors, because they cannot be obtained in a closed form. By treating the as hidden variables, we can employ the EM method to estimate the hyper-parameters set . When the integral in the Q function can be analytically solved (in the linear system), the EM obtains a well-formed solution to estimating the hyper-parameters; otherwise, we apply variational EM to estimate the hyper-parameters by optimizing an approximate low bound of the posterior (in the logistical system). Furthermore, the EM method can guarantee the convergence of the estimation.

Here, we only give the learning rules for the hyper-parameters since the limited space (derivation details can be found in the Supporting Information). In the linear continuous system, the learning rule for the noise parameter is

 \footnotesizeλ←(TN)−1(∥y−Φμs∥22+Tr[ΣsΦTΦ]). (10)

In the logistical discrete system, we update the variational parameter vector by

 \footnotesizeξn←√Φn(Σs+μsμTs)ΦTn, n=1,…,TN. (11)

It is extremely interesting that in both the linear and logistical systems the learning rule for the value is the same,

 \footnotesizeγi←((μis)Tμis+Tr[Σis])N−1, i=1,…,N, (12)

where the vector and matrix denote the th group of and , respectively.

Intuitively, there are two terms that contribute to the value according to its learning rule, Eq .12. The first term is the inner product term , which denotes the sum of the squares of the mean weights of the overall links sent from node in the sentinel network. In other words, it characterizes the influence strength of component . The second term is the trace term

, which denotes the variance of the posterior estimation on links sent from node

; that is to say, it features the influence uncertainty of component . In summary, a larger value corresponds to a node that has many links with large and diverse influences on other nodes.

On the whole, the value is an index by which the importance of a component for predicting the epidemic dynamics of an entire system can be measured. This index integrates the profiles of both the prior and posterior of a sentinel network. For a trivial component in the system, its value will tend to be zero due to the ARD mechanism during Bayesian learning. For an important component, whose value larger than zero, its value could indicate its monitoring priority. A component with a larger value exerts a great influence on other components, and its state is important to monitor for making a prediction.

Based on the value, we propose a backward-selection algorithm called the SNMA, as shown in Algorithm 1. It starts with all components of interest and removes one component at a time until only components are left ( is according the budgets). The component that is removed should be chosen as the one with the minimum value. The form of backward-selection algorithm is theoretically guaranteed to pick a optimal subset of components if the system perturbation is small enough [couvreur2000optimality]. A trade-off between accuracy and budget is practically necessary: the more sentinels are selected, the more predictive accuracy is expected, while more cost is needed.

### 3.3   Sentinel prediction

Once we have obtained the posterior structure of the sentinel network, the epidemic dynamics of the overall system, , can be predicted based on the surveillance data . Let be a new set of surveillance data, where only the values on sentinels’ locations are kept and the rest are empty. As mentioned above, we obtain through the data pre-processing. Then, a predictive distribution over the following system states is given by

 \footnotesizep(y∗|Φs∗,y,Φ)=∫p(y∗|Φs∗,s)p(s|y,Φ)ds. (13)

Linear continuous system. In this case, the integral in Eq. 13 is a Gaussian convolution (refer to the proof of Theorem 1), whose analytical solution is a Gaussian. Then, we have

 \footnotesizep(y∗|Φs∗,y,Φ)∼N(μy∗,σ2y∗)

with parameters , .

Logistical discrete system. The predictive distribution of discrete data is a Bernoulli distribution. By substituting the term of the posterior of in Eq. 13 for the variational approximation posterior given in Theorem 1, we have the following predictive distribution:

 \footnotesizep(y∗n=1|Φs∗,y,Φ)≈∫p(y∗n=1|Φs∗,s)q(s|y,Φ)ds,

where .

However, the integral in this approximation is a convolution of a logistical function with a Gaussian distribution, which cannot be analytically integrated. By introducing an error function, it can be approximated as a re-parameterized logistical function

[Maragakis et al.2008]:

 \footnotesize∫p(y∗=1|Φs∗,s)q(s|y,Φ)ds≈(1+e−τΦs∗μs)−1, (14)

where A very small approximation error is guaranteed in theory [Maragakis et al.2008].

### 3.4   Scaling up

In the SNMA, the most expensive operations are the matrix multiplication () and the matrix inverse () for calculating the posterior parameters in Theorem 1. Considering the large size of and , the time complexity of one iteration will be , making it infeasible to handle real-world problems. We now propose two fast matrix operations to solve the scalability problem. The time complexity after scaling up is reduced to , which is faster than the competitors in the experiments.

Definition 1. (Diagonal-block matrix). Matrix is called a diagonal-block matrix if it consists of by square sub-matrices, and each sub-matrix is a diagonal matrix with all diagonal entries having the same value , ==.

Definition 2. (Block projective matrix). Let be a diagonal-block matrix. Matrix is called the block projective matrix of if each entry , ==.

The structure of the two matrices and their relationship is shown in Fig. 2. The block projective matrix is much smaller than its original diagonal-block matrix . In the meantime, it contains the overall distinct elements in . Based on the definitions, we have the following theorems, which can significantly reduce the computational cost of the two kinds of operations.

Theorem 2. Let be two diagonal-block matrices with the same size of sub-matrix. The product is a diagonal-block matrix, and those block projective matrices satisfy .

###### Proof.

See the Supporting Information. ∎

Theorem 3. Let be a square diagonal-block matrix. Its inverse matrix is also a diagonal-block matrix and satisfies = .

###### Proof.

See the Supporting Information. ∎

In the SNMA, and are two diagonal-block matrices. When we alternatively calculate the multiplication () and the matrix inverse () on the block projective matrices and via the two theorems, the time complexity of one iteration can be reduced by 3 orders of magnitude (i.e., ). Note that this scaling-up technique not only solves the difficulty faced by our algorithm, but can also address other tasks involving diagonal-block matrices, such as the computational obstacle of a multiple measurement vector (MMV) model in compressed sensing [Zhang and Rao2011]

. To process further large-scale data in practice, the SNMA can be readily parallelized by adopting the group testing strategy, which has been used for parallel feature selection

[Zhou et al.2014].

### 3.5   Embedding non-linear dynamical models

The proposed framework is flexible and can be readily extended to various dynamical systems by mean of embedding basic functions, as long as the dynamical system functions can be represented as the combinations of basic functions. Each basic function, , can be an arbitrary function, e.g., polynomial, = ; trigonometric, = ; or others.

We illustrate the embedding technology based on the linear system Eq. 6 as an example, where denotes the state of component at time . Let vector = be the mapping of through basic functions. Let vector = denote the mapping of all components at time , and matrix = denote the mapping of all components during the time window. Then, the system function Eq.6 can be extended as , where the augmented sentinel network characterizes the interactions from the mapping to the components. This extended equation can also represent non-linear dynamical systems if non-linear basic functions are adopted, such as the SIR model, a well-studied and widely adopted disease spread model. Moreover, the basic functions of the SIR model can be constructed based on its reproduction matrix form [Wallinga, van Boven, and Lipsitch2010].

To integrate the above extended system function into the current SNMA, just let Kron(, ) and vec(. Meanwhile, the group size needs be changed from to . The rest of the steps are the same as the SNMA shows in Algorithm 1. For the logistical system, the extended process is analogous.

## 4   Validations

Comparative Study. We validate the framework on both synthetic and three real-world data. The two most related methods are selected as competitors: group Lasso and GPs-MI. Group Lasso [Meier, Van De Geer, and Bühlmann2008] is a typical method for group sparse learning, which is similar to that addressed by our proposed group sparse Bayesian learning algorithm. To achieve the aim of active surveillance, we use the proposed framework and replace the group sparse Bayesian learning with group lasso.

GPs-MI is a popular sensor placement method [Krause, Singh, and Guestrin2008, Hoang et al.2014], which is similar to the task of active surveillance addressed by our work. It outperforms the placement methods based on experiment design, such as A-, D-, and E-optimal designs. As Gaussian processes cannot directly work on discrete data, GPs-MI is only applied on experiments with continuous data.

Evaluation. Two criteria are adopted to evaluate the performance of the methods. Failure rate is used to measure whether the a method can discover the sentinels, i.e., the problem of sentinel identification. Failure rate is the percentage of wrong sentinels’ locations given by a method, , where is the set of the ground truth sentinel locations in, and is the set discovered by, a method.

We use root-mean-square error (RMSE) to quantify the prediction error, i.e., the problem of sentinel prediction. Specifically, RMSE=, where is the predicted epidemic dynamics based on the surveillance data.

### 4.1   Validations on synthetic data

Synthetic Data Generation. We generate synthetic data by imagining diffusion processes taking place in a linear continuous system or logistical discrete system. There are three steps in total: (1) Random generation of a ground truth -value vector with entries, where entries are sampled from and the other from , i.e., sentinels and trivial components; (2) Random sampling of a ground truth sentinel network via the prior Eq.3 based on the ground truth value; (3) Based on the , simulation of the epidemic dynamics via the linear system Eq. 6 or the logistical system Eq. 8 embedding a quadratic basic function .

Environment setting. The comparisons are conducted under various data volumes and noise levels. We use the value to denote the ratio of data volume to the number of parameters to estimate. Signal-to-noise ratio (SRN) and bit error rate (BER) are adopted to denote noise levels in the linear system and logistical system, respectively.

We adopt a -fold cross-validation strategy in experiments on synthetic data. We firstly identify sentinels from the training data via the three methods. For each method, the average failure rate of sentinel identification is shown in Figs. 3 (a-c). Then, we evaluate the performance of sentinel prediction by feeding the surveillance data (collected on the discovered sentinels) to the corresponding prediction model of each method, such as Eq. 13 for the proposed method. The average prediction error is shown in Figs.3(d-f). The results show that the proposed methods are superior to GPs-MI and group lasso on both failure rate and prediction error.

We evaluate the trade-off between the prediction accuracy and surveillance cost in the linear system by setting different number of sentinels as shown in Fig.3(g). In this experiment, there are only ground truth sentinels in . We give the number of sentinels (-axis), and then evaluate the prediction error of each method ( axis). Intuitively, the more sentinels that are selected, the better the accuracy that can be obtained. However, as indicated, the improvement in prediction becomes negligible when is over . Note that GPs-MI shows a better performance only when is very small.

Fig. 7 presents the average running time of one iteration of the three methods on a PC with a 3.4GHz CPU and 8GB memory. Since GPs-MI employ forward greedy strategy (fast when only a few sentinels) but SNMA use backward selection method (fast when many sentinels), it’s fair to evaluate them by comparing the running time of one iteration of each algorithm. The results show group lasso is the slowest, and GPs-MI and SNMA are almost at the same level.

### 4.2   Validations on real diffusion data

In real cases, the failure rate cannot be evaluated because the ground truth sentinel network is unknown.

#### 4.2.1   2009 Hong Kong H1N1 flu pandemic

The cases report data of 2009 Hong Kong H1N1 influenza epidemic, was provided by Centre for Health Protection (CHP), Department of Health, Government of the Hong Kong Special Administrative Region. During this pandemic, the first imported case of human swine influenza (HSI) was confirmed on May 1, 2009. As of Sep. 2010, there were over 36,000 confirmed cases of HSI, among which about 290 were severe cases and over 80 of them died [CHP2010]. We consider the epidemic dynamics for 105 days since the disease onset in Jun. 1st 2009 based on the confirmed cases of H1N1 infection reported by CHP (www.chp.gov.hk), which gives the spatial position and infection times of each case.

Hong Kong consists of 18 administrative districts, i.e., 18 components. By merging the confirmed cases during 105 days, 3 days as a basic infectious period, we obtain the dynamics of = and =. We set the dynamics from Jun.1 to Aug.15 as training data and the one from Aug. 15 to Sep. 15 as test data. According to a pre-given , sentinel districts of Hong Kong can be identified via the methods (SNMA use the linear system setting). Then, based on the sentinels, we predict the newly cases on test data as shown in Fig 5 (a). Obviously, SNMA outperforms the competitors. Fig. 6 (a) shows a case of fittings and predictions for the dynamics ( districts are selected as sentinels). The sentinel network of Hong Kong and the sentinels’ spatial distribution can be found in Supporting Information.

#### 4.2.2   2005-2009 Tengchong malaria outbreak

Tengchong City, Yunnan Province, China, has 18 towns, and 658,207 residents that are distributed in a wide area of 5,845 km in 2011. Because of the suitable climate for mosquito habitats, Tengchong has a quite serious malaria outbreak. Five years’ (2005-2009) monthly malaria cases data at the town level, were collected by Tengchong CDC and can be obtained from the annual reports of National Institute of Parasitic Disease, China CDC. By eliminating the missing data from 2005 Jun. to Dec., we get malaria dynamics, which contains components and months.

We set the malaria dynamics from 2005 to 2008 as training data and the one during 2009 as test data. Similar to the Hong Kong case, we identify the sentinel towns according to a pre-given and predict the dynamics on the test data, as shown in Fig 5 (b). Once again, SNMA achieves the best sentinel prediction in most cases. Fig. 6 (b) shows a case of fittings and predictions for the malaria dynamics with sentinels. The sentinel network of Thengchong and the sentinels’ distribution is in Supporting Information.

#### 4.2.3   Hot words diffusion in Baidu Tieba

Baidu Tieba (tieba.baidu.com), one of the largest online community platforms in China, is a collection of thousands of active topic-specific communities. Tieba users can post any hot words (vocabularies that are widely used in Tieba during a short period) in any communities. Hot words often present a diffusion phenomena in Baidu Tieba: a hot word first appears in only a few forums, and then is posted gradually in many other forums by the users who are active in multiple forums. Thus, Tieba can be regarded as a logistical discrete dynamical system, where communities are components, hot words are contagions, and the infected state of a component is or . GPs-MI cannot be applied to this case because it’s base on Gaussian model and cannot directly work on discrete data.

We tracked the dynamics of 11 independent hot words cascading among the top-100 active communities in Baidu Tieba from Apr. 2014 to Oct. 2015 (18 months). Only the dynamics during the words’ bursting period is preserved, and the total bursting period of the words is 738 d, i.e., = , = by using 1 d as a time-unit. Here, we split the training and test data in term of the hot words, i.e., we alternatively set the dynamics of one hot word as test data and the rest be the training data. Then, we identify the sentinel communities in Baidu Tieba by the methods (GPs-MI cannot be applied in discrete data; SNMA use the logistic system setting). Fig 5 (c) shows the results of sentinel prediction. Fig. 6 (c) shows that the dynamics of the hot word “cutie” is successfully predicted based on the data from sentinel communities.

Summarily, SNMA outperform GPs-MI and group lasso in the most experiments. Our method has two obvious advantages. (1) Our method is model-based and can readily integrate prior knowledge, which makes it more effective and easier to train. (2) Our method is more robust against noise and insufficient data owing to the Bayesian framework that can effectively handle the uncertainty from both data and model.

## 5   Conclusions

In this work, we addressed the challenge of epidemic dynamics prediction in cases in which surveillance resources are very limited. We proposed a novel importance measure, the value, by modeling a sentinel network with row sparse structure and presented an effective and flexible group sparse Bayesian learning algorithm for mining the sentinel network in two kinds of widely used dynamical systems. With the discovered sentinel network, the overall epidemic dynamics can be predicted based on partial data only collected by the few sentinels. Moreover, we significantly reduced the computational complexity of the algorithm and extended it to various nonlinear systems using basic function embedding technology. We validated the proposed framework by a set of experiments on both synthetic and real-world datasets.

## Supporting Information

### Theorem Proofs

#### Theorem 1

###### Proof.

Linear continuous system.

The posterior distribution over of the linear system is given by:

 p(s|y,Φ,γ,λ)=p(y|Φ,s,λ)p(s|γ)p(y|Φ,γ,λ), (15)

where and (refer to Eq. 7 and 4 in the main paper). Thus, the numerator of Eq. 15 is a product of two Gaussians, which is still a Gaussian.

The denominator is the marginal likelihood, , which is a Gaussian convolution. The convolution can be analytically calculated and its result is also a Gaussian. Then the posterior of can be obtained, which is still a Gaussian, with parameters

 {μs=λ−1ΣsΦTyΣ−1s=Σ−10+λ−1ΦTΦ.

Logistic discrete system.

The posterior over in the logistic system is given by:

 p(s|y,Φ,γ)=p(y|Φ,s)p(s|γ)p(y|Φ,γ), (16)

where is a Bernoulli distribution and a Gaussian distribution (refer to Eq. 9 and Eq. 4 in the main paper). We cannot directly calculate the Eq. 16 because its denominator, the marginal likelihood = , cannot be analytically integrated.

We deduce the posterior in an alternative way. Firstly, by using the local variational method [Bishop2006], a lower bound function on the sigmoid having a Gaussian form can be introduced,

 σ[Φns]≥σ[ξn]e12(Φns−ξn)−π(ξn)((Φns)2−ξ2n),

where = and is an introduced variational parameter.

By substituting the lower bound into likelihood function (refer to Eq. 9 in the main paper), the likelihood of the logistic system , we get a lower bound on the likelihood,

 p(y|Φ,s)≥h(s,ξ)=∏TNn=1σ[ξn]eΦnsyn−12(Φns+ξn)−π(ξn)((Φns)2−ξ2n).

Then we have a lower bound on the joint distribution of

and :

 p(y,s|Φ,γ)=p(y|Φ,s)p(s|γ)≥h(s,ξ)p(s|γ) (17) =e12sTΣ−10s+∑TNn=1(Φns(yn−12)−π(ξn)(Φns)2)+const.

Note that, the exponent term in Eq. 17 is a quadratic function of . By identifying the linear and quadratic terms of and then normalizing the function over , we obtain a Gaussian approximation to the original posterior over , with parameters

 {μs=2−1ΣsΦT(2y−1)Σ−1s=Σ−10+ΦTπ(ξ)Φ,

where vector = , is a diagonal matrix. Specially, = = .

#### Theorem 2

###### Proof.

Without loss of generality, let and denote two diagonal-block matrix. The sub-matrix of is , who is a diagonal matrix with all diagonal entries having the same value entry . Analogously, matrix , sub-matrix , and entry have the same relationship.

Based on the property of block matrix, the product is a block matrix and its sub-matrix =, for ==. And is a diagonal matrix with all diagonal entries having the same value =. Thus, is a diagonal-block matrix according to the Definition 1 (in the main paper), and is the block projective matrix of according to the Definition 2 (in the main paper).

For each entry of , = == . Therefore, = . ∎

#### Theorem 3

###### Proof.

Without loss of generality, let square matrix denote a diagonal-block matrix. The adjugate matrix of , , is also a diagonal-block matrix because that if an entry is zero in the matrix the corresponding entry who has the same location in adjugate matrix must be also zero.

The inverse matrix of can be calculated by

where the determinant is a scalar. Thus the inverse matrix has the same structure with , i.e., the inverse matrix is also a diagonal-block matrix.

Let be the block projective matrix of square matrix . Based on eigendecomposition rules, we have = and = , where and

are two diagonal matrices whose diagonal elements are the corresponding eigenvalues, i.e.,

=diag and =diag, where vector and are the eigenvalues of and , respectively.

Let denote the -th eigenvalue in . We have = , where

is the corresponding eigenvector. According to the Theorem 2, we have

= , where is a diagonal-block matrix and its block projective matrix is . That is to say, is a -tuply repeated eigenvalue to , i.e., the entries of repeat times in . Therefore, is the block projective matrix of , i.e., = . As and are both diagonal matrices, their inverse matrices satisfy = . Analogously, we can also have = .

The inverse matrix of and are = and = , respectively. Based on the conclusions of = and = , we have = according to the Theorem 2. ∎

### Derivation of Hyper-parameters Learning Rules

We propose two group sparse Bayesian learning methods to mine sentinel network for linear continuous systems and logistic discrete systems, respectively. We estimate the hyper-parameters set by employing the EM (Expectation-Maximization) method to maximize the marginal likelihood , by treating as hidden variables. Specifically, for the linear system, and for the logistic system.

### Linear continuous system

Based on the posterior (refer to the Theorem 1 in the main paper), the the conditional expectation of complete log-likelihood, i.e., the Q-function of the linear system, is given by,

where denotes the conditional expectation with respect to the posterior, and denotes the parameter estimated in the previous iteration. Notice that the log-terms of and are separated in . Thus we can divide the Q-function into two simplifying sub-Q-functions.

The sub-Q-funciton for is:

 Q(γ)=Es|y,Φ,γ(old),λ(old)[log(p(s|γ))]∝−log|Σ0|−Tr[Σ−10(Σs+μsμTs)].

The derivative of with respect to is given by,

 ∂Q(γ)∂γi=−N2γi+12γ2iTr[Σis+μis(μis)T],

where and are the -th block in and respectively.

By setting the derivative be zero, the update rule for , i.e., the -value of component , can be obtained (the Eq. 12 in the main paper),

 γi←((μis)Tμis+Tr[Σis])N−1, i=1,…,N.

To estimate , the sub-Q-funciton is given by

 Q(λ)∝−TNlogλ−1λEs|y,Φ,γ(old),λ(old)[∥y−Φs∥22]=−TNlogλ−1λ(∥y−Φμs∥22+Es|y,Φ,γ(old),λ(old)[∥Φ(s−μs)∥22])=−TNlogλ−1λ(∥y−Φμs∥22+Tr[ΣsΦTΦ])

The derivative of the with respect to is given by

 ∂Q(λ)∂λ=−TNλ+1λ2(∥y−Φμs∥22+Tr[ΣsΦTΦ])

By setting the derivative to zero, the update rule for can be obtain (the Eq. 10 in the main paper),

 λ←(TN)−1(∥y−Φμs∥22+Tr[ΣsΦTΦ])

### Logistic discrete system

As aforementioned the marginal likelihood in the posterior of cannot be analytically integrated in the logistic system, we utilize local variational method to estimate the hyper-parameters in the EM framework. Firstly, we introduce a lower bound function of sigmoid function,

 σ[Φns]≥σ[ξn]e12(Φns−ξn)−π(ξn)((Φns)2−ξ2n),

where = and is an introduced variational parameter.

By substituting the lower bound function into the likelihood of the logistic system (refer to Eq. 9 in the main paper), a lower bound on the likelihood of the logistic system is given by:

 p(y|Φ,s)≥h(s,ξ)=∏TNn=1σ[ξn]eΦnsyn−12(Φns+ξn)−π(ξn)((Φns)2−ξ2n),

where, vector is the introduced variational parameters and its entry is . Then the Q-function can be written as:

 Q(ξ,γ)=Es|y,Φ,γ(old),ξ(old)[log(h(s,ξ)p(s|γ))].

This Q-function also can be separated w.r.t and , and the sub-Q-funciton for is:

 Q(ξ)∝TN∑n=1logσ[ξn]−ξn2−π(ξn)(Φn(Σs+μsμTs)ΦTn−ξ2n).

By letting the derivative of over to be zero, we can get the learning rule for (the Eq. 11 in the main paper):

 ξn←√Φn(Σs+μsμTs)ΦTn, n=1,…,TN.

In the logistic system, the update rule for is the same as the linear system (the Eq. 12 in the main paper), as they have the same sub-Q-funciton of .

## References

• [Bishop2006] Bishop, C. M. 2006. Berlin: Springer.
• [Brunton, Proctor, and Kutz2016] Brunton, S. L.; Proctor, J. L.; and Kutz, J. N. 2016. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113(15):3932–3937.
• [Chen et al.2013] Chen, Y.; Amiri, H.; Li, Z.; and Chua, T.-S. 2013. Emerging topic detection for organizations from microblogs. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, 43–52. Dublin, Ireland: ACM.
• [CHP2010] 2010. Summary report on the surveillance of adverse events following HSI immunisation and expert group’s comment on the safety of hsi vaccine in Hong Kong. Technical report, Centre for Health Protection, Hong Kong.
• [Dimitrov and Meyers2010] Dimitrov, N. B., and Meyers, L. A. 2010. Mathematical approaches to infectious disease prediction and control. Technical report.
• [Gerardo-Giorda et al.2013] Gerardo-Giorda, L.; Puggioni, G.; Rudd, R. J.; Waller, L. A.; and Real, L. A. 2013. Structuring targeted surveillance for monitoring disease emergence by mapping observational data onto ecological process. Journal of The Royal Society Interface 10(86):20130418.
• [Gomez Rodriguez, Leskovec, and Krause2010] Gomez Rodriguez, M.; Leskovec, J.; and Krause, A. 2010. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1019–1028. Washington, USA: ACM.
• [Hoang et al.2014] Hoang, T. N.; Low, K. H.; Jaillet, P.; and Kankanhalli, M. 2014. Nonmyopic

-bayes-optimal active learning of gaussian processes.

In Proceedings of the 24th International Conference on Machine Learning, 739–747. Beijing, China: IMLS.
• [Hsieh, Lin, and Zheng2015] Hsieh, H.-P.; Lin, S.-D.; and Zheng, Y. 2015. Inferring air quality for station location recommendation based on urban big data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 437–446. Sydney, Australia: ACM.
• [Khuller, Moss, and Naor1999] Khuller, S.; Moss, A.; and Naor, J. S. 1999. The budgeted maximum coverage problem. Information Processing Letters 70(1):39–45.
• [Krause et al.2008] Krause, A.; Leskovec, J.; Guestrin, C.; VanBriesen, J.; and Faloutsos, C. 2008. Efficient sensor placement optimization for securing large water distribution networks. Journal of Water Resources Planning and Management 134(6):516–526.
• [Krause, Singh, and Guestrin2008] Krause, A.; Singh, A.; and Guestrin, C. 2008. Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research 9(Feb):235–284.
• [Leskovec et al.2007] Leskovec, J.; Krause, A.; Guestrin, C.; Faloutsos, C.; VanBriesen, J.; and Glance, N. 2007. Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 420–429. California, USA: ACM.
• [MacKay1995] MacKay, D. J. 1995. Probable networks and plausible predictions - a review of practical bayesian methods for supervised neural networks. Network: Computation in Neural Systems 6(3):469–505.
• [Maragakis et al.2008] Maragakis, P.; Ritort, F.; Bustamante, C.; Karplus, M.; and Crooks, G. E. 2008. Bayesian estimates of free energies from nonequilibrium work data in the presence of instrument noise. The Journal of Chemical Physics 129(2):024102.
• [Meier, Van De Geer, and Bühlmann2008] Meier, L.; Van De Geer, S.; and Bühlmann, P. 2008.

The group lasso for logistic regression.

Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70(1):53–71.
• [Polgreen et al.2009] Polgreen, P. M.; Chen, Z.; Segre, A. M.; Harris, M. L.; Pentella, M. A.; and Rushton, G. 2009. Optimizing influenza sentinel surveillance at the state level. American journal of epidemiology kwp270.
• [Wallinga, van Boven, and Lipsitch2010] Wallinga, J.; van Boven, M.; and Lipsitch, M. 2010. Optimizing infectious disease interventions during an emerging epidemic. Proceedings of the National Academy of Sciences 107(2):923–928.
• [Yang et al.2014] Yang, B.; Guo, H.; Yang, Y.; Shi, B.; Zhou, X.; and Liu, J. 2014. Modeling and mining spatiotemporal patterns of infection risk from heterogeneous data for active surveillance planning. In

Proceedings of the the 28th AAAI Conference on Artificial Intelligence

, 493–499.
<