## 1 Introduction

The network data is becoming increasingly popular, which has many important applications in various disciplines, including sociology (Simmel, 1950; Wasserman et al., 1994; Hanneman and Riddle, 2005), genomics (Wu et al., 2010; Horvath, 2011), psychology (Borgatti et al., 2009; Cramer et al., 2010; Borsboom and Cramer, 2013), finance and economics (Zou et al., 2017; Zhu et al., 2019; Leung et al., 2017). In the field of sociology, the network data is used to study the structure of relationships by linking individuals or other social units and by modeling interdependencies in behaviors or attitudes related to configurations of social relations (O’malley and Marsden, 2008). In biological and genetic areas, the presence of an interaction between two genes or proteins indicates a biologically functional relationship (Von Mering et al., 2002), which makes the network data widely used in cancer and other disease data analysis with promising results (Guda et al., 2009; Chuang et al., 2007). A continuous response observed on each node at equally spaced time points is very common in various applications such as biological studies, economics, and finance. Building a network based model to recover the dynamics of the responses is necessary.

We consider a network with nodes, which are indexed as . To represent the network relationship among the network nodes, we employ an adjacency matrix , where indicates the th node follows the th node, otherwise . Following the convention we let for . Denote as the continuous response collected from the node during . Correspondingly, we collect a number of nodal covariates (e.g., user’s age, gender), which is denoted by . To model the dynamics of the response , Zhu et al. (2017) proposed a network vector autoregression (NAR) model, which can be expressed as follows,

(1.1) |

where is the out-degree of node and is the random noise term. The NAR model as well as its extensions have been implemented to a broad range of fields, including spatial data modeling (Lee and Yu, 2009; Shi and Lee, 2017), social studies (Sojourner, 2013; Liu et al., 2017; Zhu and Pan, 2018), financial risk management (Härdle et al., 2016; Zou et al., 2017), and many others.

Despite the usefulness of the NAR model, it lacks flexibility in its model formulation and suffers from model misspecification. Specifically, it specifies a homogeneous network autoregression coefficient (i.e., ) for all network nodes, which is highly restrictive in practice. To enhance the model’s flexibility, a popular way is to consider a group structure of model coefficients (Ke et al., 2015; Su et al., 2016; Ando and Bai, 2016; Guðmundsson and Brownlees, 2021). With respect to the network data, Zhu and Pan (2018) considered a grouped network autoregression (GNAR) model. To be more specific, the nodes are clustered into several groups, and the nodes within the same group share a set of common parameters. Both the group memberships and group-specific regression coefficients are to be estimated.

Although the model’s flexibility is enhanced and the misspecification risk is reduced, there are several unsolved challenges for the GNAR model. First, it is necessary to incorporate the graphical information (i.e., ) into grouping process. Nevertheless, most existing methods ignore graphical information in their grouping process. On the other hand, another line of research takes advantages of the graphical information and uses nonparametric Bayesian methods for clustering nodes (Sewell and Chen, 2017; Geng et al., 2019; van der Pas and van der Vaart, 2018), but they totally ignore the dynamics of

. The graphical information, a connectivity pattern among different nodes, will partially determines the group configurations. In other words, the closely connected nodes should have higher probability belonging to the same group. How to incorporate the graphical information into the modelling procedure of the GNAR model is a challenging task. Second, proposing a simultaneously inference procedure on both group configurations and group-wised parameters is another important consideration for GNAR model. The uncertainty of parameter estimation is often neglected in existing literature, and it is difficult to get the probabilistic interpretations from existing frequentist approaches. Lastly, another important problem in grouping methods is how to determine the number of groups. Most existing methods require specification of the number of groups first and then estimate group configurations. The number of groups is determined by information criteria or some

ad hoc procedures. These procedures may ignore uncertainty of the estimation of the number of groups in the first stage and lack uncertainty quantification of the number of groups, which is also an important issue for GNAR model.Our methodology development is directly motivated by solving the aforementioned challenges to fill the gap between nonparametric Bayesian methods and network autoregression model. In this article, we propose a novel graphical assistant grouped network autoregression (GAGNAR) model, considering both the heterogeneous network effects across nodes and the graph information in the grouping procedure under a novel nonparametric Bayesian mixture model framework. Specifically, a graphically assistant Chinese Restaurant Process (gaCRP) borrowed from Geng and Hu (2021) is introduced to the GNAR model framework, which incorporates the graphical information and utilizes the sampling procedure from the Chinese Restaurant Process (CRP) (Pitman, 1995)

. The proposed prior will provide simultaneous inference on the number of groups and group configurations. Moreover, the gaCRP has a Pólya urn scheme similar to the traditional CRP. Due to this merit, we can develop a Gibbs sampler that enables efficient full Bayesian inference on the number of groups, mixture probabilities as well as group-specific parameters. We demonstrate its excellent numerical performance through simulations and analysis of two real datasets, compared with several benchmark methods.

Our proposed method is unique in the following aspects. First, the proposed method provides a useful model-based solution for GNAR model that is able to leverage graphical information without pre-specifying the number of groups. Secondly, by adopting a full Bayesian framework, the grouping results yield useful probabilistic interpretation. Thirdly, the developed posterior sampling scheme also renders efficient computation and convenient inference since our proposed collapsed Gibbs sampling algorithm by marginalizing over the number of groups avoid complicated reversible jump Markov Chain Monte Carlo (MCMC) algorithm (Green, 1995) or allocation samplers.

The rest of this paper is organized as follows. In Section 2, we briefly review the NAR and GNAR models, followed by a review of Bayesian mixture model and Dirichlet process. In addition, we describe the graphical assistant prior model and introduce our proposed GAGNAR model and its theoretical properties. The Bayesian inferences including the MCMC sampling algorithm, the post-MCMC estimation and the model selection criterion for tuning parameter are presented in Section 3. In Section 4, we study the finite sample performance of our proposed method via extensive simulation studies. Two real datasets are illustrated by our proposed method in Section 5. Section 6 concludes and raises some in-depth discussions. All technique proofs and additional numerical results are given in supplementary materials.

## 2 Methodology

### 2.1 Grouped Network Autoregression Model

Consider a network with nodes with the corresponding adjacency matrix denoted by . Assume the nodes are clustered into latent groups. Denote as the -field generated by . Given , the observations at time point are assumed to be independent and follow a Gaussian mixture distribution with components. Given the number of groups and the adjacency matrix , the GNAR model can be expressed as a mixture model, i.e.,

(2.1) |

where

is the density function of normal distribution with mean

and variance

in the th group, and is the mixing weight satisfying . Here collects the unknown parameters.The GNAR model (2.1) characterizes the conditional mean of the response by a combination of four components. The first one is the baseline effect , which is a constant but varying among different groups. The second one is the network component (), which reflects the average impact of following nodes at the previous time point. Therefore, we refer to as the group network effect. The third one is the momentum component (). It characterizes how the focal node is influenced by its historical behaviors. The corresponding parameter is then referred to as group momentum effect, quantifying the node’s self-dependence. The last one is the nodal covariate component (), which includes the node specific characteristics and it is invariant over time. As implied by the GNAR model (2.1), the nodes within the same group share the same set of coefficients, therefore they exhibit similar dynamic patterns. Compared to the NAR model (Zhu et al., 2017), the GNAR model is able to capture the nodes’ heterogeneous pattern in the group level. To estimate the model parameters simultaneously with the group memberships, an EM algorithm could be designed (Zhu and Pan, 2018).

Despite the usefulness, the model has three main limitations. First, the estimation can be unreliable when the observed time length is short. That decreases the model’s stability and robustness in practice. Second, the network structure information is not fully exploited for estimating the group memberships. Third, the number of groups cannot be automatically estimated but needs to be specified in advance. That increases the possibility that the model is misspecified. As a consequence, we are motivated to consider the problem in a Bayesian framework and introduce a graph assisted prior, where the network structure information is employed when forming the groups. The details are presented in the following section.

### 2.2 Bayesian Graph Assisted Prior

For each node , suppose it carries a latent group membership , where implies that the node is from the th group. Given and

, we have a Gaussian mixture model

(Marin et al., 2005) as follows,where . Under the Bayesian hierarchical model, for the unknown group memberships, we specify the model priors as

(2.2) |

where follows a categorical distribution with parameters , follows a Dirichlet distribution with parameter , and is joint prior for all group-wise parameters. When is unknown, we can let go to infinity, and then the model in (2.2) becomes the Dirichlet process mixture model (DPMM; Ishwaran and Zarepour, 2002). The DPMM is very flexible and efficient for density estimation. However, it suffers from several problems in estimating the group structure. First, the partitions sampled from the DPMM posterior tend to have multiple small transient groups, which decreases the interpretability of the model. Second, the DPMM is inconsistent on the estimation of the number of groups (Miller and Harrison, 2018). Third, the DPMM does not take advantage of the network structure information: the connected nodes do not have higher probability being in the same group.

In order to address the above problems, we employ a Bayesian graph assisted prior. Specifically we assume that the group membership is generated from the weighted Dirichlet process. The process assumes that the nodes’ memberships are sequentially generated. Suppose the nodes form groups, then the conditional distribution for is expressed as

(2.3) |

where and is a weighting matrix calculated based on . The in the weight matrix denotes the connection strength between node and . The parameter controls the probability for introducing a new group. As a result, (2.3) implies that a node is more likely to fit in the same group with its connected friends.

###### Remark 1

With the probabilistic Bayesian framework, simultaneous estimation of the number of groups and the grouping configuration is typically achieved by complicated searching algorithms (e.g., reversible jump MCMC (Green, 1995)). However, it suffers from high computational complexity. In our framework, the specification of (2.3) allows the nodes’ memberships to be generated sequentially and therefore avoids setting the number of groups in advance. According to (2.3), a new group will formulate once is assigned with a new group label . This enables a sequential generation of groups. Once we generate all the group memberships , we know immediately the number of groups .

Let be a continuous probability measure on . We define the full conditional distribution of given as

(2.4) |

where is the density function, denotes the number of groups excluding the -th observation, are distinguished values of and is the identity function. In practice we set the base distribution

to be the normal inverse gamma distribution.

Inspired by the graph assisted Chinese restaurant process (gaCRP) (Geng and Hu, 2021) for survival model, we define the graph distance between node and as the shortest path length between node and . If node and cannot be connected with a finite length of path, then . The weight is then defined as

(2.5) |

Therefore the weight characterizes the closeness of the node and in the network. The parameter denotes the smoothing scale of the weighting matrix. A larger value of indicates the weight is more concentrating on the local scale. For simplicity, we refer to gaCRP introduced above as gaCRP(). Although the idea of using a weighted version of CRP has been explored in Geng and Hu (2021), their model is based on Cox regression for survival data analysis, which significantly differs from our model, since it lacks discussion on dynamic process of .

###### Remark 2

Implied by (2.5), if , then the gaCRP is the same as the traditional CRP and may tend to over cluster the nodes. If , the probability for a new node to choose the existing groups tends to be small, which may also lead to the over-clustering problem. Hence, appropriate selection of is critical for the model performance. We discuss in details about the selection of in Section 3.3.

In summary, we express the proposed model hierarchically as

(2.6) |

where , , and are hyper-parameters for base distribution of . Here the prior distribution of is set to be normal inverse gamma (NIG) distribution with parameters . Without loss of generality, we choose , , , in our simulation study and real data applications. We refer to the model (2.6) as graph assisted group network autoregression (GAGNAR) model.

## 3 Bayesian Estimation

In this section, we discuss the model estimation for the GAGNAR model. Define group membership vector . If the group memberships are known, then the model parameters in the GNAR model (2.1) can be easily estimated with least square estimation method. However, in practice is unobserved thus the least square estimation is infeasible. To estimate the model parameters simultaneously with the latent group memberships, we employ the Markov chain Monte Carlo (MCMC) algorithm with details given as follows.

### 3.1 MCMC Algorithm

To conduct Bayesian estimation, we first derive the posterior distribution of the unknown parameters and group memberships. Define and . In addition, denote , , and . Given the data , the joint posterior distribution of latent membership and unknown parameters could be written as

(3.1) |

where denotes “proportional to", and , , stand for the prior distributions for , , and respectively. For convenience, let , then given we have

The prior distribution for is specified as gaCRP as stated in Section 2.2. In addition, we specify the prior distribution for as multivariate normal distribution and we set the prior for as the commonly used scaled inverse gamma distribution , where , , and are the hyper parameters.

Given the joint posterior distribution of in (3.1), we can obtain the conditional posterior distribution for each latent group membership and model parameters respectively. This enables us to conduct the MCMC algorithm for model estimation. We first give the conditional posterior distribution for as follows. For convenience we let . Denote full conditional of as . Further define and . Then we have

The analytical form of is given in Proposition 1.

###### Proposition 1

By the formulation of , we can observe that the probability belonging to an existing group () is closely related to the network weighting matrix . Particularly denotes a weighted average ratio of its connected friends belonging to the group , which characterizes a stickiness to the group of th neighbourhood. If the stickiness level is higher, then the conditional probability of will be higher. That is how the network topology information is involved in the sampling procedure. Next, the node is allowed to fit into a new group with probability proportional to , which relates to both prior information and data information.

Next, we investigate the full conditional distribution for model parameters . Denote the posterior distribution of as , then we derive in the following Proposition 2.

###### Proposition 2

The full conditional distribution is given as

where

The proofs of Propositions are given in supplementary materials. As shown in Proposition 2, the conditional distribution of follows normal inverse gamma distribution (NIG) with density function . With the expressions of , we see that it unifies the prior information and the data information from the th group, i.e., . Based on Proposition 1 and Proposition 2, we could efficiently cycle through the full conditional distribution for and for to conduct Gibbs sampling procedure. One should note that in the process of node memberships generation, is marginalized over, which could avoid complicated reversible jump MCMC algorithms or allocation samplers. We summarize the collapsed Gibbs sampling procedure for the GAGNAR model in Algorithm 1.

### 3.2 Post MCMC Estimation

Let and with be the th estimate after the burn-in iterations of the MCMC algorithm. Correspondingly denote as the estimated memberships of the network nodes. Particularly we note that the estimated number of groups can be different for . As a result, direct posterior estimation of the parameters would be difficult since the number of estimated parameters are not the same across different iterations. To address this issue, we adopt Dahl’s method (Dahl, 2006) to select the best post burn-in iteration with the least squares criterion. The estimate output by the best post burn-in iteration is then taken as the final post MCMC estimator.

Specifically, we take advantage of the co-membership matrix to help us select the best post burn-in iteration. Define the co-membership matrix as , where . Therefore the th element of denotes whether the th node is in the same group with the th node. We can see that is well defined for different s and it does not suffer from the label switching issue. We then choose the best post burn-in iteration as

where is the estimated co-membership matrix in the th post burn-in iteration and . As a consequence, the best post burn-in iteration is selected by the closest to the mean grouping co-membership . The number of groups is then determined as . Accordingly, the post MCMC estimations of the parameters and memberships are given by with and respectively.

### 3.3 Selection of Smoothing Parameter

The selection of smoothing parameter is redesigned as a model selection problem. Specifically, we use the logarithm of the pseudo marginal likelihood (LPML) (Ibrahim et al., 2014) based on conditional predictive ordinate (CPO) (Gelfand et al., 1992; Geisser, 1993; Gelfand and Dey, 1994) to select .

The LPML given a specified is defined as

(3.2) |

where CPO is the CPO for the node defined as CPO (Pettit, 1990), and . As one can see, LPML is a pseudo log-likelihood function and we prefer a smoothing parameter to maximize LPML. Borrowing the idea of Chen et al. (2012), we obtain the Monte Carlo estimate of CPO within the Bayesian framework as

where is the total number of Monte Carlo iterations, and

is the likelihood function for the node with the specified smoothing parameter . Correspondingly we have and we choose the optimal by .

## 4 Simulation

### 4.1 Simulation Models and Data

To demonstrate the finite sample performance of our proposed method, we conduct a number of numerical studies in this section. Specifically, we utilize three types of graph and assign two parameter settings to each of them.

For each simulation setting, we generate the random noise from a standard normal distribution. In addition, node covariates are independently sampled from a multivariate normal distribution . For each graph, two different scenarios of parameters , which are listed in Table 1. Given the initial value , the time series is generated according to the grouped network vector auto-regression model in (2.1). In each scenario, 100 replicated datasets are generated, and in each replicate we set the time length . In addition, for the prior distribution gaCRP, we set and select the smoothing parameters by LPML (3.2). A total of 1500 MCMC iterations are run for each replicate, with the first 500 iterations treated as burn-in. Besides, the hyper-parameters set for the base distribution in (2.6) are , which are noninformative priors.

Example 1 | ||||||||||

Scenario 1 | Scenario 2 | |||||||||

Group 1 | 2.0 | 2.0 | ||||||||

Group 2 | 1.0 | 4.0 | ||||||||

Group 3 | 3.0 | 3.0 | ||||||||

Example 2 | ||||||||||

Scenario 1 | Scenario 2 | |||||||||

Group 1 | 2.0 | 2.0 | ||||||||

Group 2 | 1.0 | 1.0 | ||||||||

Group 3 | 3.0 | 3.0 | ||||||||

Group 4 | 4.0 | 4.0 | ||||||||

Group 5 | 2.0 | 2.0 | ||||||||

Example 3 | ||||||||||

Scenario 1 | Scenario 2 | |||||||||

Group 1 | 2.0 | 2.0 | ||||||||

Group 2 | 1.0 | 1.0 | ||||||||

Group 3 | 3.0 | 3.0 | ||||||||

Group 4 | 4.0 | 1.5 | ||||||||

Group 5 | 2.0 | 2.5 | ||||||||

Group 6 | 3.0 | 1.0 |

Example 1. (Stochastic Block Model) We first consider the stochastic block model (SBM) (Wang and Wong, 1987; Nowicki and Snijders, 2001; Zhao et al., 2012). The SBM assumes that nodes in the same group (block) are more likely to be connected, when compared with nodes from different groups. The model is widely used to discover community structures for network data. We follow Nowicki and Snijders (2001) to randomly assign each node a group label with equal probability and we set . Next, let if node and node are in the same group, and otherwise. For this example we set the network size .

Example 2. (Chinese Cities Graph) In this example we construct the graph of Chinese cities by using the geographical information. We collect 151 cities of mainland China and treat each city as a network node in the graph. The edge between two cities is defined as whether they share the common boarder (Cao et al., 2017; Zhang et al., 2018). Specifically, illustrates that two cities are connected, otherwise . Lastly, we let be the total number of groups and assign the group labels using -means clustering on the adjacency matrix.

Example 3. (Common Shareholder Network) We lastly construct the stock graph from Chinese A stock market, which contains actively traded stocks in the Shanghai and the Shenzhen Stock Exchange. The stocks used in this simulation example is a subset of our empirical study for computational convenience. Specifically, indicates that two stocks share at least one of the top two shareholders, otherwise . The graph structure is shown in the bottom right panel of Figure 1. We set the number of groups and assign the group memberships by implementing the SBM to fit the adjacency matrix.

The three network structures with the true group labels are visualized in Figure 1.

### 4.2 Performance Measurements and Simulation Results

Let be the estimated parameters of the th group by the th replicate (). For each node , we obtain its group label as by Dalh’s method. Subsequently, the estimated parameters for each node is given as . We consider the following measurements to evaluate the finite sample performance. First, we employ the root mean square error (RMSE) to evaluate the estimation accuracy. For (), the RMSE is defined as

and the RMSE for is calculated as

To measure the similarity between the estimated group memberships and the true group memberships, we use Adjusted Rand Index (ARI) (Rand, 1971; Hubert and Arabie, 1985). Define as the set of the true group memberships. Then ARI is defined as

where , , and is the total number of possible node pairs formed by the nodes. Hence ARI measures the alignment level of two grouping results and a higher ARI value implies the estimated group memberships are more consistent with the true memberships. The ARI can be calculated using the R-package mclust. We compare the parameter estimation of proposed model with that of EM algorithm and two-step estimation for the GNAR model (Zhu and Pan, 2018). One should note that, since the number of groups could not be inferred in the two baseline methods, we apply the true value of as the input of EM and two-step method.

Table 2 summarises the average RMSE of estimated parameters under the optimal selected by LPML. For all three simulation examples, we see that the overall RMSEs of estimated parameters by the GAGNAR model are much lower than those of the EM and two-step methods. This is consistent with the weakness we mentioned that the GNAR model does not utilize the network structure information. In addition, we see that graphs in example 2 and 3 are more complicated than that generated from SBM, due to the larger number of groups and the group patterns which are not easily captured (see Figure 1). In both examples, the estimation accuracy of our model is shown to be much higher than that EM and two-step estimators. This further confirms the usefulness of the proposed GAGNAR model in the intricate reality.

Figure 2 shows the estimated number of groups in the left panels, together with ARIs in the right panels. When , the gaCRP method reduces to the traditional CRP method, which always tends to over-cluster and shows smaller ARI than results by LPML method. We see that, when increases, the estimated number of groups firstly decreases and then increases from histograms in Figure 2 . The reason is that as , the graphical weight

for disconnected nodes becomes particularly small, which means only connected nodes can be classified into the same group, therefore leading to the over-cluster problem, as we discussed in Remark

2. The group concordance under optimal selected by LPML generally performs better than those of EM method, with the proportion of selecting true number of groups always greater than 80%.Example 1 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|

LPML | 0.626 | 0.471 | 0.179 | 0.377 | 0.057 | 0.108 | 0.264 | 0.532 | 0.330 | 0.624 |

EM | 1.706 | 0.233 | 0.164 | 0.167 | 0.132 | 0.123 | 0.514 | 0.279 | 1.450 | 1.466 |

2-step | 3.774 | 0.226 | 0.318 | 0.242 | 0.496 | 0.258 | 0.717 | 0.380 | 1.128 | 1.474 |

Example 2 | ||||||||||

LPML | 0.722 | 0.265 | 0.088 | 0.176 | 0.085 | 0.135 | 0.404 | 0.429 | 0.450 | 0.596 |

EM | 2.452 | 0.317 | 0.329 | 0.186 | 0.157 | 0.149 | 0.963 | 0.482 | 1.245 | 1.241 |

2-step | 3.191 | 0.370 | 0.329 | 0.282 | 0.508 | 0.371 | 0.956 | 0.726 | 1.188 | 1.309 |

Example 3 | ||||||||||

LPML | 0.920 | 0.644 | 0.158 | 0.180 | 0.093 | 0.089 | 0.460 | 0.318 | 0.443 | 0.246 |

EM | 1.942 | 1.416 | 0.300 | 0.288 | 0.110 | 0.120 | 0.748 | 0.573 | 1.183 | 0.576 |

2-step | 2.880 | 1.890 | 0.356 | 0.315 | 0.568 | 0.480 | 0.878 | 0.723 | 0.948 | 0.576 |

## 5 Empirical Case Studies

In this section, we conduct two empirical case studies based on the graphs as Examples 2–3 in Section 4. For the first real data example, we study the local government economic competition by using city-level fiscal revenue (FR) data. For the second real data example, we investigate the dynamic patterns of the stocks traded in the Shanghai and Shenzhen Stock Exchange. We first conduct a descriptive analysis of the real datasets. We calculate the average ratio of fiscal revenue to GDP from 2002 to 2019 and the average weekly stock return rate during 2019, as shown in Figure 3. The average FR/GDP ratio is 0.068 and the weekly return of the stocks is 0.43%.

Comments

There are no comments yet.