1 Introduction
Longitudinal bipartite relational data are being collected at unprecedented rates to study complex phenomena in both the social and biological sciences. These data characterize the evolution of relations between pairs of actors, where each actor is one of two distinct types and relations exist only between disparate actor types. Studies involving such data have focused on, e.g., films (Watts and Strogatz, 1998), international relations (Boulet et al., 2016), metabolic interactions (Jeong et al., 2000), recommender systems (Linden et al., 2003), transportation systems (Zhang et al., 2006), or international affairs (Campbell et al., 2018). For example, in international affairs, researchers might study countries’ financial contributions to international organizations over the past few decades. Here, the countries and international organizations are the actors and the relations of interest are yearly financial contributions.
In many studies of longitudinal bipartite relational data, the relevant scientific questions surround relations among actors of a single type; the set of which can be represented in a unipartite network. For example, researchers may be interested in the (unobserved) relationships among countries that affect the amount of financial contributions to different international organizations. The financial contribution of China to the UN, for instance, may be influenced by the US’s recent announcement to cut its budget obligations for 201819. The degree of change in China’s contribution based on the US could be viewed as a measure of US influence on China. Such influences may exist between international organizations as well: the contribution of the US to the UN can be related to its financial obligations to the WTO. In the examples given, the influences are occurring over time and are allowed to be asymmetric such that, for example, China may be influenced by the US to a high degree while the US is not influenced by China. A goal then in studying longitudinal bipartite data is often to infer the relationships among actors of each type. We term these sets of unipartite relations influence networks. For example, in the USChina illustration, the country influence network is denoted , where is the number of countries and represents the amount of influence country has on country . Similarly, the organization influence network is denoted , where is the number of international organizations and represents the influence of organization on organization . Inferring these latent influences are of substantive interest in many studies and, in the case of the states and their contributions to international organizations, have the potential to inform international policymaking in effective and previously unknown ways.
The idea of summarizing bipartite data in terms of unipartite influence networks is not new. Newman (2001) analyzes the relationships among academic authors by estimating the unipartite authorauthor network from data on academics and the papers they authored over five years. Newman (2001) ignores the temporal component of the data and defines the relationship between author and author as the the number of papers and coauthored during the fiveyear period. The resulting influence network is often referred to as a (onemode) projection. If the binary data matrix of authorship is denoted by a rectangular matrix , where is an indicator of whether academic authored paper , then the author influence network can be expressed as . Notice that is symmetric by construction and represents behavioral cooccurrence (in this case, coauthorship), rather than influences over time as described earlier. Investigating temporal patterns in publications, Barabási et al. (2002) estimated yearly influence networks among academic authors using onemode projections and analyzed the evolution of summary statistics of the yearly projections. Extensions of onemode projections exist for longitudinal bipartite networks (Wu et al., 2014), weighted bipartite networks (Newman, 2004; Liu et al., 2009), and for creating directed influence networks (Zhou et al., 2007). These various extensions involve different weightings of the original bipartite relations (e.g., weight each paper by the number of coauthors).
Due to the plethora of tools available for unipartite networks, bipartite data are often cast into onemode projections that can be subsequently analyzed using standard network modeling techniques (Zhou et al., 2007). Although various weighting schemes have been investigated for onemode projections (Wu et al., 2014), a key disadvantage of this approach is that information in the original bipartite data is inherently lost in the projection, regardless of the selected weighting scheme. In addition, since the data naturally arise in a bipartite format, specifying a generative model for the projection on which to base inference is fundamentally challenging.
In this paper, we propose a novel bipartite longitudinal influence network (BLIN) model, which permits inference on the influence networks for each set of actors. This work builds upon recent developments on statistical models for longitudinal unipartite relational data (Banks and Carley, 1996; Snijders, 2005; Krackhardt and Handcock, 2007; Sewell and Chen, 2015; Carnegie et al., 2015). Specifically, Almquist and Butts (2013, 2014)
propose an autoregressive model for unipartite networks that may be expressed as a generalized linear model. In a similar vein, we propose an autoregressive, generalized linear model for bipartite networks, wherein the influence networks are autoregressive parameters.
The rest of the article is organized as follows. We introduce the BLIN model in Section 2. We also discuss two existing approaches to modeling longitudinal bipartite data and explore various possible extensions to our model. We describe maximum likelihood estimation procedures for the BLIN model in Section 3 and give properties of the resulting estimators in Section 4, including performance under misspecification. In Section 5, we compare the performance of our model to existing approaches in simulation studies. Finally, in Section 6, we demonstrate our methodology using a data set of state interactions over time and conclude in Section 7.
2 BLIN Model
Let the matrix denote the observation of the bipartite relations among actors of one type (e.g., countries) and actors of a second type (e.g., international organizations), where the time index . For example, in the illustration introduced above, the entry in , , is the financial contribution by country to international organization in year . The BLIN model expresses the relations at time , , as a function of the previous relations , and the influence matrices and :
(1) 
where is the number of previous time periods upon which depends and is an matrix of mean zero, independent and identically distributed errors. In this formulation, it is the sum of the previous lag time period values that influence the current time period according to the influence networks and . This model can alternatively be expressed as:
(2) 
From this representation, it is more easily seen that is exclusively a function of those when either or , or both. Consider . In the context of the international affairs example, this means that, for example, China’s financial contribution to the UN in 2015 depends on the contributions of all other countries to the UN in 2014 through entries in , and on China’s financial contributions to all other international organizations in 2014 through the entries in . The interpretation of the individual and parameters follow from BLIN being a linear model. For example, is the expected increase in financial contributions of country to a given international institution when country has raised its contribution by one unit to the same organization in the previous year. Similarly, the coefficient is the expected increase in budget obligations to international organization by a given country when international organization has received one unit of financial contributions from the same country in the previous year. Figure 1 depicts these influences using the international affairs example. A positive value of in , corresponding to the influence of the US on China, means that if the US would increase their contributions to the UN in 2014, then China is expected to increase its expenditure to the UN in 2015. Similarly, a positive value of in , corresponding to influence of financial obligations to the UN on contributions to the WTO, implies that if the US spent more money on the UN in 2014, then it is expected to increase its WTO expenditure in 2015. Since the influence matrices are time invariant, the entries in and represent the average influence over all time periods under consideration.
Figure 1 depicts types of direct influence patterns the BLIN model captures (i.e. those between and where and/or ). Note, however, that secondary influences (such as that between and where , and ) may propagate through the BLIN model over multiple time periods, i.e. for . For example, although US contributions to the UN in 2014 may not affect China’s contribution to the WTO in 2015, it may do so in 2016. This may occur if, say, US financial transfers to the UN in 2014 affect China’s UN expenditure in 2015 via a nonzero value of . Then, China’s contribution to the UN in 2015 impact its own contribution to the WTO in 2016 through a nonzero value . In this way, the BLIN model allows both direct and secondary influences through different mechanisms.
Dependence  

Direct country ()  
Direct organization ()  
Secondary 
A key property of the BLIN model is that is may be written as a linear model. Letting and
denote the columnwise vectorization of matrices
and , respectively, the model in (2) can alternatively be expressed(3)  
(4) 
where ‘’ is the Kronecker product and denotes the columnwise vectorization of matrix . In the second line, is the matrix and is the vector of parameters. Since the BLIN model may be written as a linear model, numerous offtheshelf tools exist for estimation (including regularization) of the BLIN model, making inference on the influence networks straightforward. In what follows, we focus on the model without covariates, although we may easily incorporate covariate information by adding the term to the right hand side of (3), for and . Thus, we assume throughout the paper that is mean zero for all without loss of generality.
The BLIN model in (1) is not fully identifiable such that for any , the transformation results in the exact same model for the data . This nonidentifability means that we are unable to determine and separately, but that the sum is identifiable. Specifically, we may estimate the effect of on , but we cannot decompose this effect into the contribution of country and organization . Nevertheless, we may compare the marginal country effect among countries and among international organizations, respectively. For example, although the absolute values of and are not identifiable, their difference is identifiable. If this difference is positive, we may conclude, for example, that a US increase in financial contributions given a unit expenditure in the previous year is higher than China’s increase in contributions.
2.1 Comparison to Existing Approaches
Diffusion models (e.g., Berry and Berry, 1990) are popular for studying the interdependencies of institutions in political science (Desmarais et al., 2015). In these models, an outside institution puts transmission pressure on for a particular policy on the focal institution, making the latter more likely to adopt that policy. The network of these transmissions forms a directed tree, where there is at most one path from one institution to another. The diffusion model is distinct from the approach we propose in several ways. In the parlance of the international affairs example, the former supposes that each country’s financial contribution to a specific international organization is influenced by at most a single other country. In addition, a binary network is inferred, rather than a weighted network which can encode both positive and negative influences. Furthermore, methods for quantifying uncertainty in the estimated network and incorporating covariates are unavailable.
Hoff (2015) proposes a generative model for bipartite longitudinal data termed the bilinear model, which can be expressed
(5) 
Unlike the BLIN model, the bilinear model is nonlinear, which complicates estimation, regularization, and uncertainty quantification. The matrices and also measure actor influences, however, the bilinear model combines the direct and secondary dependencies in Figure 1 into the same mechanism. This results in a different interpretation of the parameters. To illustrate this, we rewrite the bilinear model in (5) as
(6) 
Here we see depends on every entry in , as may be affected by both (direct) and (secondary) through . A consequence of the multiplicative nature of the model is that the influence parameters must be interpreted in conjunction with one another. For example, represents the expected increase in Chinese contributions to international institution for each unit of expenditure of the US to organization in the previous year. Thus, the interpretations of the country influences in and the internationalorganization influences in are intertwined. While there are likely instances where is influenced by , we argue that secondary dependencies are often likely of a smaller magnitude than the direct dependencies. In these cases, it would be undesirable to use, for example, a single parameter to simultaneously capture direct and secondary influences. The BLIN model assumes secondary dependencies are zero and focuses estimation on the direct dependencies.
The influence matrices and in the bilinear model are identifiable up to a multiplicative constant. For any , the transformation leaves the model for invariant. This implies that the relative scales of the networks represented by and and the signs of the elements are not estimable. However, the ratio of elements within each influence network is identifiable, e.g. .
2.2 Extensions of the BLIN Model
In this section, we discuss various extensions of the BLIN model. First, in the definition of the BLIN model in (1), the first observation of the bipartite relations is a function of the number of past observations . For simplicity, each past observations affects the current observation equally. Obviously, more complex dependence functions may be considered, such as an exponentially decaying influence of the past time periods on the current time period. In general, in (1) may be replaced by any matrix function of the past observations, , where . The selection and estimation of such functions is a current area of research: see Krackhardt and Handcock (2007); Krivitsky (2009); Hanneke et al. (2010); Almquist and Butts (2014) for a discussion of general unipartite temporal network models and autoregressive models for unipartite temporal networks.
The linear nature of the BLIN model simplifies its extension to other types of outcomes, e.g. binary or count observations. Let be a general measure of the relation between actors and at time . Then, a general BLIN model may be expressed
(7) 
where is an appropriate link function based on the form of (McCullagh and Nelder, 1989). Offtheshelf tools are again available for estimation of the model in (7) if is a standard link function.
3 Estimation of the BLIN model
In the following, we discuss several estimation procedures for the BLIN model. First, we propose an estimator that results from minimizing a least squares criterion. We then consider more parsimonious estimators using sparsityinducing penalties and reducedrank approaches. For ease of notation, we define the regressor matrix in (1) as such that the BLIN model can be expressed
(8) 
3.1 Least Squares Estimator
Based on the vector representation of the BLIN model in (3), we propose minimizing the following least squares criterion to construct an estimator for the and matrices:
(9)  
(10) 
where such that and is the columnwise stacking of the design matrices in the vector representation of the BLIN model in (4). The explicit solution to (9) is
(11) 
where denotes the generalized inverse of square matrix .
Computing the solution in (11) requires inversion of a matrix of dimension . By using an iterative algorithm to solve (9), this computation can be replaced by repeated inversions of square matrices of dimensions and . Specifically, Algorithm 1 details a block coordinate descent procedure, which alternates between solving for and . When either or is large relative to the other, there will be significant reduction in memory demand by performing this iterative estimation scheme. The estimator in Algorithm 1 converges to a unique minimum when and are full rank.
3.2 Sparse Coefficients
Although influence may be multifactorial, it is easy to imagine scenarios where many entries in and are small or zero. In the international affairs example, democracies may only influence other democracies, or organizations dealing with issues at a global level may only be influenced by other global institutions. To leverage this fact in parameter estimation, we propose augmenting the leastsquares criterion of the of the BLIN model in (9) with a sparsityinducing penalty. Here, we consider the Lasso penalty (Tibshirani, 1996), which uses an norm on to simultaneously perform variable selection and regularization. We term this model the sparse BLIN model as elements of are forced to the zero. The estimation objective function is
(12) 
where is a tuning parameter, and larger values of correspond to more regularization. Since the BLIN model is linear, the vector of parameters may be regularized with any of a host of existing penalty terms (Hoerl and Kennard, 1970; Friedman et al., 2001).
3.3 ReducedRank Coefficients
Thus far we discussed sparsityinducing penalties for the vector of regression model coefficients
. However, several other penalties on the singular value decomposition (SVD) of coefficient matrices
and have been proposed. These penalties result in coefficient estimates of and with reducedrank, or approximately reducedrank. Yuan et al. (2007) propose a nuclear norm penalty, which is an penalty of the singular values of . Similarly, Bunea et al. (2011) recommend a rank selection criteria penalty that is proportional to the rank of , i.e. a penalty on the singular values . This second approach provides simultaneous shrinkage on and consistent estimation of its (reduced) rank.Here we consider estimation of reducedrank and , i.e., and . This assumption may be appropriate when there is lowerdimensional structure inherent in and : for example, the influences among countries may be grouped by region. This approach is employed in reducedrank regression, first developed by Anderson (1951)
, and has connections to principal component analysis, as shown in
Izenman (1975). For any ranks and of and , respectively, we may define a reducedrank BLIN model by writing for and for . These decompositions are not identifiable up to a fullrank transformation of the decompositions, e.g. and result in the same influence matrixfor any invertible matrix
. However, the estimands and remain identifiable up to an additive constant along the diagonal, as discussed in Section 2.We estimate a reducedrank BLIN model by minimizing the leastsquares criteria in (10) with replaced by , replaced by , and minimizing over . This optimization problem is easily solved using a block coordinate descent algorithm (see Section B of the supplementary material for details). In what follows, we refer to the BLIN model with no constraints on the parameters and as the full BLIN model, in order to distinguish it from the sparse and reducedrank versions.
4 Estimator Properties
This section examines properties of the least squares estimators of the full and reducedrank BLIN models (all proofs are provided in Section C of the supplementary material). We show that the least squares estimator are unique for a relatively small number of observations and give some sufficient conditions for their asymptotic normality and efficiency. We then examine the properties of the least squares estimators under misspecification, providing sufficient conditions for their consistency. As in the previous section and the representation of the BLIN model in (8), we use to denote the autoregressive term .
4.1 Uniqueness and Efficiency of Least Squares Estimators
The optimization problem described in (9) is convex, and by the GaussMarkov theorem, it has a unique solution whenever is full rank (see, e.g., Graybill, 1976). Due to the nonidentifiability of the diagonal entries of and , is never full rank. However, the projection of onto the column space of , i.e., , is always unique. Thus, when the column space of spans the space of possible and matrices up to their nonidentifiability, the BLIN estimator in (11) is unique (up to the nonidentifiability properties). This is true when the rank of is maximized, that is one less than the number of columns: . The ‘’ results from the fact that if a single diagonal entry in or is known, then the rest of the diagonal entries are identifiable. We now provide a proposition that states conditions under which
is maximal rank; these conditions are satisfied with probability one when, e.g.,
is array normal.Proposition 1.
Without loss of generality, take . Assume that the matrix formed by the columnwise concatenation is full rank. Then, the design matrix has
A consequence of Prop. 1 is that the full BLIN model has a unique solution when . A key implication of this is that a unique solution exists for relatively small . For instance, if , then the BLIN model has a unique solution (modulo the nonidentifiability) when . Figure 2 plots values of , , and for which the full BLIN model has a unique solution. As grows, the space of values for which the BLIN model has a unique solution rapidly spans all values of and , except when their values are extremely disparate.
Under the BLIN model in (1) when is independent , the GaussMarkov theorem (Graybill, 1976) states that the BLIN estimator , where is any solution to (9) and defines any identifiable linear combination of
, is the best linear unbiased estimator (BLUE) of
. Additionally, these estimates are asymptotically normal at rate . Finally, we note that the estimator in (9) is the maximum likelihood estimator whenis normally distributed with homogeneous variance. Thus, under regularity conditions
(Lehmann and Casella, 2006), the limiting normal random variable for
, has minimum asymptotic variance.Recall that the least squares estimates of the reducedrank BLIN model can be obtained using an iterative blockcoordinate descent algorithm (see Section B of the supplementary material). When every matrix inverse in the update equations is unique, the estimates and from this algorithm converge to a local minimum, which is unique up to nonidentifiabilities in and provided that . The reducedrank estimators are the maximum likelihood estimators for and with ranks and , respectively, under the assumptions of normally distributed independent errors with homogeneous variance. Therefore, under these conditions, and resulting from the iterative blockcoordinate descent algorithm are consistent and asymptotically normal with minimum asymptotic variance.
4.2 Least Squares Estimator Properties under Misspecification
Thus far, we have discussed the attractive properties of the least squares estimators of the BLIN model, and , under the assumption that the model is correctly specified. It is useful to determine the limiting values of the estimators when the model is misspecified. The limiting values of and , which we denote and , respectively, are referred to as pseudotrue parameters in the model misspecification literature, first investigated by Huber (1967). The pseudotrue parameters, by definition, are
(13) 
where has expectation , some general function of , and the expectation in (13) is over . Note that this expression holds regardless of the distribution of and and the form of . We note that the limiting values and
minimize the KullbackLeibler divergence from the true distribution of
to the distribution of under the BLIN model with Gaussian errors.In this section, we assume that the variancecovariance matrix of is Kroneckerstructured, that is for mean zero. This is the case if, for example, is distributed matrix normal (Gupta and Nagar, 2000) and is consistent with the theoretical treatment of the bilinear model in Hoff (2015). We also assume that the errors are additive. Below, we formalize the conditions for the theory to follow.
Conditions 1.

Each is identically distributed with mean zero and , for positivedefinite matrices and with finite entries.

For all , , where is a general function of and every entry in is independent and identically distributed with homogeneous, finite variance.
In the remainder of this section, we examine some properties of the pseudotrue parameters (all proofs are provided in Section C of the supplementary material). The entry in , denoted , estimates the linear relationship between row of and row of across all time. Similarly, the entry in , denoted , estimates the linear relationship between column of and column of across all time. The first proposition states that when there is no linear relationship, the appropriate pseudotrue parameter is zero. We denote row of as and column of as , and do the same for .
Proposition 2.
Under Conditions 1, if is diagonal and for all , then = 0. Alternatively, if is diagonal and for all , then = 0.
In the setting of the international policy example, Proposition 2 states that if country ’s expenditure is uncorrelated with country ’s contribution to the same organization in the following year, then the least squares estimator of in the BLIN model will converge to .
The following proposition provides alternative conditions under which the pseudotrue parameters are equal to zero. It allows for more general covariance structure at the cost of assuming that the conditional mean of is linear in .
Proposition 3.
Assume that there exists a linear relationship for all and Conditions 1 hold. If all entries in relating row in to row of are zero and , then = 0. If all entries in relating column in to column of are zero and , then = 0.
4.3 Comparison of BLIN and Bilinear Least Squares Estimators
We now compare least squares estimates of the BLIN model to those of the bilinear model. Both models aim to quantify the influences of the rows (columns) of on the rows (columns) of , but with different emphasis on the type of influence quantified (recall the discussion in the Introduction and Section 2.1). We provide a theorem and proposition stating that, when data are generated from the bilinear model, the least squares BLIN estimators of the offdiagonal entries in and converge to the corresponding and values used in the bilinear generative model, up to numerical constants. The analogous result holds when switching the roles of the bilinear and BLIN models. See Section C of the supplementary material for proofs.
Theorem 4.
We now address the diagonals of the and matrices. We provide conditions under which the diagonals (up to their nonidentifiabilities) are asymptotically equivalent. To be clear, we compare the estimated influence of on from the two models. Under the BLIN model, this influence is , whereas under the bilinear model the influence is .
Proposition 5.
Suppose the and matrices are constant along the diagonal with nonzero values and , respectively, and . Then, under Conditions 1,

If are generated from the bilinear model in (5), then the psuedotrue parameter of leastsquares estimation of the BLIN model is equal to the true diagonal specification , and

If are generated from the BLIN model in (1), then the psuedotrue parameter of leastsquares estimation of the bilinear model does not equal the true diagonal specification in general.
The conditions for equivalence of the diagonals are more restrictive than those for the equivalence of the offdiagonal elements of and . Furthermore, we see the BLIN model diagonals are consistent in misspecification situations when the bilinear diagonals are not consistent. In Section D.2 of the supplementary material, we evaluate the convergence of the bilinear and full BLIN estimators in support of Theorem 4 and Proposition 5.
5 Simulation Study
We evaluate the performance of the least squares estimators of the sparse, reducedrank, and full BLIN models, in addition to the bilinear model, using simulations. We fix the size of and at and , respectively, and vary the number of observations . The errors are independent and identically distributed Gaussian realizations. See Section D of the supplementary material for additional details.
We evaluate the performance of the estimators using fold cross validation for , where each fold consists of a subset of the time periods. This means that for , each fold is a separate time period. We evaluated the insample and outofsample for each fold and averaged the results across folds for each data set. We present the range of results for 1,000 realizations of each generating model in Figure 3.
Average 


Bilinear  BLIN  
Average 

BLIN, rank 2  BLIN, sparse 
Figure 3 confirms that the best estimator for each generating model is the estimator that corresponds to the true model, as expected. We also observe a known result in model selection: using insample alone for model selection may lead to selection of a suboptimal model. For example, the bilinear estimator has the highest insample when data are generated from the sparse BLIN model, yet, the bilinear estimator has the lowest outofsample values in this case. In general, the bilinear estimator appears highly variable and tends to have lower median outofsample values than the other estimators, even when the true generating model is bilinear. We posit that some of the observed variability in the performance of the bilinear estimator is a result of the bilinear model likelihood possessing multiple optima for moderate values of , making it difficult for any algorithm to find the global optimum.
All BLIN estimators perform poorly outofsample when the true model is bilinear, and similarly, the bilinear estimator performs poorly outofsample when the true model is BLIN. This suggests that these models represent fundamentally different phenomena and neither is a good substitute for the other in prediction. Lastly, we note that the sparse BLIN estimator appears most robust to misspecification – it is the only estimator with all positive outofsample values. This is as expected, since the regularization in the sparse BLIN estimator guards against poor outofsample performance.
6 Temporal State Interaction Data Analysis
We analyze weekly state interactions derived from the Integrated Crisis Early Warnings System (ICEWS) data set (Boschee et al., 2017). The data set consists of four relation types among 25 states at weekly intervals from 2004 to mid2014, giving 543 weeks of data. The relation types are material negative actions (mn), material positive actions (mp), verbal negative actions (vn), and verbal positive actions (vp). Two examples of material positive interactions are military cooperation and the provision of humanitarian aid.
After an initial investigation of model fits, we chose to difference the responses and model the weekbyweek increase of mn, mp, vn, and vp interactions, as well as set . The BLIN model for this analysis may be written:
(14) 
See Section E of the supplementary material for details on the choice of lag and a differenced response.
We fit the sparse BLIN estimator, focusing on the material positive response (mp); the estimates for the remaining responses are given in Section F of the supplementary material. In Figure 4, we depict both the estimated sender () and receiver () influence networks, using the ISO3 naming convention. In general, we see that the positive entries in and are much larger in absolute value than the negative entries, suggesting that positive influence is more consequential than negative influence. This implies that countries look to those they wish to emulate when changing their aid status, rather than acting counter to the countries whose actions they reject: China looks to emulate the actions of Japan (positive influence) more than repudiate the actions of the US (negative influence). Further, the values in appear slightly larger than those in , suggesting that sender influence is more consequential than receiver influence. This implies that, when the US changes its aid status to Iraq, this decision is more likely influenced by countries that aided Iraq in the past than the previous aid sent by the US to countries that influence Iraq. Finally, we see that China, Japan, and the US share the most edges with other nations. This result is intuitive as we expect great powers to possess the most influence relationships and the most substantively important influence relationships (Morgenthau, 1948; Waltz, 1979; Mearsheimer, 2001).
In examining particular relations, we observe phenomena that are equally intuitive. For example, one of the largest positive influences in the network is of the US on Iraq. The interpretation of this value is that if the US is observed to increase material positive cooperation with another country, then we expect Iraq to raise its material positive cooperation with the same nation in the following two weeks. This certainly makes sense, following the USled removal of Saddam Hussein from power in 2003, the US and Iraq overhauled their bilateral relations and have institutionalized the coordination of foreign policy actions through the Strategic Framework Agreement (SFA). This finding is important as it provides evidence that the SFA has effectively promoted foreign policy coordination. Similarly, for negative entries in the network, if the UK increases material cooperation with a given nation, then we expect Afghanistan to decrease material cooperation with the same state in the following two weeks. A similar interpretation holds for the receivers of cooperation in the influence network. For example, if Israel observes an increase in material positive cooperation from a given nation, we expect the US to observe an increase in material positive cooperation from the same nation in the following two weeks. Given the formal and informal alignment of the US and Israel, it is expected that any country interacting with Israel may be more likely to also interact with the US (Li et al., 2017).
7 Discussion
In this paper, we present the Bipartite Longitudinal Influence Network (BLIN) model, which is a new generative model for the evolution of bipartite relational data over time. The BLIN model allows for the estimation of the weighted and directed influence networks among each of the two actor types in the bipartite data. The BLIN model can be expressed as a generalized linear model, lending itself to use with a litany of offtheshelf tools for estimation. Lastly, the influence network entries quantify direct influences and their absolute values and signs are identifiable.
An extension of the model to multipartite data – i.e., when there are three or more distinct actor types in the data set – may be readily accomplished (see Section A
of the supplementary material). For example, we might consider countries’ financial contributions to international organizations over time, and classify the type of contribution: for example, general budget obligations, employee salaries, or contributions to peacekeeping missions. In this model, the three actor types are countries, international institutions, and financial contributions’ categories.
In the BLIN model,
the influence networks and may be interpreted as “average” influence over the entire data set. There are scenarios where the influences among countries may evolve over time. For example, major international trade agreements or wars might change the structures of the influence networks. To account for temporal evolution of and , we might consider testing whether timedependent influence networks are necessary and estimating changepoints in the influence networks. Of course, this requires the development of a more flexible model with timevarying influence networks, such as modeling and as linear functions of know covariates as in Minhas et al. (2017).
Acknowledgements This work was partially supported by the National Science Foundation under Grant Number SES1461493 to Cranmer and Grant Number SES1461495 to Fosdick. This work utilized the RMACC Summit supercomputer, which is supported by the National Science Foundation (awards ACI1532235 and ACI1532236), the University of Colorado Boulder, and Colorado State University. The RMACC Summit supercomputer is a joint effort of the University of Colorado Boulder and Colorado State University.
References

Almquist and Butts (2013)
Almquist, Z. W. and Butts, C. T. (2013).
Dynamic network logistic regression: A logistic choice analysis of interand intragroup blog citation dynamics in the 2004 us presidential election.
Political Analysis, 21(4):430–448.  Almquist and Butts (2014) Almquist, Z. W. and Butts, C. T. (2014). Logistic network regression for scalable analysis of networks with joint edge/vertex dynamics. Sociological Methodology, 44(1):273–321.
 Anderson (1951) Anderson, T. W. (1951). Estimating linear restrictions on regression coefficients for multivariate normal distributions. The Annals of Mathematical Statistics, pages 327–351.
 Banks and Carley (1996) Banks, D. L. and Carley, K. M. (1996). Models for network evolution. The Journal of Mathematical Sociology, 21(12):173–196.
 Barabási et al. (2002) Barabási, A.L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., and Vicsek, T. (2002). Evolution of the social network of scientific collaborations. Physica A: Statistical Mechanics and its Applications, 311(3):590–614.
 Berry and Berry (1990) Berry, F. S. and Berry, W. D. (1990). State lottery adoptions as policy innovations: An event history analysis. American Political Science Review, 84(2):395–415.
 Boschee et al. (2017) Boschee, E., Lautenschlager, J., O’Brien, S., Shellman, S., Starz, J., and Ward, M. (2017). ICEWS coded event data.
 Boulet et al. (2016) Boulet, R., BarrosPlatiau, A. F., and Mazzega, P. (2016). 35 years of multilateral environmental agreements ratifications: A network analysis. Artificial Intelligence and Law, 24(2):133–148.
 Bunea et al. (2011) Bunea, F., She, Y., and Wegkamp, M. H. (2011). Optimal selection of reduced rank estimators of highdimensional matrices. The Annals of Statistics, pages 1282–1309.
 Campbell et al. (2018) Campbell, B. W., Marrs, F. W., Tobias Böhmelt, T., Fosdick, B. K., and Cranmer, S. J. (2018). Latent influence networks in global environmental politics. Under review.
 Carnegie et al. (2015) Carnegie, N. B., Krivitsky, P. N., Hunter, D. R., and Goodreau, S. M. (2015). An approximation method for improving dynamic network model fitting. Journal of Computational and Graphical Statistics, 24(2):502–519.
 De Lathauwer et al. (2000) De Lathauwer, L., De Moor, B., and Vandewalle, J. (2000). A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications, 21(4):1253–1278.
 Desmarais et al. (2015) Desmarais, B. A., Harden, J. J., and Boehmke, F. J. (2015). Persistent policy pathways: Inferring diffusion networks in the American states. American Political Science Review, 109(2):392–406.
 Friedman et al. (2001) Friedman, J., Hastie, T., and Tibshirani, R. (2001). The Elements of Statistical Learning. Springer Series in Statistics, Berlin.
 Graybill (1976) Graybill, F. A. (1976). Theory and Application of the Linear Model. Duxbury Press, North Scituate, Mass.
 Gupta and Nagar (2000) Gupta, A. K. and Nagar, D. (2000). Matrix variate distributions. Chapman & Hall/CRC monographs and surveys in pure and applied mathematics ; 104. Chapman & Hall, Boca Raton, FL.
 Hanneke et al. (2010) Hanneke, S., Fu, W., Xing, E. P., et al. (2010). Discrete temporal models of social networks. Electronic Journal of Statistics, 4:585–605.
 Hoerl and Kennard (1970) Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67.
 Hoff (2011) Hoff, P. D. (2011). Separable covariance arrays via the Tucker product, with applications to multivariate relational data. Bayesian Analysis, 6(2):179–196.
 Hoff (2015) Hoff, P. D. (2015). Multilinear tensor regression for longitudinal relational data. The Annals of Applied Statistics, 9(3):1169–1193.
 Huber (1967) Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 221–233.

Izenman (1975)
Izenman, A. J. (1975).
Reducedrank regression for the multivariate linear model.
Journal of Multivariate Analysis
, 5(2):248–264.  Jeong et al. (2000) Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., and Barabási, A.L. (2000). The largescale organization of metabolic networks. Nature, 407(6804):651.
 Kolda and Bader (2009) Kolda, T. G. and Bader, B. W. (2009). Tensor decompositions and applications. SIAM Review, 51(3):455–500.
 Krackhardt and Handcock (2007) Krackhardt, D. and Handcock, M. S. (2007). Heider vs Simmel: Emergent features in dynamic structures. In Statistical Network Analysis: Models, Issues, and New Directions, pages 14–27. Springer.
 Krivitsky (2009) Krivitsky, P. N. (2009). Statistical models for social network data and processes. PhD thesis, University of Washington.
 Lehmann and Casella (2006) Lehmann, E. L. and Casella, G. (2006). Theory of Point Estimation. Springer Science & Business Media.
 Li et al. (2017) Li, W., Bradshaw, A. E., Clary, C. B., and Cranmer, S. J. (2017). A threedegree horizon of peace in the military alliance network. Science advances, 3(3):e1601895.
 Linden et al. (2003) Linden, G., Smith, B., and York, J. (2003). Amazon.com recommendations: Itemtoitem collaborative filtering. IEEE Internet Computing, 7(1):76–80.
 Liu et al. (2009) Liu, J., Shang, M., and Chen, D. (2009). Personal recommendation based on weighted bipartite networks. In Sixth International Conference on Fuzzy Systems and Knowledge Discovery, volume 5, pages 134–137. IEEE.
 MadridPadilla and Scott (2017) MadridPadilla, O. H. and Scott, J. (2017). Tensor decomposition with generalized lasso penalties. Journal of Computational and Graphical Statistics, 26(3):537–546.
 McCullagh and Nelder (1989) McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, volume 37. CRC press.
 Mearsheimer (2001) Mearsheimer, J. J. (2001). The Tragedy of Great Power Politics. WW Norton & Company.
 Minhas et al. (2017) Minhas, S., Hoff, P. D., and Ward, M. D. (2017). Influence networks in international relations. arXiv preprint arXiv:1706.09072.
 Molstad and Rothman (2018) Molstad, A. J. and Rothman, A. J. (2018). A penalized likelihood method for classification with matrixvalued predictors. Journal of Computational and Graphical Statistics, (0):0–0.
 Morgenthau (1948) Morgenthau, H. (1948). Politics among nations: The struggle for power and peace.
 Newman (2001) Newman, M. E. (2001). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404–409.
 Newman (2004) Newman, M. E. (2004). Analysis of weighted networks. Physical Review E, 70(5):056131.
 Sewell and Chen (2015) Sewell, D. K. and Chen, Y. (2015). Latent space models for dynamic networks. Journal of the American Statistical Association, 110(512):1646–1657.
 Snijders (2005) Snijders, T. A. (2005). Models for longitudinal network data. Models and Methods in Social Network Analysis, 1:215–247.
 Takeuchi et al. (2017) Takeuchi, K., Kashima, H., and Ueda, N. (2017). Autoregressive tensor factorization for spatiotemporal predictions. In 2017 IEEE International Conference on Data Mining (ICDM), pages 1105–1110. IEEE.
 Tibshirani (1996) Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288.
 Tucker (1964) Tucker, L. R. (1964). The extension of factor analysis to threedimensional matrices. Contributions to Mathematical Psychology, 110119.
 Waltz (1979) Waltz, K. N. (1979). Theory of International Politics. Waveland Press.
 Watts and Strogatz (1998) Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ‘smallworld’ networks. Nature, 393(6684):440.
 Wu et al. (2014) Wu, T., Yu, S.H., Liao, W., and Chang, C.S. (2014). Temporal bipartite projection and link prediction for online social networks. In Big Data, 2014 IEEE International Conference on, pages 52–59. IEEE.

Yuan et al. (2007)
Yuan, M., Ekici, A., Lu, Z., and Monteiro, R. (2007).
Dimension reduction and coefficient estimation in multivariate linear regression.
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3):329–346.  Zhang et al. (2006) Zhang, P.P., Chen, K., He, Y., Zhou, T., Su, B.B., Jin, Y., Chang, H., Zhou, Y.P., Sun, L.C., Wang, B.H., et al. (2006). Model and empirical study on some collaboration networks. Physica A: Statistical Mechanics and its Applications, 360(2):599–616.
 Zhou et al. (2007) Zhou, T., Ren, J., Medo, M., and Zhang, Y.C. (2007). Bipartite network projection and personal recommendation. Physical Review E, 76(4):046115.
Appendix A Multipartite Relational Data
We primarily discuss the setting where is a matrix, equivalently a 2mode array, with replications over time, such that may be considered a tensor, or a 3mode array. However, the BLIN model is easily extendable to general mode arrays, which have been studied using, e.g., principal component analysis and linear discriminant analysis (MadridPadilla and Scott, 2017; Molstad and Rothman, 2018; Takeuchi et al., 2017). A mode BLIN model is appropriate when there are more than two actor types in the relational dataset, which may be termed multipartite as opposed to bipartite. In the example pertaining to contributions to international organizations, we might consider expenditure type (such as general budget obligations, employee salaries, or contributions to peacekeeping missions) in addition to countries and international organizations. In this setting, is a 3mode array with entries which measure of country ’s contribution to international organization in manner in year . Thus, each mode pertains to the country, the organization, and the financial contribution type, respectively.
Our motivation is inference of the influence networks of countries, international organizations, and contribution types, which we term for . As discussed in Section 2, when , the component of reflects the influence of the “slice” of on the slice of , where the slice is along mode . For example, the entry of estimates the influence of on , when controlling for row and column dependencies between and . In the example above, this entry characterizes the influence of contribution type on contribution type when controlling for country and internationalorganization influences.
We lean on the Tucker (1964) product and related results to write the array BLIN model for general mode arrays (see also De Lathauwer et al., 2000; Kolda and Bader, 2009; Hoff, 2011). First, we rewrite the BLIN model of (1) using the Tucker product notation as
(15) 
where the array , containing matrices, is of dimension (and the same concatenating for array ). Now let and be arrays observed over time periods. To form the full model, we concatenated the arrays such that and are of dimension . Then, the BLIN model is as follows:
(16)  
(17) 
where each is an matrix of coefficients and and are the vectorizations of and , respectively.
We may also write matrix versions of the BLIN model for both the bipartite and multipartite cases, i.e. for all . The matrix versions of these models are useful for generating estimating procedures, such as the block coordinate descent of Algorithm 1. Let be a general array of dimensions . The mode1 matricization of the array is defined as an matrix and if , then . The following mode1 and mode2 matricizations of the BLIN model are then
(18)  
(19) 
The mode matricization of the mode BLIN model in (16) may be written as
(20)  
Appendix B Least Squares Estimation of ReducedRank BLIN Model
In this section, we provide a procedure for obtaining the maximum likelihood estimator of the BLIN model assuming reduced rank coefficient matrices and , where the respective ranks are known. The loglikelihood of the data is simply the sum of the loglikelihood at each time period since we assume each error observation is independent every other. The loglikelihood, in terms of the unknown matrices, is proportional to
(21) 
where is a rank decomposition of , and is a rank decomposition of .
We introduce a block coordinate descent algorithm in Algorithm 2 to obtain the maximum likelihood estimator of the unknown matrices . Algorithm 2 estimates each unknown matrix in turn until convergence, in an analogous procedure to the estimation of the full BLIN model in Algorithm 1. The update equation for each unknown matrix is derived by differentiating (21) with respect to , e.g., setting this derivative to zero, and solving for .

Set threshold for convergence . Set number of iterations . Initialize with independent standard normal entries and .

Compute

Compute

Compute

Compute

Compute the least squares criterion
If , increment and return to 1.
Appendix C Proofs of Theoretical Results
We begin with the proof of Proposition 1. We then prove Theorem 4, as expressions in this proof support the proofs of Propositions 2 and 3. We take and mean zero without loss of generality.
Proof of Proposition 1
.
Noting that is of dimension , it is sufficient to show that is full rank under the assumptions. We treat two cases: (1) and (2) .
Case (1):
We first show that implies .
Assume towards a contradiction that and let for . Then,
(22)  
(23)  
(24) 
If , then . As there is no integer that satisfies (24) and , we have that .
Now, as , we have that . Assume towards a contradiction that . Then, for some nonzero that . Consider the columns through of . The assumption implies that, for some nonzero , that . For , this is a contradiction of the assumption that is full rank. Thus, we have that is full rank in case (1).
Case (2):
Now we take , such that
Comments
There are no comments yet.