In many relational contexts, a set of events is observed, with each event involving an arbitrary number of actors and even a single actor. These events give rise to a so-called affiliation network in which there are two types of node: actors and events. Following the current literature, see Wang et al. (2009) among others, we refer to this structure as bipartite network, also known as two-mode network, contrarily to the one-mode network having a unique type of nodes. An example, which motivates the present paper, is that of academic articles in top statistical journals (Ji and Jin, 2017), typically involving more than two authors but that might also be written by a single researcher. In these applications, the interest is in studying the relations between units, with the aim of modeling separately the tendency of each unit to be involved in an event, and the tendency of each pair of units to cooperate. We are also interested in studying the time evolution of social behaviors and thus the dynamics over time of both these tendencies. The dataset of statistical publications we aim to analyze has also a particular feature that is important for the following developments: the events are only partially ordered as we know the year of publication of each article, but there is not any sensible way to temporally order articles published in the same year.
The literature on bipartite networks is mainly based on models having characteristics similar to those for one-mode networks in which direct connections are observed between certain pairs of actors, such as Exponential Random Graph Models (ERGMs; Frank and Strauss, 1986; Wasserman and Pattison, 1996); for a review see Snijders (2011) and Amati et al. (2018). One of the first model for the analysis of bipartite networks is proposed in Iacobucci and Wasserman (1990)
and is based on an ERGM structure with specific effects for both types of node (i.e., actors and events) and strong assumptions of independence between the response variables. This approach was extended in several directions bySkvoretz and Faust (1999), whereas Wang et al. (2009)
presented a flexible class of ERMGs for bipartite networks and related estimation methods. More recently, a review of models for this type of networks has been illustrated byAitkin et al. (2014), including certain versions of the Rasch (1967) model and the latent class model (Goodman, 1974). The approach proposed in the present paper is also related to models for the analysis of longitudinal one-mode networks, such as actor-oriented models (Snijders and van Duijn, 1997; Snijders et al., 2010), dynamic ERGMs (Robins and Pattison, 2001)2011; Matias and Miele, 2016; Bartolucci et al., 2018), and the models for relational events described in DuBois et al. (2013), Perry and Wolfe (2013), Butts and Marcum (2017), Stadtfeld et al. (2017), Fox et al. (2016), and Xia et al. (2016).
For the analysis of bipartite networks, and in particular of the dataset of publications is top statistical journals (Ji and Jin, 2017)
, we represent each event by a vector of response variables, with equal to 1 if unit is involved in event and 0 otherwise. Our aim is to directly formulate a statistical model for the response vectors having a meaningful interpretation. In particular, we rely on a marginal model (Bergsma and Rudas, 2002a; Bergsma et al., 2009)
based on first- and second-order effects. The first-order effects correspond to the logit of the marginal distribution of eachvariable and represent the general tendency of actor to be involved in event . The second-order effects are the log-odds ratios (Agresti, 2013, Ch. 2) for the marginal distribution of each pair of variables , representing the tendency of actors and to be jointly involved in the same event . However, as we show in detail in the sequel, this parameter may be directly interpreted as the tendency of and to cooperate. Moreover, even if we do not directly consider higher order effects, we do not pose any restrictions on these effects. At least to our knowledge, the use of marginal models for the analysis of social network data is new in the statistical literature.
Second, we pay particular attention to the parametrization of the above effects so as to account for the time evolution, and represent individual trajectories in terms of tendency to participate in an event and tendency to cooperate. This feature is common in latent growth models (Bollen and Curran, 2006); however, in the proposed approach we use individual fixed parameters, rather than random parameters, applied to polynomials of time of suitable order. Then, the proposed approach is particularly appropriate when the interest is in the evaluation of the behavior of a single actor in terms of the tendencies mentioned above. The possibility to estimate fixed parameters is possible thanks to the amount of information that is typically huge in the applications of interest. For instance, in the motivating example, there are more than three thousand papers that play the role of events.
. In particular, we use a likelihood function based on the marginal distribution of every ordered pair of actors. For each of these pairs, the likelihood component directly depends on the first- and second-order effects described above, and on individual parameters referred to the two actors. Then, to maximize the target function, we propose a simple iterative algorithm withcomplexity, that is thus computationally tractable even with if the number of actors is large. This is an important feature given the large scale of nowadays social network data; see also the discussion in Vu et al. (2013).
Forth, in presence of many statistical units, we show how to cluster units in groups that are homogenous in terms of tendency to be involved in an event or tendency to cooperate with other units. For this aim, we rely on a classification composite likelihood function that is related to that used for estimating the individual fixed parameters. This allows us to represent trajectories referred to homogeneous groups, rather than to individuals, so as to simplify the interpretation of the evolution of the data structure and of the social perspective of the phenomenon under study.
The paper is organized as follows. In the next section we describe assumptions and interpretation of the proposed approach. In Section 3 we outline the method of inference based on the use of fixed effects and clustering techniques. The application is illustrated in Section 4. In the last section we draw main conclusions and outline some possible extensions, as the inclusion of third-order effects and of individual covariates.
The estimation algorithm is implemented in a series of R functions that we make available to the reader upon request.
2 Proposed model
Let denote the number of actors and the number of observed relational events. Also let be a binary outcome equal to 1 if the relational event involves unit and to 0 otherwise, with and . These variables are collected in the column vector , a generic configuration of which is denoted by . Moreover, let be a binary variable equal to 1 if units and are involved in event , and to 0 otherwise. Note that
so that the set of variables is function of the set of variables ; the vice-versa does not hold, confirming that the direct analysis of the leads, in general, to an information loss. About this point see also the discussion in Aitkin et al. (2014).
2.1 Marginal effects
The main issue is how to parametrize the distribution of the random vectors . We adopt a marginal parametrization (Bergsma and Rudas, 2002a; Bergsma et al., 2009) based on hierarchical effects up to a certain order. This parametrization is less common than the log-linear parametrization, adopted even in ERGMs (Frank and Strauss, 1986; Wasserman and Pattison, 1996), in which
for all configurations different from the null configuration , where is a vector-valued function depending on .
Indeed, a marginal parametrization may be expressed on the basis of a sequence of log-linear parametrizations referred to the marginal distribution of selected subset of variables, that is,
where is the set of indices of such variables and is the corresponding subvector of .
In our approach, in particular, we rely on first- and second-order effects, specified, for all , as
which is a particular case of (2) with , and
which is obtained from (2) with . In terms of interpretation (see also Bartolucci et al., 2007), we can easily realize that the marginal logit is a measure of tendency of unit to be involved in the -th relational event. On the other hand, the log-odds ratio is a measure of the tendency of units and to cooperate with reference to the same -th relational event.
To better interpret the effects, it is worth recalling that the log-odds ratio is a well-known measure of association between binary variables (Agresti, 2013, Ch. 2), being 0 in the case of independence. In fact, an alternative expression for this effect is
corresponding to the increase in the logit of the probability that unit(or ) is involved in the -th event, given that unit (or ) is present in the same event, with respect to the case the latter is not present. More details in this regard are provided in the following section.
Before illustrating how we parametrize in a parsimonious way the marginal effects defined above, it is worth recalling that, apart from the trivial case of actors, the knowledge of these effects is not sufficient to obtain univocally the corresponding distribution of the vectors . To formulate this argument more formally, let denote the vector containing the joint probabilities for all possible configurations in lexicographical order. Also let be the vector containing the first-order effects for and let denote the corresponding vector of second-order effects for and . It is possible to prove that
for a suitably defined matrix of contrasts and a marginalization matrix with elements equal to 0 or 1. However this relation is not one-to-one, in the sense that it is not possible to obtain a unique probability vector starting from .
In order to have an invertible parametrization, the structure of higher order effects must be specified. Just to give the idea, a third-order marginal effect between units , , and may be defined as
This is the difference between the conditional log-odds ratio for units and given that unit is present with respect to the case it is not present. This directly compares to the triangularization effect in an ERGM, as it measures how much the presence of unit affects the chance that units and collaborate. In a similar way we may recursively define effects of order higher than three until order (Bartolucci et al., 2007), so that including the specification of these effects, the parametrization in (6) becomes invertible.
In the present approach, however, we prefer to focus only on first- and second-order effects as formulated in (3) and (4), leaving the structure of higher-order interactions unspecified. In fact, as we show in detail in Section 3, to make inference on these effects it is not necessary to specify the structure of the higher-order effects as we base inference on a pairwise likelihood function. This is an advantage of the marginal parametrization with respect to the log-linear parametrization; the latter does not allow us to directly express the marginal distribution of a subset of variables without specifying the full set of interactions. The way of obtaining each bivariate probability vector
on the basis of the parameters , , and , which are collected in the vector , is clarified in the Appendix.
2.2 Interpretation of the log-odds ratios
To clarify the interpretation of the log-odds ratio , it is useful to consider that it directly compares with the logit of the probability that there is a connection between units and , in the sense that both units are involved in the same event. In fact, from (1) we have
that is a commonly used effect in typical social network models; see, for instance, Hoff et al. (2002). There is an important difference between and : the former corresponds to the tendency of units and to be involved in the same event, but it does not disentangle this joint tendency from the marginal tendency of each single unit to be involved in the same event. For instance, could attain a large value only because both units have, separately, a high tendency to be involved in the event (both authors are very active in their publication strategy) even if there is not a particular “attraction” between them, namely with high values of or . On the other hand, is a proper measure of attraction because, as clearly shown by (5), it corresponds to the increase in the chance that one unit is present in the event given that also the other unit is present in the same event. This difference between parameters and is key in the proposed approach and is of particular relevance in the application of our interest, where each event may involve a variable number of actors and, possibly, also only one actor. Indeed, in our approach the general tendency of unit to be involved in an event is meaningfully measured by effects defined in (3). Note that effects of this type cannot be directly included in an ERGM.
The above arguments may be further clarified considering that, for given marginal distributions and , or, in other terms, for fixed and , the log-odds ratio is an increasing function of and thus of . Moreover, for a given value of this joint probability, is a decreasing function of and . In particular, considering that
we can easily realize that
Similarly, we have
with a corresponding expression for . An illustration of this behavior is provided in Figure 1, where for a pair of individuals we represent the value of with respect to (with and ), to (with and ), and to (with and ).
Another advantage of with respect to is that the former induces a variational independent parametrization (Bergsma and Rudas, 2002b). This means that the joint distribution of exists for any value in of the first-order effects and and of the second-order effect . More formally, the function relating with defined in (7) is one-to-one for in and in the four-dimensional simplex. This has advantages in terms of model interpretation and estimation. On the other hand, effect has a limited range of possible values with bounds depending on and , making the joint estimation and interpretation of and , more problematic.
2.3 Parametrization of marginal effects
Formulating a model for relational events requires to parametrize, in a parsimonious way, the effects and of main interest. In absence of individual covariates, we propose the following parametrization of the first-order effects:
where is a vector-valued function specific of time of each event . For instance, this function may contain the terms of a polynomial of suitable order of the day or year of event starting from the beginning of the study. This parametrization is similar to that of a latent trajectory model (Dwyer, 1983; Crowder and Hand, 1996; Menard, 2002), with the difference that, as we clarify in the sequel, each is considered as a vector of fixed individual parameters. In any case, it is possible to represent individual trajectories regarding the tendency over time to be present in an event.
Regarding the second-order effects, a natural extension of (8) would lead to a vector of specific parameters for each pair of units. However, to obtain a parsimonious model, we prefer to rely on an additive parametrization of type
where is defined as and, again, vectors represent the evolution of the tendency, of unit , to collaborate across time. We use two different functions, and , to allow for a different order of the involved polynomials of time. Note, however, that the additive structure in (9) implies that may be interpreted as the “general” tendency of individual to collaborate with other individuals in an event at time .
where and are suitable design matrices.
To clarify the proposed parametrization, consider a sample of individuals for a single event, for different values of the intercepts (from -3 to -1) and of (from -1 to 1). These values are reported in Table 1 together with certain average probabilities that help to understand the meaning of these parameters. The single tables for each pair of actors are reported in Table 2.
3 Pairwise likelihood inference
Before introducing the proposed methods of inference for the model described above, we clarify the data structure used in applications. We start from the data
where is the indicator function, for , , and . Moreover, in the applications of interest, it is possible to group events that, by assumption, have the same distribution. For instance, in the application based on the academic articles published by statisticians, it is plausible to assume that for all articles published in the same year the distribution of the binary vector is the same, even because it is not possible to have the precise dates of the publication and thus their precise time order. In other words, it is sensible to group events in homogenous periods , where denotes the time of event . Then, the relevant information is that contained in the frequency vectors
and consequently we denote by the corresponding probability vector having the same structure as in (7).
3.1 Fixed-effects estimation
where is the vector of all such parameters, that is the collection of individual parameter vectors , , used in (10). An alternative expression is
which is faster to compute as it relies on the frequency vectors defined in (11).
In order to maximize , it is important to obtain the score vector of each components . To this aim, it is convenient to introduce the log-linear effects which are collected in the vector , where
and is any of the events at time occasion . Also let denote the corresponding vector of marginal parameters. We have that
where is the sum of the elements of , that is, the number of events in time period , whereas and the derivative of with respect to are defined in Appendix.
The estimation algorithm is based on the following steps. First of all define an initial guess for the parameters , denoted by , . Then, for every unit , find the values of such that
so as to maximize
with respect to , with all other parameters kept fixed. Iterate this process until convergence in , and denote the final parameter estimates by , , which are collected in the vector . In practice, the algorithm steps may be implemented by using a readily available numerical solver.
With large samples, it is typically of interest to find clusters of units presenting a similar behavior. In our approach this amounts to assume that there are groups of individuals having a similar behavior in terms of tendency to be involved in an event and groups of individuals having a similar tendency to collaborate in the network. For each group we have specific parameter vectors denoted by and , with and , all collected in the parameter vector .
For unit , let denote the cluster to which the unit is assigned with respect to the first type of tendency and the cluster assigned with respect to the second type of tendency. The corresponding classification pairwise log-likelihood has the same expression as defined in (12), with , , and then . This function is denoted by , where is the vector with elements and is that with elements , respectively, with .
To cluster units in homogeneous groups, we maximize by an iterative algorithm that is initialized from the output of a -means clustering of the individual estimates and . Then, it alternates the following three steps until convergence:
for try to change the cluster of unit by finding the cluster that maximizes the individual component of the classification pairwise log-likelihood, which is defined as in (13) accounting for the cluster structure, with all other parameters kept fixed;
for try to change the cluster of unit by the same procedure as above;
update the parameter estimates of , , and , , by maximizing with respect to with and kept fixed.
We select and as the smallest number of clusters such that the initial clustering of the estimates and , performed by the -means algorithm, leads to a between sum of squares equal to at least 80% of the total sum of squares. Then, at convergence of the three steps illustrated above, we check that the number of clusters is adequate, comparing the maximum value of with that of , as we show in connection with the application in the next section.
In order to illustrate the approach based on individual-specific effects, see assumptions (8) and (9), we propose an application based on the data recently made available by Ji and Jin (2017). The data refer to the publication history of all authors with at least one paper published in four top statistical journals (Annals of Statistics, Biometrika, Journal of the American Statistical Association, Journal of the Royal Statistical Society - series B) between 2003 and the first half of 2012. Overall, 3,607 authors are involved who coauthored 3,248 articles. In Table 3
we report some descriptive statistics, whereas in Figure2 we represent the distribution of the number of articles and the number of coauthors for each individual in the dataset.
From Table 3 we observe that the number of published papers per year does not vary considerably, even if there is an increase from 2003 to 2009 and a decrease after 2009 (year 2012 counts only partially). Moreover, the average number of authors moderately increases during the time span and it is important to note that the number of single-author articles is relevant in each year. Indeed, these articles represent the 16.7% of the total articles considered in the dataset and this justifies the use of the proposed approach for the analysis. Regarding Figure 2, we note the concentration of the number of articles, with the majority of authors (2,335) who published only one article in one of the four top journals considered in the reference period and 520 who published only two articles. On the other hand, the three most productive researchers published 33, 40, and 82 articles, with a total of 155 overall (ignoring possible overlapping). Even the distribution of the number of coauthors shows a very high concentration, although the situation is somehow different as the third modality (i.e., 2 coauthors) has the highest frequency, equal to 970. The authors with zero and one coauthors are 154 and 842, respectively, whereas the three highest modalities are 81, 94 and 112.
The application is based on two phases, that is, fixed-effects estimation and clustering, which are illustrated in Sections 3.1 and 3.2, respectively. Regarding the first phase, with reference to (8) and (9), we assume second order polynomials for the effect of time. In this way we estimate 3,607 parameter vectors and of length 3. On the basis of these estimates it is possible to obtain trajectories both in terms of tendency to publish an article in a certain period and in terms of tendency to collaborate with other authors. We recall that, for the former, the effect that is represented is the logit defined in (3) and for the latter it is the log-odds ratio (4). In order to illustrate these results, we consider the five authors with the largest number of published articles in the period that we identify with the letters from A to E; the patterns of publication of these authors is reported in Table 4.
For these top five authors, we represent the estimated profiles in Figure 3 where we can clearly identify author E as the most productive one with a profile that follows a reverse U-shape, having its pick around years 2006 and 2007, coherently with the data in Table 4. On the other hand, it may seem surprising that this author shows the lowest profile in terms of tendency to collaborate with other authors. However, coherently the data in the table, in terms of ratio between number of coauthors and number of published papers, author E tends to be lower than everybody else. The conclusion is that the large number of coauthors of author E can be mostly ascribed to his tendency to publish; see Section 2.2 for general comments on the interpretation of these results. In a similar way we can interpret the profiles of the other authors. For instance, authors C and D, who published the same number of papers (namely 40), have very similar profiles in terms of tendency to publish, but according to the proposed model, D has a higher tendency to collaborate, with a difference that also increases in time, and in particular the curve for D always dominates that for C; this is again coherent with the data in Table 4. Finally, authors A and B have profiles which are in agreement with a smaller number of published papers that, at the same time, tends to increase from 2003 to 2012.
To improve the interpretability of these profiles, instead of using the logits defined in (3), we can also express the tendency to publish in terms of expected number of publications per year. These expected values are obtained by multiplying the probability to be involved in a publication by the yearly number of publications available in Table 3. The resulting profiles are reported in Figure 4 and confirm the previous conclusions.
When we examine the overall sample of 3,607 authors, the analysis may be effectively carried out by building clusters of authors. Following the approach described in Section 3.2, we find evidence of different profiles in terms of tendency to publish and profiles in terms of tendency to collaborate. The model with clustered profiles attains a maximum profile log-likelihood (normalized dividing by the number of ordered pairs of units) equal to -34.916 that is close that of the fixed-effects method, which is equal to -33.693. On the other hand, the maximum pairwise log-likelihood with only one cluster is equal to -75.401. This means that using a structure of clusters implies an improvement of the pairwise log-likelihood equal to 97.1% with respect of using only one cluster, despite the huge reduction in the number of parameters with respect to the fixed-effects model: rather than using individual-specific parameter vectors and , we use a very limited number of parameter vectors denoted by and . The corresponding profiles are represented in Figure 5.
It is interesting to note that the 6 cluster profiles in terms of tendency to publish cover different possibilities. In particular, profiles for the first 3 clusters have a reversed U-shape with picks located at different years. On the other hand, profiles for clusters 4 and 5 have a U-shape but are rather different. Finally, authors in cluster 6 tend to have a rather constant over time tendency to publish, which is higher than for the other clusters. In any case, the values of the logit even for this class corresponds to low probability levels, with an expected number of publications which is smaller than 1 for all years. In a similar way we can interpret the 5 clusters in terms of tendency to collaborate. For instance, individuals in the first cluster have a general tendency to collaborate lower than the other authors, which is rather constant in time.
It is important to stress that, in principle, any author may belong to any cluster of the first type (in terms of tendency to publish) and of the second type (in terms of tendency to collaborate). In order to better understand this aspect, we consider the cross classification of authors according to both criteria. This cross classification is reported in Table 5 that also shows the size of each cluster in terms of units assigned to it.
|tend. to||tend. to collaborate|