A network is a data structure consisting of nodes (vertices) connected by links (edges). A network with $n$ nodes can be represented by an $n \times n$ adjacency matrix $A = [A_{ij}]$, where $A_{ij} = 1$ if there is an edge between nodes $i$ and $j$ and $A_{ij} = 0$ otherwise. A network can be weighted, where $A_{ij}$ measures the link strength between nodes $i$ and $j$. Network analysis has drawn increasing attention in a number of fields such as the social sciences (Wasserman and Faust, 1994; Liben-Nowell and Kleinberg, 2007), physics (Barabási and Albert, 1999; Newman and Girvan, 2004), computer science (Getoor and Diehl, 2005), biology (Stark et al., 2006) and statistics (Bickel and Chen, 2009; Hunter and Handcock, 2006).
Traditionally, statistical network analysis deals with inferences concerning parameters of an observed network, i.e., an observed adjacency matrix (see Newman (2010); Goldenberg et al. (2010); Zhao (2017) for reviews of models and techniques for analyzing observed networks).
In this article, we focus on the case in which the network is unobserved and must be estimated. What we do observe is a collection of subsets of nodes. Each subset is called a group by Zhao and Weko (2017), and a data set consisting of such groups is referred to as grouped data. We continue to use the term grouped data in this article.
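To better explain the structure of grouped data, we introduce some notation. For a set of $n$ individuals, $V = \{v_1, \dots, v_n\}$, we observe $T$ subsets at times $t = 1, \dots, T$, called groups. Each observed subset can be represented as an $n$-length row vector $G^{(t)}$, where
$$G_i^{(t)} = \begin{cases} 1 & \text{if } v_i \text{ appears in the group at time } t, \\ 0 & \text{otherwise.} \end{cases}$$
Let $G$ be a $T \times n$ matrix with $G^{(t)}$ being its rows. For simplicity, we will slightly abuse the notation: we will also refer to the indicator vector $G^{(t)}$ as a group from now on.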
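As an illustration of this representation, the following minimal sketch converts a list of observed subsets into a 0/1 indicator matrix with one row per group and one column per individual. The function name `groups_to_matrix` and the toy data are hypothetical and are not taken from any data set discussed in this article.

```python
def groups_to_matrix(groups, individuals):
    """Return the indicator matrix: one row per group, one column per individual."""
    index = {name: i for i, name in enumerate(individuals)}
    matrix = []
    for group in groups:
        row = [0] * len(individuals)
        for name in group:
            row[index[name]] = 1  # 1 means the individual appears in this group
        matrix.append(row)
    return matrix


# Hypothetical toy data: four individuals and three observed groups.
individuals = ["a", "b", "c", "d"]
groups = [{"a", "b"}, {"b", "c", "d"}, {"a", "d"}]
G = groups_to_matrix(groups, individuals)
```

Each row of `G` is the indicator vector of one group, so overlapping groups share ones in the same columns.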
In this article, we analyze the grouped data from the so-called social network perspective (Moreno, 1934). The grouping behavior of the individuals is presumed to be governed by a latent social network . The objective of this article is to infer the latent network from the groups being observed. In other words, we aim to estimate the link strength between individuals using the information about their presence in the groups.
Throughout this paper, we consider only the case that there exists one and only one group at a time .
The groups at different times can overlap. In fact, it is only plausible to make meaningful inferences of from if groups overlap. If all groups are disjoint, the best inference is to use a clique, i.e., a fully connected subgraph to estimate the relationships within each group.
Wasserman and Faust (1994) discuss grouped data (which they called affiliation networks) as well as some empirical methods and graphical representations for this type of data in Chapter 8 of Social Network Analysis: Methods and Applications. The authors give an illustrative example of six children and three birthday parties (page 299), which is shown in Table 1. By the use of the notation above, since Drew attended Party 2, but since Drew did not attend Party 3.
Numerous researchers in social sciences have been interested in grouped data, in particular how to infer social structures from such data. Wasserman and Faust (1994) provide a list of such data sets (pages 295-296). For example, Galaskiewicz (1985) collected CEO membership data that consisted of their participation in clubs, cultural boards and corporate boards of directors (see Wasserman and Faust (1994) page 755 for the data). As another example (not in Wasserman and Faust (1994)), Roberts and Everton (2011) collected a data set of 79 terrorists’ presence in meetings, trainings and other events.
Grouped data are also popular in the study of social behaviors of animals. Social network analysis (SNA) has become an important tool in this area (Whitehead, 2008; Croft et al., 2011), but direct linkages between animals are often difficult or expensive to record (Croft et al., 2011), and certain methods used for collecting human network data, such as surveys, are clearly impossible. In contrast, grouped data such as groups of dolphins (Bejder et al., 1998) or flocks of birds (Farine et al., 2012) are relatively easy to identify and record.
Zhao and Weko (2017) use grouped data to study novels by treating the characters of a novel appearing in the same paragraph as a group and using the inferred network structure to interpret the relationships between characters.
Despite the popularity of grouped data, existing methods for network inference from grouped data are mainly ad hoc approaches from the social sciences literature. A simple technique is to count the number of times that a pair of nodes appears in the same group. This measure has been called different names by different authors, e.g., the co-citation matrix in Section 6.4 of Newman (2010) or the sociomatrix in Section 8.4 of Wasserman and Faust (1994). Zhao and Weko (2017) refer to this measure as the co-occurrence matrix. The half weight index (Cairns and Schwager, 1987),
$$H_{ij} = \frac{2 \sum_{t} G_i^{(t)} G_j^{(t)}}{\sum_{t} G_i^{(t)} + \sum_{t} G_j^{(t)}},$$
where $G_i^{(t)}$ indicates whether node $i$ appears in the group at time $t$, is an alternative approach that uses the conditional frequencies of co-occurrences as estimates. A common difficulty of such methods is that they provide no statistical model to connect these descriptive statistics with the latent network.
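The two descriptive measures above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it assumes one group per time point, so that the half weight index reduces to twice the co-occurrence count divided by the sum of the two individual occurrence counts. The function names and the toy matrix are hypothetical.

```python
def co_occurrence(G):
    """Count, for every pair (i, j), the number of groups containing both."""
    n = len(G[0])
    return [[sum(row[i] * row[j] for row in G) for j in range(n)]
            for i in range(n)]


def half_weight_index(G):
    """Half weight index: 2 * co-occurrences / (occurrences of i + occurrences of j)."""
    n = len(G[0])
    counts = [sum(row[i] for row in G) for i in range(n)]
    co = co_occurrence(G)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = counts[i] + counts[j]
            # Pairs that never appear at all get index 0 by convention (an assumption).
            H[i][j] = 2.0 * co[i][j] / denom if denom > 0 else 0.0
    return H


# Hypothetical indicator matrix: three groups over four individuals.
G = [[1, 1, 0, 0],
     [0, 1, 1, 1],
     [1, 0, 0, 1]]
H = half_weight_index(G)
```

Both measures are symmetric and purely descriptive, which is the limitation the surrounding text points out: neither is tied to a generative model of how groups form.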
Zhao and Weko (2017) recently proposed a model-based approach for grouped observations. In the so-called hub model, the groups are modeled as independently and identically distributed random vectors, and each group has a central node, called the hub or group leader, who gathers the other members into the group. For example, the hub is the child who hosted the party in Table 1.
A crucial assumption made in Zhao and Weko (2017) is that the groups are assumed to be independently generated by the hub model. In some cases, this assumption is reasonable if each group forms spontaneously. The assumption can also be approximately satisfied if researchers collect grouped data with sufficiently long time intervals between observations (see Bejder et al. (1998) for discussion).
The independence assumption however may not be valid in other situations. In most practical situations, the grouped observations are temporal-dependent by default. For example, in a study of animal behavior, researchers may observe the behavior of animals on an hourly or daily basis. In Section 5, we analyze such a data set consisting of groups of wild chimpanzees studied by the Kibale Chimpanzee Project. It is inappropriate to assume that every group is independent from the previous group. A more plausible point of view is that the group at a particular time is a transformation of the previous group. That is, some new members may join the group and some may leave, but the group maintains a certain level of stability.
We generalize the idea of the hub model in order to accommodate temporal dependence between groups. We call the new model the temporal-dependent hub model, or the temporal-dependent model in short. This new model allows for dependency between group leaders as well as between other group members. We explain both dependency assumptions in the next two paragraphs.
As in the classical hub model, we assume there is one leader for each group. Leaders however are not sampled independently in the temporal-dependent model, but follow a Markov chain. That is, the probability of a certain node being the current leader depends on the leader in the previous group.
For other group members, we consider the following two cases to make the model flexible enough. If the current leader is inside the previous group, then we treat this group as a transformation of the previous one. If the new leader is from outside the previous group (e.g., some event occurs and completely breaks the previous group) then we treat this group as the start of a new segment. In this case, the leader will select the group members as in the classical hub model, i.e., independently of whether or not they were members of the previous group.
As shown in Section 3, the temporal-dependent hub model can be viewed as a generalization of the hidden Markov model (HMM) when the group leaders are latent. An efficient algorithm is thus developed for model fitting. Furthermore, the temporal-dependent hub model provides estimates of the elements of the adjacency matrix with lower mean squared errors according to numerical studies in Section 4.
Finally, we discuss some related works. First, the temporal-dependent hub model is fundamentally different from many existing models for dynamic networks, such as the preferential attachment model (Barabási and Albert, 1999), discrete/continuous time Markov models (Snijders, 2001; Hanneke and Xing, 2007), etc. In these works, the observed data are snapshots of the network at different time points. In this article, the unknown parameters are a single latent network and the observations are groups with temporal-dependent structures.
Second, there are recent studies on estimating latent networks or related latent structures in dynamic settings, but from data structures that are different from groups. Guo et al. (2015) propose a Bayesian model to infer latent relationships between people from a special type of data – the evolution of people’s language over time. Robinson and Priebe (2013) propose a latent process model for dynamic relational network data. Such a data set consists of binary interactions at different times. Blundell et al. (2012) propose a nonparametric Bayesian approach for estimating latent communities from a similar data type. The grouped data we consider in this article are more complicated than binary interactions in the sense that, unlike a linked pair, the links within a group consisting of more than two members are unknown.
Third, there are other interesting works on modeling latent social networks from survey data and such data only provide partial information of a latent network. These survey data also have different structures from grouped data. McCormick and Zheng (2015) propose a latent surface model for aggregated relational data collected by asking respondents the number of connections they have with members of a certain subpopulation. In this work, the network structure for the population is latent. Admiraal and Handcock (2016) fit exponential-family random graph models (ERGMs) to latent heterosexual partnership networks, with degree distributions and mixing totals being sufficient statistics in the exponential family. Those statistics for the underlying population are inferred from cross-sectional survey data. Krivitsky and Morris (2017) fit ERGMs to egocentrically sampled data, which provide information about respondents and anonymized information on their network neighbors.
2.1 The classical hub model
We briefly state the generating mechanism of the classical hub model (Zhao and Weko, 2017). The hub model assumes one leader for each group. The leader of is denoted by .
Under the hub model, each group is independently generated by the following two steps.
The group leader is sampled from a multinomial distribution with parameter , i.e., , with .
The group leader, , will choose to include in the group with probability , i.e., .
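The two-step mechanism can be sketched as a small sampler. This is a minimal illustration under stated assumptions: `rho` is a hypothetical name for the vector of leader probabilities, `A` for the matrix of inclusion probabilities, and the leader is assumed to always belong to its own group.

```python
import random


def sample_hub_group(rho, A, rng):
    """Draw one group from a classical hub model (illustrative sketch)."""
    n = len(rho)
    # Step 1: sample the leader with probabilities rho.
    leader = rng.choices(range(n), weights=rho, k=1)[0]
    # Step 2: the leader includes every other node j independently
    # with probability A[leader][j]; the leader itself is always included.
    group = [1 if j == leader or rng.random() < A[leader][j] else 0
             for j in range(n)]
    return leader, group


# Hypothetical parameters for three nodes.
rho = [0.5, 0.3, 0.2]
A = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.4],
     [0.1, 0.4, 1.0]]
rng = random.Random(0)
leader, group = sample_hub_group(rho, A, rng)
```

Calling the sampler repeatedly with the same generator yields independent groups, which is exactly the independence assumption that the temporal-dependent model relaxes.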
2.2 Generating mechanism of the temporal-dependent hub model
The hub model assumes that all the groups are generated independently across time. In practice, it is more natural to model the groups as temporal-dependent observations.
We first explain the idea of the generating mechanism of temporal-dependent groups and then give the formal definition. We generalize the idea of the hub model into the temporal-dependent setting. Specifically, we assume there is only one leader at each time who brought the group together, but the group at time depends on the previous group, which is different from the classical hub model.
At time , the group is generated from the classical hub model. For , the group leader can remain the same as the previous leader or change to a new one. We assume that the leader will remain as with a higher probability than the probability of changing to any other node.
If the new leader is outside the previous group, then the current group is considered the start of a new segment and is generated by the classical hub model. It is worth noting that technically, the generation of the new group however still depends on the previous group. This will become clearer after we introduce the likelihood function. For the case that the new leader is inside the previous group – that is, if the leader remains unchanged, or the leader changes but is still a member of the previous group – we propose the following In-and-Out procedure: for any node being in the previous group, it will remain in with a probability higher than – the probability in the classical hub model. On the contrary, for any node not being in the previous group, it will enter with a probability lower than . Intuitively, this In-and-Out procedure assumes that when a group forms, it will maintain a certain level of stability.
We now give the formal definition of the generating mechanism as follows:
Step 1: (Classical hub model). When , is generated by the following two substeps.
The leader is sampled from a multinomial distribution with parameter , i.e.,
where for .
The leader will choose to include in the group with probability , i.e., , where and
Here, for and for . We allow some to be so that the corresponding can be 1 or 0.
Step 2: (Leader change). For ,
Step 3: (In-and-Out procedure). For , given being the leader, is generated by the following mechanism:
If is not within , then it will include each in the group with probability ; otherwise, see below:
If , will include in the group with probability
If , will include in the group with probability
For clarity of notation, we now give the vector/matrix form. Define , and . Define , , and . Furthermore, we assume , , and to be symmetric in order to avoid any issue of identifiability (see the discussion in Zhao and Weko (2017)).
In the definition above, and are simply a reparameterization of and in exponential form. This makes optimization more convenient, since the log-likelihood is concave under this parametrization.
The parameters , and characterize the dependency between the groups. is the adjustment factor, which controls the probability that a leader in the previous group remains as a leader. is the adjustment factor for nodes being inside the previous group. And is the adjustment factor for nodes being from outside the previous group. We do not enforce , and in the model fitting. Instead, we test these assumptions for the data example in Section 5.
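The generating mechanism can be sketched as a simulation. The exact parametrization is an assumption here: the sketch uses a softmax over hypothetical leader scores `u` with `alpha` added to the previous leader's score, and a logistic link on hypothetical link scores `theta` shifted by `beta` for nodes inside the previous group and by `gamma` for nodes outside it. The leader is also assumed to always be in its own group. All names are illustrative.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    total = sum(e)
    return [x / total for x in e]


def next_leader(u, alpha, prev_leader, rng):
    """Leader change: the previous leader's score is boosted by alpha."""
    scores = [s + (alpha if i == prev_leader else 0.0) for i, s in enumerate(u)]
    return rng.choices(range(len(u)), weights=softmax(scores), k=1)[0]


def next_group(theta, beta, gamma, leader, prev_group, rng):
    """In-and-Out step; falls back to the baseline if the leader is a newcomer."""
    new_segment = prev_group[leader] == 0
    group = []
    for j in range(len(prev_group)):
        if j == leader:
            group.append(1)
            continue
        logit = theta[leader][j]
        if not new_segment:
            logit += beta if prev_group[j] == 1 else gamma
        group.append(1 if rng.random() < sigmoid(logit) else 0)
    return group


def simulate(u, theta, alpha, beta, gamma, T, seed=0):
    rng = random.Random(seed)
    n = len(u)
    # First group: baseline (classical hub model) draw.
    leader = rng.choices(range(n), weights=softmax(u), k=1)[0]
    group = [1 if j == leader or rng.random() < sigmoid(theta[leader][j]) else 0
             for j in range(n)]
    leaders, groups = [leader], [group]
    for _ in range(1, T):
        leader = next_leader(u, alpha, leader, rng)
        group = next_group(theta, beta, gamma, leader, group, rng)
        leaders.append(leader)
        groups.append(group)
    return leaders, groups


# Hypothetical parameters for four nodes.
u = [0.0, 0.0, 0.0, 0.0]
theta = [[-1.0] * 4 for _ in range(4)]
leaders, groups = simulate(u, theta, alpha=1.5, beta=2.0, gamma=-0.5, T=30, seed=1)
```

With a positive `alpha` and `beta` and a negative `gamma`, consecutive simulated groups tend to share leaders and members, which is the stability the In-and-Out procedure is meant to capture.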
The parameters , and are identifiable. The key observation is that the identifiability of can simply be obtained by since the first group only depends on and . This is essentially the identifiability of the classical hub model. The proof is given by Theorem 1 in Zhao and Weko (2017), under the condition of being symmetrical. With the “baseline” being separately identified, the two adjustment factors and are accordingly identifiable.
The parameters are non-identifiable under this parametrization, since gives the same likelihood. We will discuss the solution to this problem in Section 3 after introducing the algorithm.
For notational convenience in the likelihood, we indicate the leader in group by an length vector, , where
Only one element of is allowed to be 1. is simply another representation of . Let be a matrix, with being its rows.
Clearly, is a Markov chain according to the generating mechanism. Let be the transition probability and . We summarize all introduced notations in Table 2.
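Table 2: Summary of notation.

| Type | Symbol | Description |
|---|---|---|
| Parameter | $\rho_i$ | Probability of $v_i$ being the leader of $G^{(1)}$ |
| Parameter | $\alpha$ | Adjustment factor for remaining leaders |
| Parameter | $A_{ij}$ | Probability of $v_j$ being inside the group when $v_i$ is the leader in a newly formed group |
| Parameter | $\beta$ | Adjustment factor for nodes being inside the previous group |
| Parameter | $B_{ij}$ | Adjusted probability of $v_j$ being inside the group when $v_j$ is inside the previous group |
| Parameter | $\gamma$ | Adjustment factor for nodes being outside the previous group |
| Parameter | $C_{ij}$ | Adjusted probability of $v_j$ being inside the group when $v_j$ is outside the previous group |
| Data | $G^{(t)}$ | Group at time $t$ |
| Data | $z^{(t)}$ | Leader at time $t$ |
| Data | $S^{(t)}$ | Indicator of the leader at time $t$, with only one element being 1 |
| Index | $n$ | Size of the network |
| Index | $T$ | Number of groups (sample size) |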
We now give the joint log-likelihood of and for the model defined in the previous subsection:
Note that and are essentially the parameters of this model and , , , and are their functions. Despite its length, Equation (2.1) has a clear structure. The 1st line gives the log-likelihood of . The 2nd line gives the log-likelihood of given . The 3rd line gives the log-likelihood of given that the current leader is outside the previous group . The 4th and 5th lines give the log-likelihood of given that is inside , based on the In-and-Out procedure.
Equivalent to (2.1), we can write the likelihood as a product of conditional probabilities:
This factorization can be represented by a Bayesian network (Figure 1), where a node represents a variable and a directed arc is drawn from node $X$ to node $Y$ if $Y$ is conditioned on $X$ in the factorization (refer to Jordan et al. (1999) for a comprehensive introduction to Bayesian networks). This Bayesian network should not be confused with the latent network: the former is a representation of the dependency structure between variables, while the latter reflects the relationships between the group members.
Furthermore, the group leaders are assumed to be latent (as are ) since in many applications only the groups themselves are observable.
3 Model fitting
In this section, we propose an algorithm to find the maximum likelihood estimators (MLEs) for and . With being the latent variables, an expectation-maximization (EM) algorithm will be used for this problem. The EM algorithm maximizes the marginal likelihood of the observed data, which in our case is , by iteratively applying an E-step and an M-step.
Let and be the estimates in the current iteration. In the E-step, we calculate the conditional expectation of the complete log-likelihood given under the current estimate. That is,
In the M-step, we maximize this conditional expectation with respect to the unknown parameters. That is,
It has been proved by Wu (1983) that the EM algorithm converges to a local maximizer of the marginal likelihood. (Refer to McLachlan and Krishnan (2008) for a comprehensive introduction to this algorithm). We now give details of the two steps in our context.
Since the complete log-likelihood is a linear function of and , the computation of its conditional expectation is equivalent to calculating and . From now on, all conditional probabilities are defined under the current estimates.
A brute-force calculation of these probabilities, such as
is infeasible since the numerator involves a sum of terms. This is because are not independent according to our model. An efficient algorithm is needed for all practical purposes.
The temporal-dependent hub model is similar to the hidden Markov model (HMM) (Figure 1). A polynomial-time algorithm for this model, called the forward-backward algorithm, was developed for computing the conditional probabilities. See Smyth et al. (1997); Ghahramani (2001) for tutorials on HMMs and this algorithm.
In the HMM, the observed variable at time only depends on the corresponding hidden state. But in our model, depends on both the current leader and the previous group . We develop a new forward-backward algorithm for our model, which has more steps than the original algorithm but is also polynomial-time. We describe the algorithm here (see the Appendix for detailed derivation and justification).
Define and as matrices. These matrices are computed by the following recursive procedures.
The matrices , and should not be confused with the matrices , and introduced in Section 2. The symbols are case-sensitive throughout the paper.
With these quantities,
The complexity of this algorithm is .
Note that the first row of is undefined but also unused. Also note that the elements of and will quickly vanish as the recursions progress. Therefore, we renormalize each row to sum to one at each step. It can easily be verified that this normalization does not affect the conditional probabilities. Finally, we emphasize that this algorithm gives the exact values of the conditional probabilities in a fixed number of steps – i.e., it is not an approximate or iterative method.
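To illustrate the role of row normalization, the sketch below implements the classical forward-backward recursion for a standard HMM, not the modified recursion developed in this article. It takes a hypothetical table `like[t][i]` holding the likelihood of the observation at time `t` given hidden state `i`; in the temporal-dependent hub model this quantity would also depend on the previous group. All names are illustrative.

```python
def forward_backward(init, trans, like):
    """Posterior state probabilities for a standard HMM with per-row scaling."""
    T, n = len(like), len(init)

    def normalize(row):
        s = sum(row)
        return [x / s for x in row]

    # Forward pass: each row is rescaled to sum to one to avoid underflow.
    fwd = [None] * T
    fwd[0] = normalize([init[i] * like[0][i] for i in range(n)])
    for t in range(1, T):
        fwd[t] = normalize([
            like[t][j] * sum(fwd[t - 1][i] * trans[i][j] for i in range(n))
            for j in range(n)
        ])

    # Backward pass, rescaled in the same way.
    bwd = [None] * T
    bwd[T - 1] = normalize([1.0] * n)
    for t in range(T - 2, -1, -1):
        bwd[t] = normalize([
            sum(trans[i][j] * like[t + 1][j] * bwd[t + 1][j] for j in range(n))
            for i in range(n)
        ])

    # The scaling constants cancel when each posterior row is normalized.
    return [normalize([fwd[t][i] * bwd[t][i] for i in range(n)])
            for t in range(T)]


# Hypothetical two-state example with three time points.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.2, 0.8]]
like = [[0.9, 0.2], [0.1, 0.5], [0.4, 0.4]]
posterior = forward_backward(init, trans, like)
```

Because every row is rescaled by a constant that is common to all states, the normalized products equal the exact posterior probabilities, which matches the remark above that renormalization does not change the conditional probabilities.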
The M-step is somewhat routine compared to the E-step. First, it is clear that and can be handled separately.
We apply the coordinate ascent method (see Boyd and Vandenberghe (2004) for a comprehensive introduction) to iteratively update and , as well as and . Since the complete log-likelihood is concave and so is , coordinate ascent can guarantee a global maximizer.
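At each step, we optimize the log-likelihood over one parameter at a time, with the other parameters held fixed. The procedure is repeated until convergence. We use the standard Newton-Raphson method to solve each individual optimization problem. Specifically, for a parameter $\phi$ (here $\phi$ can represent any single parameter of the model), the estimate at the $(m+1)$-th iteration is obtained from its estimate at the $m$-th iteration by
$$\phi^{(m+1)} = \phi^{(m)} - \left. \frac{\partial f / \partial \phi}{\partial^2 f / \partial \phi^2} \right|_{\phi = \phi^{(m)}},$$
where $f$ denotes the objective function being maximized.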
The calculation of these derivatives is straightforward but tedious, so we provide the details in the Appendix.
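A one-dimensional Newton-Raphson update of the kind used inside each coordinate step can be sketched as follows. The objective here is a toy concave log-likelihood (a binomial count with a logit parameter), chosen only because its maximizer is known in closed form; it is not the objective of the temporal-dependent hub model, and the function names are hypothetical.

```python
import math


def newton_raphson(grad, hess, x0, tol=1e-10, max_iter=100):
    """Maximize a concave 1-D function given its first and second derivatives."""
    x = x0
    for _ in range(max_iter):
        step = grad(x) / hess(x)
        x -= step  # Newton-Raphson update
        if abs(step) < tol:
            break
    return x


# Toy objective: f(theta) = k * theta - n * log(1 + exp(theta)),
# the log-likelihood of k successes in n trials under a logit parameter.
k, n = 3, 10


def grad(theta):
    return k - n / (1.0 + math.exp(-theta))


def hess(theta):
    s = 1.0 / (1.0 + math.exp(-theta))
    return -n * s * (1.0 - s)


theta_hat = newton_raphson(grad, hess, x0=0.0)
```

In a coordinate ascent scheme, a routine like this would be called for one parameter at a time, with the derivatives evaluated at the current values of all other parameters.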
As shown in Section 2.2, the model is not identifiable with respect to . A standard solution to this problem is to set some . But it does not work for our case. This is because for small data sets, some estimated by the EM algorithm may be zero, implying that never became the leader. Furthermore, these zero cannot be predetermined since the leaders are unobserved. We observe that without constraint on , the algorithm converges to different with different initial values, but the corresponding will be the same. Therefore, identifiability is not an issue for model fitting.
3.3 Initial value
As with many optimization algorithms, the EM algorithm is not guaranteed to find the global maximizer. Ideally, one should use multiple random initial values and find the best solution by comparing the marginal likelihoods under the corresponding estimates.
In principle, can be computed by , as shown in Section 3.1. But the marginal likelihood vanishes quickly, even with a moderate . Note that we cannot renormalize and for the purpose of computing .
This measure estimates the conditional probability that two nodes co-occur given that one of them is observed, which is a reasonable initial guess of the strength of links. Furthermore, we use zero for the initial values of and , and for the initial value of .
4 Simulation studies
In all simulation studies, we fix the size of the network to be and set and . We generate as independently and identically distributed variables with and . The parameters are generated independently with . We generate in this way to control the average link density of the network ( 0.12), which is more realistic than a symmetric setting, i.e., . To be clear, we do not use the prior information on and in our estimating procedure. That is, we still treat and as unknown fixed parameters in the algorithm. We generate them as random variables solely to add more variation to the parameter setup in our study.
We consider three levels of , which correspond to a leader from the previous group remaining unchanged in the current group with probabilities on average. For each level, we try five different sample sizes: 1000, 1500, 2000, 2500 and 3000.
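Table 3: Average RMSEs of the estimates from the classical hub model (HM) and the temporal-dependent hub model (TDHM) at the three levels considered, ordered from lowest to highest. Standard deviations in parentheses.

| Sample size | HM (level 1) | TDHM (level 1) | HM (level 2) | TDHM (level 2) | HM (level 3) | TDHM (level 3) |
|---|---|---|---|---|---|---|
| 1000 | 0.124 (7) | 0.104 (6) | 0.140 (8) | 0.109 (7) | 0.163 (12) | 0.122 (10) |
| 1500 | 0.101 (7) | 0.084 (6) | 0.114 (8) | 0.087 (6) | 0.137 (9) | 0.096 (9) |
| 2000 | 0.087 (5) | 0.070 (5) | 0.099 (6) | 0.073 (5) | 0.120 (8) | 0.081 (8) |
| 2500 | 0.077 (5) | 0.062 (4) | 0.090 (5) | 0.065 (5) | 0.109 (7) | 0.071 (7) |
| 3000 | 0.070 (4) | 0.056 (4) | 0.083 (4) | 0.059 (4) | 0.101 (6) | 0.064 (6) |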
Figure 2 shows the average root mean squared errors (RMSEs) for the estimated over 100 replicates. For each simulation, we compare two methods, the classical hub model (HM) and the temporal-dependent hub model (TDHM). We assume the leaders are unknown under both models. Table 3 provides the same information (with standard deviations) in numerical form.
Second, the RMSEs for all the parameters increase as increases. This phenomenon can be interpreted as follows: with a larger value of , the correlation between adjacent groups becomes stronger and hence the effective sample size becomes smaller. The ratio of the sample size to the number of parameters decreases with , which makes inferences more difficult.
Third, the temporal-dependent hub model always outperforms the hub model. Moreover, the discrepancy between the temporal-dependent hub model estimates and the corresponding hub model estimates becomes larger as increases. This is because the behavior of the temporal-dependent model deviates more from the classical hub model as increases.
The standard deviations and means show a similar trend. That is, the standard deviations decrease as increases and decreases. The standard deviations for the temporal-dependent hub model estimates are comparable to or slightly smaller than those of the hub model estimates.
5 A data example of group dynamics in chimpanzees
Behavioral ecologists have become increasingly interested in using social network analysis to understand social organization and animal behavior (Bejder et al., 1998; Whitehead, 2008; Croft et al., 2011; Farine et al., 2012). The social relationships are usually inferred by applying certain association metrics (e.g., the half weight index) to grouped data. As indicated in the Introduction, however, it is unclear how the inferred network relates to the observed groups without specifying a model.
In this section, we study a data set of groups formed by chimpanzees by the temporal-dependent hub model. This data set is compiled from the results of the Kibale Chimpanzee Project, which is a long-term field study of the behavior, ecology and physiology of wild chimpanzees in the Kanyawara region of Kibale National Park, southwestern Uganda (https://kibalechimpanzees.wordpress.com/).
Our analysis focuses on grouping behavior. We analyze the grouped data collected from January 1, 2009 to June 30, 2009 (Kibale Chimpanzee Project, 2011). The group identification was taken at 1 p.m. daily during this time period. If no group was observed at 1 p.m. on a given day, that day is not included in the data. Only one group is observed at 1 p.m. on 75.29% of the remaining days over this period of six months. On the other days, multiple groups (usually two) are observed at 1 p.m. For these cases, we keep the group that has the most overlap with the previous group in our analysis. We use the Jaccard index to measure the overlap between two groups $G$ and $G'$,
$$J(G, G') = \frac{|G \cap G'|}{|G \cup G'|},$$
where the numerator is the size of the intersection of the two groups and the denominator is the size of their union. One may refer to Liben-Nowell and Kleinberg (2007) for an introduction to this measure.
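The preprocessing rule described above can be sketched as follows. The function names and the toy groups are hypothetical, and returning 0 for two empty groups is a convention assumed here, not something specified in the text.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_most_overlapping(previous, candidates):
    """Among simultaneous groups, keep the one closest to the previous group."""
    return max(candidates, key=lambda g: jaccard(previous, g))


# Hypothetical example: two groups observed at the same time.
previous = {"x1", "x2", "x3"}
candidates = [{"x2", "x3", "x4"}, {"x5", "x6"}]
kept = pick_most_overlapping(previous, candidates)
```

Note that `max` returns the first candidate in case of a tie, so a different tie-breaking rule would need to be stated explicitly if ties matter in a given data set.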
Moreover, five chimpanzees never appear in any group and thus are removed. After the preprocessing, the data set contains 170 groups with 40 chimpanzees.
Figure 3 illustrates the data set in grayscale with rows representing the groups over time and columns representing the chimpanzees. Black indicates at location while white indicates . The pattern in Figure 3 clearly demonstrates the existence of dependency between groups.
Figure 3 also shows the inferred group leaders, indicated in red, with the inferred segments separated by blue lines. By the inferred group leader of a group, we mean the chimpanzee with the highest posterior probability of being the leader given the observed groups. As shown in Figure 3, the leaders retain a certain level of stability, which is consistent with the estimate of the adjustment factor for remaining leaders. Also, recall that by our definition, a new segment starts if the current leader is not within the previous group. In Figure 3, the inferred segments coincide with the visual pattern of the data set.
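The two definitions used in this paragraph can be sketched in a few lines: the inferred leader is the node with the largest posterior probability at each time, and a new segment starts whenever that leader was not a member of the previous group. The function name and the toy inputs are hypothetical.

```python
def infer_leaders_and_segments(posterior, G):
    """Return the most probable leader per time point and the segment start times."""
    leaders = [max(range(len(row)), key=row.__getitem__) for row in posterior]
    starts = [0]  # the first group always opens a segment
    for t in range(1, len(G)):
        if G[t - 1][leaders[t]] == 0:  # leader was outside the previous group
            starts.append(t)
    return leaders, starts


# Hypothetical posterior leader probabilities and indicator matrix.
posterior = [[0.7, 0.2, 0.1],
             [0.6, 0.3, 0.1],
             [0.1, 0.1, 0.8]]
G = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
leaders, starts = infer_leaders_and_segments(posterior, G)
```

Time indices here are zero-based, so a start at index 2 corresponds to the third observed group.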
The estimated values of the adjustment factors, and . The magnitude of is larger than that of , which suggests individuals have a stronger tendency to join a group than to leave a group. In other words, the groups may start with a small size and grow larger over time. This phenomenon is visible in Figure 3.
Figure 4 shows the result of estimated adjacency matrices by the classical hub model and the temporal-dependent hub model. As in the previous figure, the darker color indicates a higher value of . The red blocks indicate clusters of chimpanzees in a biological sense. The first cluster consists of 12 adult males and each of the other nine clusters consists of an adult female and its children. From the estimates by both the classical hub model and the temporal-dependent hub model, there are strong connections within these biological clusters. Both estimates suggest that in this data set of chimpanzees, adult males usually do activities together but females usually stay with their children.
The two estimated adjacency matrices are different, however. Generally speaking, without properly considering the temporal-dependence between groups, the estimates of the relationships between individuals by the classical hub model can be biased. That is, an individual may choose to stay within or out of a group not solely based on its relationship with the group leader but also because of the inertia. The overall graph density of the estimated network by the classical hub model is larger than the corresponding value of the temporal-dependent hub model. This is consistent with the fact that the magnitude of is larger than the magnitude of . Since the classical hub model does not incorporate the adjustment factors, bias is introduced to certain so the model can match the overall frequency of occurrences for the individuals.
The significance of the three adjustment factors is tested by the parametric bootstrap method (Efron and Tibshirani, 1994). Specifically, we generate 5000 independent data sets from the temporal-dependent hub model fitted to the original data and compute the MLEs for each simulated data set. The parametric bootstrap has been applied to HMMs and showed good performance (Visser et al., 2000). Figure 5 shows the histograms of the MLEs for the three adjustment factors. The 95% bootstrap confidence intervals for the three factors are (1.2177, 1.9774), (2.0710, 2.8944) and (-0.5208, 0.0410). The first two intervals exclude zero while the third contains it, so two of the effects are significant at the 0.05 significance level while the third is not. This further supports the observation in the previous paragraph: chimpanzees have a stronger tendency to join a group than to leave a group in this data set.
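The parametric bootstrap procedure can be illustrated on a much simpler model. The sketch below uses a Bernoulli model as a stand-in, since fitting the temporal-dependent hub model is beyond a short example; the function names, the percentile rule for the interval, and the toy numbers are all assumptions made for illustration.

```python
import random


def simulate_bernoulli(p, n, rng):
    """Generate one synthetic data set from the fitted model."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def mle(sample):
    """Maximum likelihood estimate of the Bernoulli parameter."""
    return sum(sample) / len(sample)


def parametric_bootstrap_ci(p_hat, n, n_boot=1000, level=0.95, seed=0):
    """Percentile interval from MLEs refitted on simulated data sets."""
    rng = random.Random(seed)
    estimates = sorted(mle(simulate_bernoulli(p_hat, n, rng))
                       for _ in range(n_boot))
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (n_boot - 1))]
    upper = estimates[int((1.0 - tail) * (n_boot - 1))]
    return lower, upper


# Hypothetical fitted value and sample size.
lower, upper = parametric_bootstrap_ci(p_hat=0.3, n=200)
```

The same three steps (simulate from the fitted model, refit, take percentiles of the refitted estimates) apply to any parametric model; a parameter is judged significant at the chosen level when its interval excludes zero.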
6 Summary and discussion
In this article, we generalize the idea of the hub model and propose a novel model for temporal-dependent grouped data. This new model allows for dependency between groups. Specifically, the group leaders follow a Markov chain and a group is either a transformation of the previous group or a new start, depending on whether the current leader is within the previous group. An EM algorithm is applied to this model with a polynomial-time algorithm being developed for the E-step.
The setup of our model is different from some work on estimating time-varying networks by graphical models, e.g., Kolar et al. (2010) for discrete data and Zhou et al. (2010) for continuous data. These papers assume that the observations are independent and the latent network changes smoothly or is piecewise constant. In this paper, we instead focus on the dependence between groups. Ideally, both aspects – dependence between groups and changes in networks – need to be considered in modeling. This should be plausible when the sample size, i.e., the number of observed groups, is large. When the sample size is moderate (as in the chimpanzee data set), the length of the time interval between observation plays a key role in determining which aspect is more important. If the time interval is short, then dependence between groups is significant but the changes in the latent network are likely to be minor since the overall time window is not long. On the contrary, the dependence between groups becomes weak when the time interval is long.
For future work, we plan to study the time-varying effect on the latent network for the grouped data. When changepoints exist, a single network cannot accurately represent the link strengths in different time windows. A change-point analysis for temporal-dependent grouped data is an intriguing research topic. Alternative approaches may be based on penalizing the difference between networks at adjacent time points, although careful investigation is required to determine how tractable these methods are for temporal-dependent groups.
In addition to time-varying networks, the temporal-dependent hub model can also be extended in the following directions: first, a group may contain zero or multiple hubs. Second, multiple groups may exist at the same time (with some of these groups being unobserved). These generalizations however will significantly increase model complexity. Therefore, the total number of possible leaders needs to be limited. A method by the author and a collaborator (Weko and Zhao, 2017) was proposed to reduce this upper bound. More test-based and penalization methods are under development.
Furthermore, we also plan to investigate the theoretical properties of the proposed model. When the size of the network is fixed and the number of observed groups goes to infinity, the theoretical properties of the MLE may be studied via a standard theory of the Markov chain. The case that the size of the network also goes to infinity is more intriguing but more complicated since the number of parameters diverges.
This research was supported by NSF Grant DMS 1513004. We thank Dr. Richard Wrangham for sharing the research results of the Kibale Chimpanzee Project. We thank Dr. Charles Weko for compiling the results from the chimpanzee project and preparing the data set.
Appendix A Forward-backward algorithm in the E-steps
We derive the forward-backward algorithm for the temporal-dependent hub model introduced in Section 3.1. Before proceeding, we state two propositions of Bayesian networks. These results (or the equivalent forms) can be found in a standard textbook or tutorial on Bayesian networks, for example, Jordan et al. (1999). Here we follow Ghahramani (2001).
Proposition A.1. Each node is conditionally independent of its non-descendants given its parents. Here node $X$ is a parent of another node $Y$ if there is a directed arc from $X$ to $Y$, and if so, $Y$ is a child of $X$. The descendants of a node are its children, children's children, etc.
Proposition A.2. Two disjoint sets of nodes and are conditionally independent given another set , if on every undirected path between a node in and a node in , there is a node in that is not a child of both the previous and following nodes in the path.
Define as a collection of groups from time to time .
Let . Then,
The last equation holds by Proposition A.1.
Similarly, let . Then,
In the last equation, holds by Proposition A.2. This is because a path from to must pass or . If it only passes one of these two variables, then we can take that variable as in Proposition A.2. If it passes both, then take as . The rest of the last equation holds by A.1.
The computation of and is essentially the same as in the classical forward-backward algorithm for the HMM with minor modifications. Unlike the HMM, the dependence between the current and the previous groups requires another quantity .
The last equation can be justified by a similar argument as before.
Appendix B Derivatives of
We give the first and second derivatives of with respect to and , which are used in the coordinate ascent method introduced in Section 3.2.