1 Introduction
Leadership is a process by which individuals (leaders) influence a group to achieve collective goals [13, 9]. Leadership plays a key role in solving collective-action problems (e.g., social conflicts, migration, hunting, territorial defense) across social species [9], in organizing collective movement [6], and in collaborative group decision-making [8, 9]. In the context of coordination, defined as the emergence of collective actions to achieve collective goals [19], leadership contributes mainly by fostering collective behaviors in social species ranging from humans [8, 13, 9] to fish [18].
In nature, leadership can be viewed as a process of initiation of coordinated activity. For example, leadership is a process by which leaders initiate the group’s coordinated movement toward a destination [18, 23, 26]. In this process, leaders guide their group’s members to follow in the right direction. Understanding how leaders emerge and influence collective behaviors enables scientists to gain insight into synchronization and coordination processes in nature. In this paper, we use the words ‘leader’ and ‘initiator’ interchangeably.
While many studies on leadership in coordinated activity exist in behavioral research, there are few computational approaches addressing the leadership of coordination. In social network analysis, Influence Maximization (IM) [16, 11, 12] is one of the classic problems; it focuses on inferring a subset of individuals that maximizes information spreading. However, IM focuses solely on finding potential initiators who initiate the coordination of information spreading and, moreover, does not address the question of when coordination happens. The method for inferring leaders from online community actions [10] can be used to identify the group being coordinated, but it still does not provide information on when coordinated activities happen. In movement coordination, [4, 17, 21, 5, 15] propose methods specific to movement activity for finding leaders during a group's movement intervals, but none of them can be used to identify when coordination takes place. There are also many works on collective behavior and implicit leaders [28, 6, 29]. In this line of work, leaders can influence their group implicitly, and the leaders' identity might be unknown to the group. Still, none of the works in this category can be used to infer when periods of coordination occur.
Since leadership is a collective process [13], considering only dyadic interactions is not enough to infer a leadership instance. Therefore, the works in [3, 17, 15] proposed leadership frameworks that are based on a network representation of time series.
In the context of coordination leadership, the leadership inference method in [3] identifies coordination events and their initiators, and proposes an approach for classifying the types of leadership models acting on a group. However, the framework in [3] cannot infer multiple coordinated activities that occur simultaneously, because it does not employ the notion of multiple factions. We aim to close these gaps in the study of coordination leadership.
1.1 Our Contributions.
First, we introduce the novel computational problem of leadership identification in multiple coordinated activities, namely the Faction Initiator Inference Problem. We formalize the problem and analyze its theoretical properties and implications. Second, we propose a framework for the Faction Initiator Inference Problem that combines several existing methods in a principled and novel manner. Our framework is capable of:

Detecting intervals of multiple coordination: inferring intervals when different coordinated activities in one or more groups may occur simultaneously;

Identifying leaders: identifying the initiators of these coordinated activities, i.e., the individual who initiates each coordination event and the group that follows;

Discovering the events of merging and splitting of coordination: identifying the time when a coordinated group is separated into smaller subgroups or merged with another coordinated group.
Faction Initiator Inference Problem: To reach collective goals, group members must coordinate with each other. Multiple factions may exist within a large group, each solving a subtask that helps the entire group achieve its collective goals. Given time series of individual activities, our goal is to identify periods of coordination and the corresponding coordinated activities, find the factions of coordination if more than one exists, and identify the leader of each faction.
We demonstrate the ability of the framework to infer leadership in multiple coordinated groups on both simulated and biological datasets. Since we propose a new problem and framework for which no other approaches exist, we compare our framework against a non-trivial baseline, which is a modification of the closest existing approach in leadership inference. Our approach generalizes flexibly to multiple coordinated activities in any time series data.
1.2 Influence Maximization vs. Faction Initiator Inference Problem
Influence Maximization can be viewed as a special case of the Faction Initiator Inference Problem, namely a single event of coordinating the state of information in a social network, using a specific coordination (spreading) mechanism.

Coordination Mechanism: The majority of Influence Maximization work uses the Independent Cascade and Linear Threshold models as its main coordination mechanisms. Yet, other models, such as Hierarchy, Dictatorship, or other non-network-based models, can also be represented as coordination mechanisms. The new problem we formalize in this paper, the Faction Initiator Inference Problem, generalizes to all mechanisms for coordinating group activities, and we demonstrate this by using datasets generated by several models of coordination mechanisms.

Coordination Event: Influence Maximization focuses mainly on an information-spreading event in a social network, where the information states of the nodes are the time series being coordinated. However, this is only one particular type of coordination event; other, more general and non-network, coordination activities are possible. For example, the coordinated movement of animals is a coordination event in which animals coordinate their trajectories, not necessarily through a wave-like spread of information in a network, to reach a group destination. Our proposed framework can handle all types of coordination events, including but not limited to network information spreading.

The dynamics of coordination: In Influence Maximization, the majority of papers focus on inferring a single global set of initiators that maximizes influence in a given network. However, a single dataset can contain many coordination events, and each event can have different initiators. Moreover, coordination events with different initiators might happen simultaneously. The framework we propose aims to capture the dynamics of coordination from data and is capable of inferring when each coordination event happens and who its initiators are.
2 Problem statement and analysis
2.1 Coordination without noise.
Given a collection of time series, our goal is to find multiple coordination intervals as well as their initiators. We do not assume that the coordination intervals belonging to different coordinated sets of time series are disjoint; we allow them to overlap. We formalize the concepts of coordination and following similarly to [3].
Definition 1 (Following relation)
Let $P$ and $Q$ be arbitrary-length time series. If, for every time $t$, there exists a time delay $\Delta t > 0$ such that $P(t) = Q(t - \Delta t)$, then $P$ follows $Q$, denoted as $Q \prec P$.
Definition 2 (Coordination)
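Given a set of $d$-dimensional time series $\mathcal{U}$, the set $\mathcal{U}$ is coordinated at time $t$ if, for every pair $P, Q \in \mathcal{U}$, either $P$ follows $Q$ or $Q$ follows $P$. The coordination interval is the maximal contiguous time interval $[t_1, t_2]$ such that $\mathcal{U}$ is coordinated for every $t \in [t_1, t_2]$.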
Definition 3 (Initiator)
Let $\mathcal{U}$ be a set of coordinated $d$-dimensional time series within some coordination interval $[t_1, t_2]$. Then the time series $L \in \mathcal{U}$ is the initiator time series for the coordination interval if $U$ follows $L$ for each time series $U \in \mathcal{U} \setminus \{L\}$.
Definition 4 (Following network)
Let $\mathcal{U}$ be a set of time series. A directed graph $G = (V, E)$ is defined as a following network, where $V$ is a set of nodes in one-to-one correspondence with the time series in $\mathcal{U}$ and $E$ is a set of edges such that $(P, Q) \in E$ if $P$ follows $Q$.
We now extend these concepts to the case of multiple coordinated subgroups.
Definition 5 (Faction)
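Given a set of time series $\mathcal{U}$, a subset $F \subseteq \mathcal{U}$ at time $t$ is maximally coordinated if $F$ is coordinated and there is no other coordinated set $F' \subseteq \mathcal{U}$ such that $F \subsetneq F'$. We call such a maximally coordinated $F$ a faction at time $t$.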
Definition 6 (Faction interval)
The coordination interval of a faction $F$, or faction interval, is the maximal consecutive time interval $[t_1, t_2]$ such that $F$ is coordinated for every $t \in [t_1, t_2]$.
A faction is a structurally maximal subset, and its interval is a temporally maximal subset.
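Lemma 2.1. A time series $U$ is a member of a faction $F$ if and only if it has an edge to $F$'s initiator $L$.
Proof. ($\Rightarrow$) Let $U \in F \setminus \{L\}$. Since $L$ is the initiator of $F$, $U$ follows $L$. By definition, there is an edge from $U$ to $L$.
($\Leftarrow$) Let $(U, L) \in E$. If $U$ is not in $F$, then we can add $U$ to $F$, which will remain a coordinated set but will now violate the maximality of $F$. Thus, $U \in F$.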
According to Lemma 2.1, a faction $F$ with initiator $L$ is a set of nodes in $G$ such that all nodes in $F \setminus \{L\}$ have a directed edge to $L$. Note that, within a coordination interval, $L$ always has an out-degree of zero and an in-degree of $|F| - 1$.
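The following Python sketch illustrates Lemma 2.1 in the noise-free setting. It assumes a following network given as hypothetical `(follower, leader)` edge pairs; the helper names `initiators` and `faction_of` are illustrative and not part of mFLICA. Initiators are taken to be nodes with out-degree zero that at least one node follows, and a faction is the initiator plus every node with a direct edge to it.

```python
from typing import List, Set, Tuple

# An edge (u, v) means "u follows v": it points from the follower to the followed series.
Edge = Tuple[str, str]


def initiators(nodes: List[str], edges: Set[Edge]) -> List[str]:
    """Return nodes with out-degree zero that are followed by at least one node."""
    out_deg = {v: 0 for v in nodes}
    in_deg = {v: 0 for v in nodes}
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    # Isolated nodes (no edges at all) are not treated as initiators in this sketch.
    return [v for v in nodes if out_deg[v] == 0 and in_deg[v] > 0]


def faction_of(initiator: str, edges: Set[Edge]) -> Set[str]:
    """Noise-free faction (Lemma 2.1): the initiator plus every node with an edge to it."""
    return {initiator} | {u for u, v in edges if v == initiator}


nodes = ["A", "B", "C", "D", "E"]
# Faction {A, B, C}: B and C follow A, and C also follows B. Faction {D, E}: E follows D.
edges = {("B", "A"), ("C", "A"), ("C", "B"), ("E", "D")}
for leader in initiators(nodes, edges):
    members = faction_of(leader, edges)
    # The initiator's in-degree should equal |F| - 1.
    in_degree = sum(1 for _, v in edges if v == leader)
    print(leader, sorted(members), in_degree == len(members) - 1)
```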
These definitions formalize the Faction Initiator Inference Problem stated in Section 1.1.
2.2 Coordination with noise.
In the previous section, we stated the definitions and properties of the problem of identifying multiple faction initiators in the ideal setting. In this section, we provide the relaxation and the analysis of the problem in the presence of noise.
Definition 7 (following relation)
Let be a set of time series and be some similarity measure between two time series. For any pair of time series , let where represents a time series starting at time , and let . Then, for a threshold , if , then we have:

if , then ,

if , then ,

if either or and , then ( is following equivalent to ).
Note that if two time series and such that and , there exists more than one position in time that make both time series maximize their similarity.
Definition 8 (coordination)
Let $\mathcal{U}$ be a set of time series. Then $\mathcal{U}$ is coordinated if, for every pair $P, Q \in \mathcal{U}$, either $P$ follows $Q$ or $Q$ follows $P$.
Definition 9 (faction)
Let $\mathcal{U}$ be a set of time series. A faction is a maximal set $F \subseteq \mathcal{U}$ such that $F$ is coordinated and there is no other coordinated set $F' \subseteq \mathcal{U}$ with $F \subsetneq F'$.
Definition 10 (Relaxed faction interval)
Let be a set of time series, the time interval is a faction interval of initiator if for all , there exists a faction such that has as its initiator and .
2.3 Coordination measure
Given a set of time series $\mathcal{U}$ and a set of clusters $\mathcal{C} = \{C_1, \ldots, C_k\}$ over $\mathcal{U}$, we define a cluster membership indicator $\delta(P, Q) = 1$ if time series $P$ and $Q$ belong to the same cluster, and $\delta(P, Q) = 0$ otherwise. The average coordination measure of a set of clusters is defined as follows:
(1) 
If the average coordination measure is close to 1, then all time series within the same cluster are highly similar up to some time delay, which implies a high degree of coordination within each cluster. Conversely, a value near zero implies no coordination, on average.
Theorem 1. Given a set of time series $\mathcal{U}$ containing a set of factions $\mathcal{F} = \{F_1, \ldots, F_k\}$, the clustering given by $\mathcal{F}$ maximizes the average coordination measure over all possible sets of clusters.
The proof of Theorem 1 is in the supplementary material.
3 Methods
We propose the following framework to solve the Faction Initiator Inference Problem. The framework is designed to infer a set of factions, their faction intervals, and their initiators from time series. Figure 1 depicts an overview of our framework.
3.1 Following network inference.
Given a set of time series $\mathcal{U}$ and a similarity threshold $\sigma$, for each pair of time series $P, Q \in \mathcal{U}$, our goal is to determine whether $P$ follows $Q$, $Q$ follows $P$, or no following relation exists between them. The time series similarity measure we need should satisfy the following properties. First, it should recognize common patterns between two time series if they exist; these common patterns can be noisy, distorted, time-delayed, and discontinuous. Second, it should infer the time delay between these common patterns.
We deploy Dynamic Time Warping (DTW) [22] as the similarity measure for the following relation, since DTW's warping path can distinguish whether two time series share noisy common patterns and can approximately infer the time delay of common patterns between time series. Moreover, according to [17], DTW outperforms other methods in detecting following among time series.
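For any pair of time series $P, Q$, we use the equation from [3] to approximate a following relation:
$$ f(P, Q) = \frac{1}{|\mathcal{W}^*|} \sum_{(i, j) \in \mathcal{W}^*} \operatorname{sign}(j - i) \qquad (2) $$
where $\mathcal{W}^*$ is the optimal warping path of DTW. If $(i, j) \in \mathcal{W}^*$, then $P$ at time $i$ is most similar to $Q$ at time $j$. Given the similarity threshold $\sigma$, when $|f(P, Q)| < \sigma$, neither $P$ nor $Q$ follows the other. We have that $Q$ follows $P$ if $f(P, Q) \geq \sigma$; in contrast, $f(P, Q) \leq -\sigma$ implies that $P$ follows $Q$. The function $f$ is bounded in $[-1, 1]$, and we set a default value of $\sigma$ for our framework.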
Then, a following network $G = (V, E)$ is constructed from the time series, where each vertex in $V$ represents one time series and $(P, Q) \in E$ if $P$ follows $Q$. The pseudocode of following network inference is in the supplementary material.
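The sketch below shows one way to compute a DTW-based following score and turn it into an edge of a following network. It assumes 1-D series, an absolute-difference DTW cost, and that Eq. 2 averages $\operatorname{sign}(j - i)$ over the optimal warping path as in the sign-based score of [3]; this form, the helper names, and the threshold used in the example are assumptions, not the mFLICA implementation. A positive score means the second series lags the first, i.e., the second follows the first.

```python
import math
from typing import List, Optional, Sequence, Tuple


def dtw_path(p: Sequence[float], q: Sequence[float]) -> List[Tuple[int, int]]:
    """O(len(p) * len(q)) DTW with |p[i] - q[j]| cost; returns the optimal warping path."""
    n, m = len(p), len(q)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(p[i - 1] - q[j - 1])
            cost[i][j] = step + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from (n, m) to (1, 1); the diagonal move is preferred on ties.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        moves = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(moves, key=lambda move: move[0])
        path.append((i - 1, j - 1))
    path.reverse()
    return path


def following_score(p: Sequence[float], q: Sequence[float]) -> float:
    """Average of sign(j - i) over the warping path; the result lies in [-1, 1]."""
    path = dtw_path(p, q)
    return sum((j > i) - (j < i) for i, j in path) / len(path)


def follow_edge(p_name: str, p: Sequence[float], q_name: str, q: Sequence[float],
                sigma: float) -> Optional[Tuple[str, str]]:
    """Return a (follower, leader) edge, or None when |score| < sigma."""
    score = following_score(p, q)
    if score >= sigma:
        return (q_name, p_name)  # q lags p, so q follows p
    if score <= -sigma:
        return (p_name, q_name)  # p lags q, so p follows q
    return None


p = [0, 0, 1, 2, 3, 3, 3, 3]
q = [0, 0, 0, 0, 1, 2, 3, 3]  # the same ramp as p, delayed by two steps
print(following_score(p, q), follow_edge("P", p, "Q", q, sigma=0.5))
```

Because the score only counts the sign of each warping-path offset, it is insensitive to the exact delay length, which suits noisy, variably delayed following.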
3.2 Dynamic Following network inference.
As mentioned before, a set of time series might consist of multiple overlapping coordination intervals from many factions. Summary statistics or an aggregate following network alone cannot reveal these dynamics. Hence, we need to consider each local interval and build a following network to represent it. We therefore deploy a dynamic following network scheme in our framework, a common technique for dealing with the dynamics of data [14].
The next question is “how long should each local interval be?” For now, we assume a priori knowledge of the time window $\omega$ used to capture local intervals; in Section 3.4 we show that $\omega$ can be inferred from the dataset itself.
The input is a set $\mathcal{U}$ of time series of length $T$. We sample all time series in $\mathcal{U}$ with sliding-window intervals and create a following network for each interval. Let $\omega$ be the time window and $\delta$ the time shift; the $t$-th sliding window interval $W_t$ has length $\omega$, and consecutive windows are shifted by $\delta$ time steps. For each interval $W_t$, we have a set of time series $\mathcal{U}_t$: for each time series $U \in \mathcal{U}$, there is a $U_t \in \mathcal{U}_t$ that is the segment of $U$ during $W_t$. We build a following network $G_t$ for each $\mathcal{U}_t$, then combine these networks into a single dynamic network. The pseudocode of the dynamic network creation is in the supplementary material.
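A minimal sketch of the sliding-window step, assuming half-open windows $[s, s + \omega)$ whose start $s$ advances by the time shift $\delta$ (the exact window indexing in mFLICA may differ). `build_network` is a hypothetical callback standing in for the following-network inference of Section 3.1.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, TypeVar

N = TypeVar("N")


def sliding_windows(T: int, omega: int, delta: int) -> Iterator[Tuple[int, int]]:
    """Yield windows [start, start + omega) whose start advances by delta."""
    if omega <= 0 or delta <= 0:
        raise ValueError("omega and delta must be positive")
    start = 0
    while start + omega <= T:
        yield start, start + omega
        start += delta


def dynamic_following_network(series: List[Sequence[float]], omega: int, delta: int,
                              build_network: Callable[[List[Sequence[float]]], N]) -> List[N]:
    """Build one following network per window from the windowed segment of every series."""
    T = len(series[0])
    networks = []
    for lo, hi in sliding_windows(T, omega, delta):
        segments = [s[lo:hi] for s in series]
        networks.append(build_network(segments))
    return networks


# For T = 10, omega = 4 and delta = 3 the windows start at 0, 3 and 6.
print(list(sliding_windows(10, 4, 3)))
```

In this sketch a final partial window is dropped when $T - \omega$ is not a multiple of $\delta$; padding or a shorter last window would be alternative choices.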
3.3 Factions detection and coordination intervals.
For each following network $G_t$, factions are network components in which all member nodes directly connect to their initiator (Lemma 2.1). We infer factions based on Definition 9, and the coordination intervals of factions are discovered based on Definition 10.
According to Lemma 2.1, initiator nodes have out-degree zero, and all nodes within the same faction directly connect to their initiator. However, due to the introduction of the time window $\omega$, some nodes might not have direct edges to their initiators. Therefore, we relax the constraint of faction membership so that all nodes that have any directed path to an initiator are members of that initiator's faction.
Since a faction is a directed connected component in which all nodes are reachable from the initiator by inverse paths, we use Breadth-First Search (BFS) to identify all nodes reachable from each initiator node in the following network, which yields the members of each faction. The pseudocode of this step is in the supplementary material.
A useful statistic about factions (used later) is the faction size ratio. Let $G_t[F]$ be the subgraph of the following network $G_t$ induced by faction $F$; then the faction size ratio of $F$ is defined as follows:
$$ \rho(F) = \frac{|V(G_t[F])|}{|V(G_t)|} \qquad (3) $$
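The relaxed membership rule and the faction size ratio can be sketched as follows. The sketch assumes a following network given as hypothetical `(follower, leader)` edge pairs, runs BFS from each out-degree-zero initiator over reversed edges, and assumes that Eq. 3 is the faction's share of all nodes in $G_t$. A node that reaches two initiators is placed in both factions, consistent with allowing overlap.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def detect_factions(nodes: List[str], edges: Set[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each initiator to all nodes with a directed path to it (edge (u, v): u follows v)."""
    followers: Dict[str, Set[str]] = {v: set() for v in nodes}  # reversed adjacency
    out_deg = {v: 0 for v in nodes}
    for u, v in edges:
        followers[v].add(u)
        out_deg[u] += 1
    factions: Dict[str, Set[str]] = {}
    for leader in nodes:
        if out_deg[leader] != 0 or not followers[leader]:
            continue  # not an initiator: it follows someone, or nobody follows it
        members = {leader}
        queue = deque([leader])
        while queue:  # BFS along inverse edges, from the leader to its (indirect) followers
            x = queue.popleft()
            for y in followers[x]:
                if y not in members:
                    members.add(y)
                    queue.append(y)
        factions[leader] = members
    return factions


def faction_size_ratio(faction: Set[str], nodes: List[str]) -> float:
    """Assumed form of Eq. 3: faction size divided by the number of nodes in G_t."""
    return len(faction) / len(nodes)


nodes = ["A", "B", "C", "D", "E", "F"]
edges = {("B", "A"), ("C", "B"), ("E", "D")}  # C reaches A only through B; F is isolated
found = detect_factions(nodes, edges)
print(found, {k: faction_size_ratio(v, nodes) for k, v in found.items()})
```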
3.4 Time window inference.
In reality, some following relations might not be caused by explicit initiators, since they either happen by chance or are due to other factors unrelated to the influence of leaders. For instance, if a follower is unable to observe a leader's pattern, then the follower cannot be influenced by the leader. Different types of time series have different limits on ‘observation memory’, which is the maximum time delay within which a follower can truly observe and imitate its leader's actions or receive commands from a leader.
Hence, to represent this observation memory limit, we set the time window $\omega$ to bound the time delay over which following relations can be measured. Moreover, $\omega$ prevents the comparison of time series across different coordination events.
Nevertheless, if we set the time window $\omega$ too small, we miss some following relations whose time delay exceeds $\omega$. On the contrary, a long $\omega$ causes false-positive matches between repeated patterns of different coordination intervals. Therefore, a proper $\omega^*$ should infer a higher number of true following relations than any $\omega < \omega^*$. Even if some random following relations appear when we choose $\omega^*$ instead of a smaller $\omega$, this is not an issue: since these random following relations appear by chance and with lower probability, they have a relatively small effect on the number of following relations.
In our framework, without prior knowledge of the time window, we use the $\omega^*$ that maximizes the average coordination measure (Eq. 1). Given a dynamic following network based on the time window $\omega$, for each time step $t$ we calculate the average coordination measure $c_t^{\omega}$ by designating each faction as a cluster and creating one last cluster for all time series that are not in any faction. Then, $c^{\omega}$ is computed as the median of $\{c_t^{\omega}\}_t$ and is used as the representative coordination measure value of $\omega$. Hence, the optimal $\omega^*$ is computed as follows:
$$ \omega^* = \arg\max_{\omega} c^{\omega} \qquad (4) $$
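Eq. 4 amounts to a search over candidate windows. In the sketch below, `per_step_measures` is a hypothetical callback that, for a given $\omega$, builds the dynamic following network, clusters each time step as described above, and returns the per-time-step values of Eq. 1; only the median-and-argmax selection is shown, and the candidate grid is an assumption.

```python
from statistics import median
from typing import Callable, Dict, Iterable, Sequence


def select_time_window(candidates: Iterable[int],
                       per_step_measures: Callable[[int], Sequence[float]]) -> int:
    """Return the window whose median per-time-step coordination measure is largest (Eq. 4)."""
    scores: Dict[int, float] = {}
    for omega in candidates:
        scores[omega] = median(per_step_measures(omega))
    # max() keeps the first candidate on ties, i.e. the earliest one in `candidates`.
    return max(scores, key=lambda omega: scores[omega])


# Toy per-step measures for three candidate windows (illustrative numbers only).
toy = {10: [0.2, 0.4, 0.3], 20: [0.6, 0.7, 0.1], 30: [0.5, 0.5, 0.5]}
print(select_time_window([10, 20, 30], lambda omega: toy[omega]))
```

Using the median rather than the mean keeps a few time steps with extreme values from dominating the choice of $\omega$.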
3.5 Leadership comparison.
There are several widely used methods for ranking important nodes within a graph. One of the well-known methods that considers higher-order relations within a graph is PageRank [20]. In our approach, we apply PageRank to the following network to rank individuals within each faction and report rank-ordered lists for each time step. Even though PageRank scores are computed from the entire network, we compare individuals' ranking scores only within the same faction and create a rank-ordered list for each faction. For each node $v_i$ within a following network $G$, the PageRank score is defined below:
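$$ r(v_i) = \frac{1 - d}{|V|} + d \sum_{v_j \in N_{\mathrm{in}}(v_i)} \frac{a_{ji}\, r(v_j)}{|N_{\mathrm{out}}(v_j)|} \qquad (5) $$
where $r(v_i)$ is the rank value of node $v_i$, $d$ is a damping factor, typically set to 0.9, $N_{\mathrm{in}}(v_i)$ is the set of $v_i$'s followers, $N_{\mathrm{out}}(v_j)$ is the set of individuals that $v_j$ follows, and $a_{ji}$ is an element of the adjacency matrix of the following network, where $v_j$ follows $v_i$ if $a_{ji} = 1$.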
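A small power-iteration sketch of Eq. 5 on a following network, followed by the within-faction ranking. It assumes hypothetical `(follower, leader)` edge pairs, so rank flows from followers to the individuals they follow, and uses $d = 0.9$ as stated above. Initiators have out-degree zero, so this sketch spreads their rank mass uniformly over all nodes; that dangling-node convention is an assumption, not something the text specifies.

```python
from typing import Dict, List, Set, Tuple


def pagerank(nodes: List[str], edges: Set[Tuple[str, str]], d: float = 0.9,
             max_iter: int = 200, tol: float = 1e-12) -> Dict[str, float]:
    """Power iteration for Eq. 5, plus uniform redistribution of dangling-node mass."""
    n = len(nodes)
    follows: Dict[str, List[str]] = {v: [] for v in nodes}  # N_out: whom each node follows
    for u, v in edges:
        follows[u].append(v)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not follows[v])
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for u in nodes:
            for v in follows[u]:
                new[v] += d * rank[u] / len(follows[u])  # a follower passes rank to whom it follows
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:
            break
    return rank


def rank_within_faction(rank: Dict[str, float], faction: Set[str]) -> List[str]:
    """Order faction members by PageRank score (ties broken by name)."""
    return sorted(faction, key=lambda v: (-rank[v], v))


nodes = ["A", "B", "C", "D", "E"]
edges = {("B", "A"), ("C", "A"), ("C", "B"), ("E", "D")}
scores = pagerank(nodes, edges)
print(rank_within_faction(scores, {"A", "B", "C"}), rank_within_faction(scores, {"D", "E"}))
```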
4 Evaluation Datasets
4.1 Leadership models.
The evaluation of the framework is conducted based on four models of coordination mechanisms.
4.1.1 Dictatorship Model.
The Dictatorship Model (‘DM’) [3] is considered to be the simplest model in the leadership realm. Initially, no movement happens until the leader starts moving to a target, then individuals follow their leader with some time delay until the entire group is coordinated in both direction and velocity. Then, the group gradually stops at the target and starts moving again to the next target.
4.1.2 Hierarchical Model.
The Hierarchical Model (‘HM’) [3] is a variation of DM with a hierarchical condition, which assigns a rank to each individual within a group. The leader has the highest rank, and low-rank individuals follow high-rank individuals with some time delay. In our evaluation model, we assign a linear-order hierarchy such that $U_1$ is the leader and each $U_i$ is followed by $U_{i+1}$. The group moves linearly along a line with some noise, following its leader.
4.1.3 Independent Cascade Model.
The Independent Cascade Model (‘IC’) [16] is one of the influence propagation models in social network analysis. Initially, every individual has a probability to be activated . Active individuals move toward their leader. At each time step, each active individual simultaneously and independently attempts to activate nearest inactive neighbors around it with the probability of success . If successful, the inactive individual becomes active at the next time step. Active individuals cannot attempt to activate the same individuals again. Only the leader follows its target, and everyone else follows the leader. We explore the parameter space on combinations of: and .
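One Independent Cascade step, as described above, can be sketched as follows. The neighbour lists are a hypothetical stand-in for the nearest inactive neighbours in the spatial simulation, and the function and parameter names are illustrative; the sketch only shows the activation rule (one attempt per ordered pair, success with probability $p$, activation taking effect at the next step), not the movement simulation.

```python
import random
from typing import Dict, List, Set, Tuple


def ic_step(neighbors: Dict[str, List[str]], active: Set[str],
            attempted: Set[Tuple[str, str]], p: float, rng: random.Random) -> Set[str]:
    """Return the active set for the next time step; `attempted` is updated in place."""
    newly_active: Set[str] = set()
    for u in sorted(active):  # sorted for reproducibility with a seeded RNG
        for v in neighbors.get(u, []):
            if v in active or (u, v) in attempted:
                continue  # already active, or u has already tried to activate v
            attempted.add((u, v))
            if rng.random() < p:
                newly_active.add(v)
    # Newly activated individuals only start spreading at the next step.
    return active | newly_active


neighbors = {"L": ["A"], "A": ["B"], "B": []}
active: Set[str] = {"L"}
attempted: Set[Tuple[str, str]] = set()
rng = random.Random(0)
for _ in range(2):
    active = ic_step(neighbors, active, attempted, p=1.0, rng=rng)
print(sorted(active))
```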
4.1.4 Crowd Model.
In the Crowd Model (CM) [28], there are two types of individuals: informed and uninformed. At each time step, informed individuals move toward the target independently, while uninformed individuals stay close to the group's position and direction centroids. Therefore, the group direction is implicitly influenced by informed individuals. For each coordination event, all informed individuals follow a single target direction vector, while the rest of the group stays with the majority.
4.2 Synthetic trajectory simulation.
We generate time series datasets based on the models described above. Each dataset consists of the coordinate time series of 30 individuals, and each time series has a length of 4,000 time steps. A coordination event consists of multiple faction intervals, described below, and each dataset contains five coordination events. For each model above, coordination events are of one of two types.
4.2.1 Linear coordination event.
There are four factions in each coordination event. The first faction has as a leader and others are followers. This faction lasts for 200 time steps. The next faction is led by and its coordination interval is . The third faction appears within interval and it has as a leader. In the last faction, leads the group to stop moving and the group completely stops moving around time step . Everyone stops moving within , then the group proceeds to the next coordination event.
4.2.2 Splitting/Merging coordination event.
In this type of coordination event, factions split and merge. Within the interval, leads a single faction with its direction vector. Then, at , the group splits into three factions, which appear within the interval. The first faction is led by , and about a third of the previous faction members are its followers (Fig. 2). The has its own direction vector. leads the second faction with another third of the members from the previous faction. has a different direction from . Lastly, leads the rest of the individuals. also has its own direction, which is different from ’s and ’s.
At , the factions led by and are merged into the faction of ; and follow the ’s direction. At the interval, leads all the individuals. Finally, leads the faction to stop moving between and . The group completely stops at the interval. Note that leaders in each faction are informed individuals in the Crowd Model: instead of having only one leader for each faction, we have three informed individuals in the Crowd Model.
For each leadership model and coordination event type, we generated 100 datasets, so each model has 200 datasets in total. The exception is IC, for which we generated 200 datasets for each of the nine possible parameter combinations, giving 1,800 datasets for the IC model.
4.3 Biological datasets
4.3.1 Baboon trajectories.
The dataset comes from a set of GPS collars on a troop of wild olive baboons (Papio anubis) at Mpala Research Centre, Kenya [7, 25]. The data consist of time series of latitude-longitude location pairs for each individual, recorded every second. The individuals whose collars remained functional throughout the recording period are analyzed as a case study of a merging coordination event.
4.3.2 Fish schools trajectories.
The fish dataset is a set of time series of fish positions from a video recording of a school of golden shiners (Notemigonus crysoleucas). The recording was used to study information propagation over the visual fields of fish [24]. Each trial contains fish, with fish that were trained to lead the group to the feeding sites. The dataset has separate ground-truthed leadership events. The task is to correctly identify the trained fish.
5 Evaluation criteria
For each simulation dataset, we have the ground truth of an individual’s membership in a faction and the identity of the faction’s leader. We compared the inference result from each method against the known ground truth to evaluate the method’s performance.
5.1 Individual assignment.
For all models, at each time step, the accuracy of the individual assignment is the number of individuals whose inferred faction agrees with the ground truth, divided by the total number of individuals. Note that, in the Crowd Model, each faction $F$ has a set of informed individuals, and an individual belongs to $F$ if it follows any informed individual in $F$.
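Per time step, this criterion reduces to a simple agreement ratio. The sketch assumes that inferred and ground-truth faction labels have already been matched (e.g., both keyed by the faction's leader) and uses `None` for individuals outside every faction; these conventions and the function name are illustrative.

```python
from typing import Dict, Optional


def assignment_accuracy(inferred: Dict[str, Optional[str]],
                        truth: Dict[str, Optional[str]]) -> float:
    """Fraction of individuals whose inferred faction label equals the ground truth."""
    # An individual missing from `inferred` is treated as unassigned (None).
    agree = sum(1 for ind, label in truth.items() if inferred.get(ind) == label)
    return agree / len(truth)


truth = {"a": "L1", "b": "L1", "c": None, "d": "L2"}
inferred = {"a": "L1", "b": "L2", "c": None, "d": "L2"}
print(assignment_accuracy(inferred, truth))
```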
5.2 Leadership prediction.
For all models except the Crowd Model, the number of true positives (TP) is the number of inferred leaders who are indeed ground-truth leaders, the number of false positives (FP) is the number of inferred leaders who are not actual leaders, and the number of false negatives (FN) is the number of actual leaders who are inferred to be non-leaders. In the Crowd Model, TP is the number of inferred leaders who are informed individuals from the right faction, FP is the number of inferred leaders who are uninformed individuals, and FN is the number of ground-truth factions in which all informed members are inferred to be non-leaders. We calculated the F1-score to estimate the performance of leadership prediction for each framework.
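For the non-Crowd models, these counts combine into the F1-score in the usual way. A minimal sketch, assuming leaders are given as sets of individual IDs for one evaluation unit; the Crowd Model's faction-level FN rule is not covered, and the function name is illustrative.

```python
from typing import Set


def leadership_f1(inferred: Set[str], actual: Set[str]) -> float:
    """F1 = 2TP / (2TP + FP + FN), which equals 2 * precision * recall / (precision + recall)."""
    tp = len(inferred & actual)  # inferred leaders that are true leaders
    fp = len(inferred - actual)  # inferred leaders that are not true leaders
    fn = len(actual - inferred)  # true leaders inferred as non-leaders
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


print(leadership_f1({"A", "B", "X"}, {"A", "B", "C", "D"}))
```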
6 Results
6.1 Leadership Identification.
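Table 1. Leadership F1-score and individual assignment accuracy (medians over all datasets of each model).
Dataset | F1-score mFLICA | F1-score FLOCK | Assignment Acc. mFLICA | Assignment Acc. FLOCK
------- | --------------- | -------------- | ---------------------- | ---------------------
DML     | 0.94 | 0.92 | 0.89 | 0.86
DMMS    | 0.94 | 0.91 | 0.86 | 0.84
HML     | 0.94 | 0.91 | 0.94 | 0.86
HMMS    | 0.95 | 0.90 | 0.86 | 0.81
ICL     | 0.91 | 0.86 | 0.86 | 0.80
ICMS    | 0.89 | 0.85 | 0.79 | 0.79
CML     | 0.82 | 0.64 | 0.83 | 0.64
CMMS    | 0.75 | 0.67 | 0.64 | 0.55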
For each simulation model in Section 4.1, we evaluated results from all datasets using the criteria in Section 5. We set the time window using the method from Section 3.4 and fixed the time shift. The results of faction assignment and leader identification are in Table 1. Each row labeled ‘L’ is a model with the Linear coordination event type (Section 4.2.1), and ‘MS’ denotes a model with the Splitting/Merging coordination event type (Section 4.2.2). The 2nd and 3rd columns give the leadership prediction F1-scores of mFLICA (our proposed framework) and the modified FLOCK framework [27, 4]; these values are medians over all datasets of a given leadership model. The 4th and 5th columns give individual assignment accuracy, again as the median over all datasets of a given model. mFLICA outperformed FLOCK in all models, except for a tie in assignment accuracy on ICMS (0.79 for both). The results suggest that a simple framework such as FLOCK is limited when dealing with complicated, noisy leadership models.
Table 2. Top-3 rank-order accuracy.
Dataset | mFLICA | FLOCK
------- | ------ | -----
HML     | 0.75 | 0.78
HMMS    | 0.72 | 0.76
For hierarchical models, we report the top-3 rank-order inference accuracy within each faction in Table 2. The table rows represent leadership model datasets. The columns give accuracy, defined as the percentage of the top-3 individuals in the ground truth that appear in the inferred top-3 list. Even though mFLICA has competitive results, the FLOCK framework performs better. This is expected: the hierarchical model has a linear hierarchy structure and the leader is always at the front of the group's direction of movement, which matches the fundamental assumption of FLOCK.
6.2 Case study: trained leaders in fish schools.
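Table 3. Accuracy of identifying trained fish.
Method | Trained fish factions | Trained fish leaders
------ | --------------------- | --------------------
mFLICA | 0.90 | 0.88
FLOCK  | 0.37 | 0.27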
We considered any fish within the faction of a trained fish to be following the trained fish. Across 24 trials of fish movement, the median accuracies of inferring fish that follow the trained fish are in column 2 of Table 3. We also measured the accuracy of inferred initiators being the trained fish in each trial (column 3 of Table 3). According to Table 3, mFLICA performs substantially better than FLOCK in both respects. We attribute this to the fish datasets being very noisy and to DTW in mFLICA being more robust to noise than the simple FLOCK model [17].
6.3 Case study: detecting the group merging event of baboons.
We used a baboon dataset (see Section 4.3.1) to demonstrate an example application of our framework to finding transitions of coordinated events in real datasets. We focused on the period when two groups merge on Aug 3, 2012, 08:49:01 AM. The trajectories are 500 seconds long. Figure 3 illustrates the result when the merging happens. Before time , a faction led by (black node) starts moving in the same direction as the faction led by (purple node). This process is reflected in the faction size ratios (Eq. 3) of both factions, which increase over time. After , faction is merging with ’s faction to become a single faction at . After merging, because the faction of gains more members, its faction size ratio (Eq. 3) increases. Hence, by observing the faction size ratios of the factions led by each individual, we can find merging (or splitting) events.
7 Discussion
In this paper, we formalized the Faction Initiator Inference Problem and provided a novel end-to-end, general, unsupervised framework for solving it that can be used to study a wide range of coordinated activities. The framework is competitive against a non-trivial baseline method on both simulated and real-world datasets. Moreover, we demonstrated that the framework can identify merging events, as well as factions and initiators at each time step, in biological datasets. This example suggests that our framework opens opportunities for scientists to ask questions about coordinated activities and to formulate and test scientific hypotheses. Our framework is nearly parameter-free (it needs only a similarity threshold and a time shift parameter). The scalability bottleneck is the DTW method used to compare time series. Existing DTW lower/upper bound techniques cannot be applied directly in our case, since they only compute the distance between time series and not the actual warping path needed by our framework. With simpler and faster similarity computation, our framework could become highly scalable, and such approaches should be investigated in future work. Another direction we plan to explore is causality inference, which is closely related to leadership inference in the sense that initiators cause their followers' actions; we plan to report Granger causality results for leadership inference in a future paper. The code, datasets, and supplementary files used in this paper can be found at [1]. The new mFLICA code is available as an R package [2].
References
 [1] mFLICA: code and supplementary material. https://github.com/CompBioUIC/MFLICA. Accessed: 2017-12-19.
 [2] C. Amornbunchornvej. mFLICA: An R package for inferring leadership of coordination from time series. arXiv preprint arXiv:2004.06092, 2020.
 [3] C. Amornbunchornvej, I. Brugere, A. Strandburg-Peshkin, D. Farine, M. C. Crofoot, and T. Y. Berger-Wolf. FLICA: A framework for leader identification in coordinated activity. arXiv preprint arXiv:1603.01570, 2016.
 [4] M. Andersson, J. Gudmundsson, P. Laube, and T. Wolle. Reporting leaders and followers among trajectories of moving point objects. GeoInformatica, 12(4):497–528, 2008.

 [5] A. Y. Carmi, L. Mihaylova, F. Septier, S. K. Pang, P. Gurfil, and S. J. Godsill. Inferring leadership from group dynamics using Markov chain Monte Carlo methods. In Modeling, Simulation and Visual Analysis of Crowds, pages 325–346. Springer, 2013.
 [6] I. D. Couzin, J. Krause, N. R. Franks, and S. A. Levin. Effective leadership and decision-making in animal groups on the move. Nature, 433(7025):513–516, 2005.
 [7] M. C. Crofoot, R. W. Kays, and M. Wikelski. Data from: Shared decision-making drives collective movement in wild baboons, 2015.
 [8] J. R. Dyer, A. Johansson, D. Helbing, I. D. Couzin, and J. Krause. Leadership, consensus decision making and collective behaviour in humans. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 364(1518):781–789, 2009.
 [9] L. Glowacki and C. von Rueden. Leadership solves collective action problems in small-scale societies. Phil. Trans. R. Soc. B, 370(1683):20150010, 2015.
 [10] A. Goyal, F. Bonchi, and L. V. Lakshmanan. Discovering leaders from community actions. In Proceedings of the 17th ACM conference on Information and knowledge management, pages 499–508. ACM, 2008.
 [11] A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities in social networks. In Proceedings of the third ACM international conference on Web search and data mining, pages 241–250. ACM, 2010.
 [12] X. He and D. Kempe. Robust influence maximization. In Proceedings of the 22nd ACM SIGKDD, pages 1–10. ACM, 2016.
 [13] M. A. Hogg. A social identity theory of leadership. Personality and social psychology review, 5(3):184–200, 2001.
 [14] P. Holme. Temporal networks. Springer, 2014.
 [15] D. M. Jacoby, Y. P. Papastamatiou, and R. Freeman. Inferring animal social networks and leadership: applications for passive monitoring arrays. Journal of The Royal Society Interface, 13(124):20160676, 2016.
 [16] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD, pages 137–146. ACM, 2003.
 [17] M. B. Kjærgaard, H. Blunck, M. Wüstenberg, K. Grønbæk, M. Wirz, D. Roggen, and G. Tröster. Time-lag method for detecting following and leadership behavior of pedestrians from mobile sensing data. In Proceedings of the IEEE PerCom, pages 56–64. IEEE, 2013.
 [18] J. Krause, D. Hoare, S. Krause, C. Hemelrijk, and D. Rubenstein. Leadership in fish shoals. Fish and Fisheries, 1(1):82–89, 2000.
 [19] T. W. Malone and K. Crowston. The interdisciplinary study of coordination. ACM Computing Surveys (CSUR), 26(1):87–119, 1994.
 [20] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab, November 1999.
 [21] H. Pham and C. Shahabi. Spatial influence - measuring followship in the real world. In ICDE'16, pages 529–540, May 2016.
 [22] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE transactions on acoustics, speech, and signal processing, 26(1):43–49, 1978.
 [23] J. E. Smith, J. R. Estrada, H. R. Richards, S. E. Dawes, K. Mitsos, and K. E. Holekamp. Collective movements, leadership and consensus costs at reunions in spotted hyaenas. Animal Behaviour, 105:187–200, 2015.
 [24] A. Strandburg-Peshkin et al. Visual sensory networks and effective information transfer in animal groups. Current Biology, 23(17):R709–R711, 2013.
 [25] A. Strandburg-Peshkin, D. R. Farine, I. D. Couzin, and M. C. Crofoot. Shared decision-making drives collective movement in wild baboons. Science, 348(6241):1358–1361, 2015.
 [26] S. Stueckle and D. Zinner. To follow or not to follow: decision making and leadership during the morning departure in chacma baboons. Animal Behaviour, 75(6):1995–2004, 2008.
 [27] T. E. Will. Flock leadership: Understanding and influencing emergent collective behavior. The Leadership Quarterly, 27(2):261–279, 2016.
 [28] S. Wu and Q. Sun. Computer simulation of leadership, consensus decision making and collective behaviour in humans. PloS one, 9(1):e80680, 2014.
 [29] C.-H. Yu, J. Werfel, and R. Nagpal. Collective decision-making in multi-agent systems by implicit leadership. In AAMAS'10, pages 1189–1196, May 2010.
Supplementary
Coordination measure
Given a set of time series , a set of clusters such that , we define a cluster membership indicator if time series and belong to the same cluster, and zero otherwise. The average coordination measure of a set of clusters is defined as follows:
Note that . If is close to one, then all the time series within the same cluster are highly similar, up to some time delay. This implies a high degree of coordination within each cluster. Conversely, implies no coordination, on average.
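A minimal Python sketch of the average coordination measure, assuming it averages a pairwise, delay-tolerant similarity in [0, 1] over all pairs of time series that share a cluster (i.e., pairs whose membership indicator equals one). The function name `average_coordination`, the similarity matrix `sim`, and the convention of returning 0 when no cluster contains a pair are illustrative assumptions, not part of mFLICA.

```python
from itertools import combinations
from typing import Sequence


def average_coordination(sim: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Average similarity over all pairs of time series that share a cluster.

    sim[i][j]: assumed delay-tolerant similarity in [0, 1] (e.g. DTW-based).
    labels[i]: cluster of time series i; the membership indicator is 1
    exactly when labels[i] == labels[j].
    """
    within = [sim[i][j] for i, j in combinations(range(len(labels)), 2)
              if labels[i] == labels[j]]
    # Assumed convention: if no cluster contains a pair, report 0.
    return sum(within) / len(within) if within else 0.0


# Toy input: series 0-2 form one faction and series 3-4 another.
sim = [
    [1.0, 1.0, 1.0, 0.2, 0.2],
    [1.0, 1.0, 1.0, 0.2, 0.2],
    [1.0, 1.0, 1.0, 0.2, 0.2],
    [0.2, 0.2, 0.2, 1.0, 1.0],
    [0.2, 0.2, 0.2, 1.0, 1.0],
]
print(average_coordination(sim, [0, 0, 0, 1, 1]))  # 1.0
print(average_coordination(sim, [0, 0, 0, 0, 0]))  # approx. 0.52
```

Putting everything into one cluster introduces six cross-faction pairs of similarity 0.2, which pulls the average well below one.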
Given a set of time series containing a set of factions where , then, over all possible sets of clusters, maximizes the average coordination measure .
Recall that for all pairs within any faction , . Hence, .
Case 1: let . If we modify by exchanging any time series with and call the result , then we have:
For any , since . In contrast, because , we have , which implies . Similarly, holds for the same reason. Therefore, .
Case 2: if we create from by splitting a cluster into and , then we have:
Case 3: we create from by merging any cluster with any such that into . So, let
then
By merging and , we introduce pairs of time series across and into Equation 1 such that , since these pairs do not belong to the same faction. These pairs decrease the average of , which implies .
Since we have shown that, no matter how we edit , the average coordination measure cannot increase, maximizes the average coordination measure.
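The proposition can be checked by brute force on a toy input. The sketch below assumes, as a hypothetical setup, that the measure averages a similarity over within-cluster pairs, that pairs inside a faction have similarity 1, and that cross-faction pairs have similarity 0.2. It enumerates all 52 set partitions of five time series and confirms that the faction partition attains the maximum. The helper names `set_partitions` and `coordination` are illustrative.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ... or into a new singleton block.
        yield [[first]] + partition


def coordination(sim, blocks):
    """Assumed measure: mean similarity over within-block pairs (0 if none)."""
    within = [sim[i][j] for block in blocks for i, j in combinations(block, 2)]
    return sum(within) / len(within) if within else 0.0


faction_of = [0, 0, 0, 1, 1]  # hypothetical factions {0, 1, 2} and {3, 4}
sim = [[1.0 if faction_of[i] == faction_of[j] else 0.2 for j in range(5)]
       for i in range(5)]

best = max(coordination(sim, p) for p in set_partitions(list(range(5))))
print(best, coordination(sim, [[0, 1, 2], [3, 4]]))  # 1.0 1.0
```

Splitting a faction, e.g. `[[0, 1], [2], [3, 4]]`, also scores 1.0 in this toy, which is consistent with Case 2: the faction partition is a maximizer, though not necessarily the only one.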
Time complexity
Let be the number of time series, be the time window, be the shifting factor (we use ), and be the total length of the time series. Using the Sakoe-Chiba band technique for DTW [22] with as the band width, the time complexity of computing one following network is . Since we need the warping paths rather than only the distances, the upper/lower-bound tricks used in the time series literature to speed up DTW cannot be applied here. The number of following networks we need to compute is . In total, the time complexity of our framework is . Additionally, we may explore candidate values of to find the optimal . Since is a constant, the asymptotic time complexity of our framework remains the same. This cost is unavoidable in the current design and limits the scalability of our framework.
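To see why the bounding tricks do not help, note that the framework needs the alignment itself, not just its cost. The sketch below is a plain dynamic-programming DTW restricted to a Sakoe-Chiba band of width `band` [22] that backtracks the warping path. The absolute-difference local cost, the one-dimensional inputs, and the name `dtw_band_path` are assumptions for illustration, not the mFLICA implementation. Filling the band costs O(length × band) per pair of time series.

```python
import math


def dtw_band_path(x, y, band):
    """DTW restricted to a Sakoe-Chiba band |i - j| <= band.

    Returns (cost, path), where path lists the aligned index pairs (i, j)
    from (0, 0) to (len(x) - 1, len(y) - 1).
    """
    n, m = len(x), len(y)
    if abs(n - m) > band:
        raise ValueError("band is too narrow to align sequences of these lengths")
    # D[i][j]: cost of the best alignment of x[:i] with y[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            local = abs(x[i - 1] - y[j - 1])  # assumed local cost
            D[i][j] = local + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end to recover the warping path itself.
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path


y = [0, 0, 1, 2, 3, 3, 3]
x = [0, 0, 0, 1, 2, 3, 3]  # x is y delayed by one step
cost, path = dtw_band_path(x, y, band=2)
lag = sum(i - j for i, j in path) / len(path)
print(cost, path)
print(lag)  # positive: x tends to be matched to earlier samples of y
```

The average offset `i - j` along the path is one simple, hypothetical way to read a lead/lag relation from the alignment; it is only available because the path, not just the cost, is returned.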
Comparison method
To the best of our knowledge, there are no existing methods for the Faction Initiator Inference Problem. The closest method we can compare against is the flock model [27, 4]. We compared our framework against the Volatility Collective Behaviors Model [27], which assumes that all members of the same group move in the same direction along a nonlinear trajectory. We therefore modified the FLOCK framework to work in our setting as a baseline for comparison. Instead of using DTW to build following networks, we created FLOCK following networks. Following [4], one time series follows another at a given time step if (i) the angle between their direction vectors from that time step to the next is less than an angle threshold, (ii) the followed time series is in front of the follower with respect to the follower's direction, and (iii) the distance between the two is less than a distance threshold. The FLOCK following networks are built for all time steps. The rest of the FLOCK framework is the same as our framework. We tuned the FLOCK parameters to obtain its best performance.
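A minimal sketch of the FLOCK following rule for 2D positions at two consecutive time steps. The exact choices are assumptions: direction vectors are displacements from t to t+1, "in front" means the followed individual has a positive projection on the follower's heading, and distances are Euclidean. The names `flock_follows`, `flock_network`, `max_angle`, and `max_dist` are hypothetical.

```python
import math


def flock_follows(p_i, q_i, p_j, q_j, max_angle, max_dist):
    """Hypothetical FLOCK-style test: does i follow j between t and t + 1?

    p_* are positions at time t and q_* are positions at time t + 1.
    """
    di = (q_i[0] - p_i[0], q_i[1] - p_i[1])  # i's direction vector
    dj = (q_j[0] - p_j[0], q_j[1] - p_j[1])  # j's direction vector
    ni, nj = math.hypot(*di), math.hypot(*dj)
    if ni == 0 or nj == 0:
        return False  # a stationary individual has no heading
    cos_a = (di[0] * dj[0] + di[1] * dj[1]) / (ni * nj)
    angle = math.acos(max(-1.0, min(1.0, cos_a)))
    # j must lie in front of i with respect to i's heading.
    rel = (p_j[0] - p_i[0], p_j[1] - p_i[1])
    in_front = di[0] * rel[0] + di[1] * rel[1] > 0
    close = math.dist(p_i, p_j) < max_dist
    return angle < max_angle and in_front and close


def flock_network(pos_t, pos_t1, max_angle, max_dist):
    """Adjacency lists of one time step: edge i -> j when i follows j."""
    n = len(pos_t)
    return {i: [j for j in range(n) if j != i and
                flock_follows(pos_t[i], pos_t1[i], pos_t[j], pos_t1[j],
                              max_angle, max_dist)]
            for i in range(n)}


pos_t = [(2, 0), (1, 0), (0, 5)]   # positions at time t
pos_t1 = [(3, 0), (2, 0), (0, 6)]  # positions at time t + 1
print(flock_network(pos_t, pos_t1, max_angle=math.pi / 6, max_dist=3.0))
# {0: [], 1: [0], 2: []}
```

Individual 1 moves east directly behind individual 0, so it follows 0; individual 2 heads north and follows nobody. Building one such network per time step gives the FLOCK counterpart of the DTW-based following networks.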
Centrality measures in multi-faction datasets
In this section, we explore the use of centrality measures to infer faction initiators. We used 200 simulated datasets from the dictatorship model for this analysis. For each dataset, we created a global static following network and computed centrality measures on it. Each dataset contains 30 individuals, four of whom are initiators.
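Table 4: Jaccard similarity between the top-4 individuals ranked by each centrality method and the ground-truth set of four initiators.
Event type     PageRank   In-degree   Closeness
Linear         0.85       0.84        0.54
Merge/split    0.64       0.67        0.53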
Table 4 reports the Jaccard similarity between the top-4 individuals ranked by each centrality measure and the ground-truth set of four initiators. PageRank and in-degree centrality perform well on datasets containing simple linear coordination events, while closeness centrality performs the worst. This is because initiators in this setting are expected to have more followers than non-initiators, which gives them higher PageRank and in-degree scores. In contrast, initiators are not necessarily close to their followers in the network, which makes closeness centrality perform poorly. For datasets that contain merge/split coordination events, the dynamics of interactions among factions are more complicated, and all three centrality measures capture the true initiators considerably less well (Jaccard similarity of at most 0.67).
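The static-network analysis can be sketched as follows. This assumes that an edge (i, j) means "i follows j" (so initiators accumulate in-edges), PageRank by power iteration with a damping factor of 0.85 (a common default, assumed here) and uniform redistribution of dangling-node rank, and ties in the ranking broken by lower index. All helper names and the toy network are hypothetical; closeness centrality is omitted for brevity.

```python
def in_degree(edges, n):
    """In-degree of each node; edge (i, j) is read as 'i follows j'."""
    deg = [0] * n
    for _, j in edges:
        deg[j] += 1
    return deg


def pagerank(edges, n, d=0.85, iters=100):
    """Power-iteration PageRank; dangling nodes spread rank uniformly."""
    out = [[] for _ in range(n)]
    for i, j in edges:
        out[i].append(j)
    r = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - d) / n] * n
        for i in range(n):
            targets = out[i] if out[i] else range(n)
            share = d * r[i] / len(targets)
            for j in targets:
                nxt[j] += share
        r = nxt
    return r


def top_k(scores, k):
    """Indices of the k highest scores (ties broken by lower index)."""
    return set(sorted(range(len(scores)), key=lambda v: -scores[v])[:k])


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical static following network: 2-4 follow 0, and 5-7 follow 1.
edges = [(2, 0), (3, 0), (4, 0), (5, 1), (6, 1), (7, 1)]
truth = {0, 1}
for name, scores in [("PageRank", pagerank(edges, 8)), ("in-degree", in_degree(edges, 8))]:
    print(name, jaccard(top_k(scores, 2), truth))  # 1.0 for both
```

In this toy network every follower points to exactly one initiator, so both measures recover the initiators; a single static graph like this cannot represent factions that merge and split over time.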
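Table 5: Support of each initiator appearing in the top-4 list of each centrality method (linear-coordination datasets).
Initiator's ID   PageRank   In-degree   Closeness
ID1              1          1           0.92
ID2              1          1           0.66
ID3              1          1           0.36
ID4              0.39       0.37        0.20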
Table 5 reports, for each of the four initiators, its support: the fraction of linear-coordination datasets in which it appears among the top-4 individuals ranked by each centrality measure. Again, PageRank and in-degree centrality perform well, while closeness centrality performs poorly. For the merge/split-coordination datasets, Table 6 shows that all centrality measures rarely include the ID2 and ID4 initiators in their top-4 lists, while they reliably include ID1 and ID3. This is because ID1 and ID3 spent significantly more time leading their factions than ID2 and ID4.
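Table 6: Support of each initiator appearing in the top-4 list of each centrality method (merge/split-coordination datasets).
Initiator's ID   PageRank   In-degree   Closeness
ID1              1          1           1
ID2              0.29       0.47        0.20
ID3              1          1           0.83
ID4              0.28       0.19        0.08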
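A short sketch of the support computation, reading "support" as the fraction of datasets in which an initiator appears in a measure's top-4 list. The function name `initiator_support` and the toy top-4 sets are hypothetical.

```python
def initiator_support(top_lists, initiators):
    """Fraction of datasets whose top-k set contains each initiator."""
    return {v: sum(v in top for top in top_lists) / len(top_lists)
            for v in initiators}


# Hypothetical top-4 sets from three datasets; initiators have IDs 1-4.
tops = [{1, 2, 3, 9}, {1, 3, 4, 7}, {1, 2, 3, 4}]
support = initiator_support(tops, [1, 2, 3, 4])
print(support)
```

Here initiators 1 and 3 appear in every top-4 set (support 1.0), while 2 and 4 each appear in two of the three (support 2/3).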
In conclusion, these results emphasize the need for a dynamic following network approach to deal with the complicated problem of inferring the initiator of a faction.
Pseudocode