1 Introduction
Leadership is an important aspect of the social organization, formation, and decision-making of groups of people in online and offline communities, as well as of other social animals. Understanding the dynamics of emerging leadership allows researchers to gain insights into how social species make decisions. Until recently, it has been difficult if not impossible to pinpoint the identity of a leader from available observational data without explicit additional information. However, the availability of data from physical proximity sensors, GPS, and the web opens up the possibility of measuring leadership in online activities, face-to-face interactions, animal populations, and aggregate social processes such as economic activity. This paper presents an automated method for unsupervised identification of leader identity in the context of successful initiation of coordinated activities among groups of individuals. The method uses only the time series of individual activities, with no additional information. The proposed approach automatically determines (1) when a group decision was made, (2) the identity of the leader, and (3) the mechanism by which the group agreed to follow the leader.
Previous work over several domains defines leadership according to physical movement in public spaces [31], location-based social networks [26], physical association patterns [22], and other physical trajectories [1]. Leadership has also been studied in online social networks [12], where user actions are imitated over the network topology. Much of this work has focused on identifying leaders from dyadic interactions, but little work has focused on measuring leadership in coordinated group activity, which occurs in group decision-making and collaborative systems. Under this view of leadership, a leader is simply the individual who successfully initiates the coordinated activity of a group and is followed by other individuals. Moreover, most of the previous work does not explicitly focus on the time when the decision is made and leadership is manifested (i.e., the period of a group’s transition to the coordinated activity). Finally, many of the previous approaches assume a particular model of leadership, such as influence maximization, whereas here we present a framework to differentiate between alternate models of leadership.
1.1 Our Contributions
We propose a general, scientifically grounded, unsupervised, modular, and extendable framework with few assumptions for identifying individuals who lead a group to a state of coordinated activity. Our framework is capable of:

Detecting events of coordinated activity: discovering time intervals of coordination in group activity data and the transition periods of decision-making which lead to that coordination;

Identifying leaders: identifying the initiators of this coordinated behavior, the individuals who succeeded in leading the group to coordination; and

Classifying the group leadership model: characterizing the type of the group’s transition behavior to coordination according to interpretable, dynamic models (e.g. hierarchical, dictatorial).
We demonstrate the framework’s ability to analyze leadership in coordinated activity on synthetic and real datasets over several domains. We use synthetic simulated data to validate every aspect of the framework. We use two biological datasets, GPS tracks of a baboon troop and video tracking of fish schools, as well as stock market closing price data of the NASDAQ index. The results are consistent with ground-truthed biological data, and the framework finds many known events in financial data which are not otherwise reflected in the aggregate NASDAQ index. Our approach generalizes readily to any coordinated activity data from interacting entities.
1.2 Related Work
Coordinating patterns of individual activity is a challenge that all social organisms face, and diverse strategies, from democratic to dictatorial, have emerged to allow members of groups to reach consensus. Leadership (defined as non-random, differential influence [30]) plays a key role in organizing the collective (i.e., group) behaviors of social organisms ranging from humans [9] to hyaenas [29] to hymenoptera [25]. It potentiates complex patterns of cooperation and conflict (e.g., lions [15], hyaenas [4], meerkats [23], chimpanzees [10], humans [11]), organizes group movements (fish [7], humans [9], dogs [3]), and may prevent free-riding [24, 17].
Substantial interest currently exists in identifying individuals who act as leaders and determining how they influence the behavior of others in their social network. Most previous computational work creates a global, static leadership ranking over the entirety of the input data [12, 38, 31, 18, 2]. This assumes that leadership relationships are global and fixed across time. However, the important initiators of group activity are not necessarily the individuals found at the top of their group’s social dominance hierarchy [32, 39, 5, 33]. Our framework explicitly identifies heterogeneous, dynamic leadership rankings by identifying local time intervals of leadership and measuring rank stability over time.
While domain-driven leadership models typically measure pairwise dyadic dominance or following interactions in the absence of an explicit network structure [1, 26, 21], most of the work in machine learning is on explicit, known network topologies [37]. Our framework generalizes to either an explicit network or hidden, implicit dynamic network topologies inferred over multidimensional time series data. Furthermore, reporting only dyadic leadership relationships does not incorporate conditional dependencies over the group (i.e., interactions are assumed to be independent) [20]. To mitigate this, previous work on leadership incorporates high-level network measures [18], including PageRank and HITS, or cascade size [2]. Our framework can use any ranking function, such as PageRank, and can be extended to any high-level measure on our inferred ‘following’ network.
From a social network perspective, leaders can be characterized as influential individuals who have many followers that imitate the leader’s actions [12], and thus are able to successfully take a group from one behavioral state to another. Significant attention has been paid to the problem of influence maximization (IM), i.e., how individuals in a specific community are able to maximize their impact on the behavior of the community as a whole [19, 13]. Recently, these IM models, as well as more general domain-driven definitions of leadership, have been shown to be considerably unstable in the presence of the noise often found in real-world datasets [14]. Our framework is general: it allows multiple leadership models to be tested and requires no parametric assumptions for the definition of following.
The ability to identify leaders based on their behavior and the subsequent reactions of others opens opportunities to explore how an individual affects group behavior and how group decisions are made. However, because leadership can occur in a multitude of contexts and take diverse forms, any generalized framework for identifying leaders or testing models of the underlying decision-making process that leads to group consensus must allow for domain-specific behavioral features. Further, a framework for testing among different models of leadership and consensus-building would pave the way for a more generalized understanding of collective behaviors and how they shape disparate social systems. Unfortunately, no such framework for leadership model classification exists. Here, we tackle this problem for the first time, focusing on developing de novo methods for both leadership identification and leadership model classification.
Table 1: Symbols and parameters.
Symbol  Definition
  Mean of the signed index difference of an optimal warping path
  Density of graph G
  A coordination event; the set of all coordination events identified by the framework
  A rank order of some measure; the ‘global’ rank order over all coordination events
  Kendall rank correlation comparison
  The support of an individual (fraction of intervals in which it is ranked first)
Model Parameters
  Time-series window size, overlapping window shift size, Dynamic Time Warping (DTW) warping band, and density threshold for coordination
2 Methods
Our proposed framework measures ‘following’ relationships in multidimensional time series of arbitrary cardinality, and constructs a network model to rank the leaders before and at the time of coordination. Table 1 summarizes all symbols and parameters introduced below. (Footnote 1: Matrices, vectors, and sets are denoted by capital letters, individual scalars are denoted with lowercase letters, and parameters are denoted with Greek letters.)
Figure 1 gives a high-level overview of our proposed framework. The framework takes as input a collection of multidimensional time series data. The framework first (a) computes time-series measures appropriate for ‘following’ relationships (e.g., Dynamic Time Warping, see Section 2.2.1) over sliding windows, and (b) uses these associations to construct a sequence of directed ‘following’ graphs. We (c) model time intervals of coordination using the density over time of the graph sequence. For these intervals, we (d) apply our set of leadership ranking measures (e.g., PageRank) to capture different aspects of leadership. Finally, we output these ordered rankings.
2.1 A working example
Figure 2 presents a key example and a brief introduction to our framework, on real GPS trajectory data of olive baboons (Papio anubis). This event was validated with video taken on site (see Section 3.3.1). Figures 2(a) to 2(c) show the GPS locations of baboons at three different time steps. These figures also show the directed ‘following’ network and the PageRank of individuals at that time step (indicated by node size).
The middle plot shows the density of the ‘following’ network over the duration of the entire event. The dotted red lines denote different intervals of this event, based on network density over time. The increase in density corresponds to the transition from uncoordinated to coordinated movement, and the interval of high density between the dotted lines corresponds to coordinated group movement. The top figure presents the PageRank of individuals in the following network over a coordination event.
Figures 2(a) and 2(b) show the initiation of movement of the group by ID3 (Black). Figure 2(c) shows the ‘following’ network once it is in the coordination interval. Individual ID3 has the largest weight in the first two snapshots, and the PageRank of individual ID1 (Blue) surpasses that of ID3 only after the network is ‘coordinated’ (i.e., moving together). If we measure the leadership ranking after network density is high, we miss that ID3 ‘built’ the network in the pre-coordination interval (to the left of the first dotted red line).
2.2 Time series analysis
A multidimensional time series is a tuple, of fixed cardinality, of time-ordered sequences of observations that all share the same length:
(1) 
Our input dataset contains multiple time series of fixed cardinality. Each represents the activity of one entity (e.g., a user or an individual). The total size of the dataset is then the product of the number of time series, their cardinality, and their length. In the case of typical geospatial trajectories, the cardinality is two, for latitude and longitude.
2.2.1 Time series measures and sliding windows
Our framework constructs a directed association network by measuring following interactions between time series. The definition of the ‘following’ relationship is the atomic unit which determines our network topology, and the subsequent leadership analysis.
We focus on Dynamic Time Warping (DTW) [27]. However, any appropriate local measure of time-lagged similarity may be used. Dynamic Time Warping is an optimal elastic matching between sequences using dynamic programming and is regarded as “remarkably hard to beat as a time series distance measure, across a host of domain applications, and a host of tasks; including clustering, classification and similarity search” [28]. Figure 3 (Left) shows two trajectories, where shifting one ahead in time produces a better match to the other, as illustrated by the warping path in Figure 3 (Right). For the multidimensional generalization of DTW, we use the ‘dependent’ variant [28]. To compute the distance between multidimensional observations at a cell in the dynamic programming matrix, this variant simply uses the Euclidean distance over the dimensions (Footnote 2: We use ‘*’ subscript notation in matrices to indicate slicing in the dimension(s).):
(2) 
In practice, DTW uses a warping band which constrains the difference in time between matched observations. Aside from reducing computation from quadratic in the sequence length to the product of the sequence length and the band width, this enforces domain knowledge of what constitutes a ‘coherent’ match for our ‘following’ relationships of interest. We discuss parameter selection in Section 2.6.2.
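The banded, dependent form of DTW described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `dtw_banded`, the parameter name `band`, and the tie-breaking rule during backtracking are all hypothetical, and the two inputs are assumed to have equal length (as they would for two windows of the same size).

```python
import math

def dtw_banded(u, w, band):
    """Dependent multidimensional DTW restricted to a warping band.

    u, w: equal-length lists of observations, each a tuple of floats.
    band: largest allowed |i - j| between matched indices (assumed name).
    Returns (total_cost, path), where path is a list of (i, j) index pairs.
    """
    n, m = len(u), len(w)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay infinite.
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = math.dist(u[i - 1], w[j - 1])  # Euclidean over all dimensions
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the last cell to recover one optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return cost[n][m], path

# Example: w repeats its first observation, so it lags u by one step.
u = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.0, 0.0)]
w = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
total, path = dtw_banded(u, w, band=2)
```

Filling only the cells inside the band is what reduces the work per window from quadratic to roughly the window length times the band width.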
2.2.2 Associations in time series sliding windows
Dynamic time warping is typically applied as a global matching measure. However, we make a Markovian assumption to identify ‘following’ on time series subsequences, which in aggregate may not correspond to the globally optimal warping solution (see Section 2.6.1 for discussion).
For a pair of time series, a window size, and a sliding-window step size (see Section 2.6.2), we calculate DTW on the time series subsequences defined on these overlapping windows. Each window is an interval of the window size, and consecutive windows are offset by the step size. DTW then outputs an optimal warping path on each pair of subsequences, represented as a sequence of index pairs (see Figure 3, Right). We compute the mean of the signed index difference over this sequence of index pairs:
(3) 
This function measures the extent of warping between two time series. For time series which cannot be warped one onto the other, its value is zero. When it is positive, one time series follows the other, as shown in Figure 3, and when it is negative, the direction of following is reversed. This function is bounded by [-1, 1].
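As a rough illustration of this measure, the sketch below averages the sign of the index difference over a warping path given as a list of index pairs, such as the output of any DTW implementation. The function name and the sign convention (the sign of `j - i`) are assumptions made for the example; which series counts as the follower under a positive value depends on the convention chosen in Equation 3.

```python
def mean_signed_index_difference(path):
    """Mean of sign(j - i) over a warping path of (i, j) index pairs.

    The result lies in [-1, 1]; a purely diagonal path gives 0.
    The sign convention (j - i rather than i - j) is an assumption.
    """
    if not path:
        return 0.0
    total = 0
    for i, j in path:
        total += (j > i) - (j < i)  # +1, 0, or -1
    return total / len(path)

# A path in which the second series is matched one step later for most pairs.
lagged_path = [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 4)]
diagonal_path = [(k, k) for k in range(5)]
lagged_value = mean_signed_index_difference(lagged_path)
diagonal_value = mean_signed_index_difference(diagonal_path)
```

Using the sign rather than the raw difference is what keeps the value bounded, regardless of the warping band width.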
2.3 Dynamic association network: inference and analysis
We construct a time-varying, directed association network on the multidimensional time series dataset, with one graph per sliding-window step. The nodes of each graph are the entities represented by the time series. For each pair of time series, we construct a sequence of edge sets using Equation 3 on the valid windows bounded by the time series length. A nonzero value of Equation 3 defines a directed edge at that window between the nodes associated with the two time series. Although here we compute the pairwise association network between all time series pairs, our framework trivially generalizes to the case where an explicit network is given, where we measure ‘following’ relationships only for the time series pairs associated with adjacent nodes in that network.
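A minimal sketch of the graph construction for one window is given below. It assumes the pairwise values of Equation 3 are already available in a dictionary keyed by node pairs; the function names, the edge direction convention (follower to followed), and the density normalization by the number of unordered pairs are assumptions for illustration only.

```python
def following_edges(scores):
    """Build directed 'following' edges for a single window.

    scores: dict mapping a pair (a, b) to its signed warping value.
    Assumed convention: a positive value means a follows b (edge a -> b),
    a negative value means b follows a, and zero means no edge.
    """
    edges = set()
    for (a, b), s in scores.items():
        if s > 0:
            edges.add((a, b))
        elif s < 0:
            edges.add((b, a))
    return edges

def density(n_nodes, edges):
    """Fraction of node pairs connected by an edge.

    Normalizes by the number of unordered pairs, since each pair yields
    at most one directed edge here (this normalization is an assumption).
    """
    pairs = n_nodes * (n_nodes - 1) // 2
    return len(edges) / pairs if pairs else 0.0

window_scores = {("a", "b"): 0.5, ("a", "c"): -0.2, ("b", "c"): 0.0}
window_edges = following_edges(window_scores)
window_density = density(3, window_edges)
```

Repeating this for every window step yields the graph sequence, and the per-window densities form the time series used for detecting coordination.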
2.3.1 Detecting intervals of coordination
Recall that our framework identifies time intervals of coordinated activity and measures the leader who initiated that coordination. Once the group’s actions are already coordinated, we are ‘too late’ to observe the initiation. Our dynamic ‘following’ network captures pairwise following activity. Therefore, network density is a simple measure of group coordination with the fewest assumptions on the structure of following.
Figure 4 illustrates the definition of a coordination event as a pair of time intervals. We apply a threshold to the network density time series. We set the threshold adaptively, according to the distribution of density values, taking as the threshold value either the mean, the median, or another percentile of that distribution. A contiguous time interval above the threshold defines a coordination interval, and the preceding interval below it is a pre-coordination interval. Within the pre-coordination interval, the trend of network density is assumed to be monotonically non-decreasing. Therefore, we determine the beginning of the pre-coordination interval as the first time step prior to the coordination interval where the discrete derivative (i.e., the difference between consecutive density values) is zero.
Together, these intervals are one coordination event, represented by a 3-tuple of time indices. The collection of coordination events forms a set. The total intervals of distinct events are non-overlapping in time. For the remainder of our framework, we measure leadership only on the events in this set. To reduce the number of intervals which might be generated near the threshold, we apply a greedy merging of nearby coordination intervals (taking the merging range from the warping band window).
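The thresholding step can be sketched as follows. This is an illustrative simplification with hypothetical names: it returns one triple of indices per event (start of pre-coordination, start of coordination, end of coordination), stops the backward walk as soon as the density stops strictly increasing, and omits the greedy merging of nearby intervals.

```python
def coordination_events(density, threshold):
    """Return (pre_start, coord_start, coord_end) index triples.

    density: list of network density values, one per window step.
    threshold: density value above which the group counts as coordinated.
    """
    events = []
    t, n = 0, len(density)
    while t < n:
        if density[t] > threshold:
            start = t
            while t < n and density[t] > threshold:
                t += 1
            end = t - 1
            # Walk backwards while density is still strictly rising,
            # without running into the previous event.
            floor = events[-1][2] + 1 if events else 0
            pre = start
            while pre > floor and density[pre] > density[pre - 1]:
                pre -= 1
            events.append((pre, start, end))
        else:
            t += 1
    return events

series = [0.1, 0.1, 0.2, 0.4, 0.7, 0.9, 0.9, 0.8, 0.3, 0.1]
found = coordination_events(series, threshold=0.45)
```

In this toy series the density rises from index 1, crosses the threshold at index 4, and falls back below it after index 7, so a single event is reported.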
2.4 Leadership ranking
In this section, we propose several methods for measuring different aspects of leadership by comparing higher-order network features against individual time series features. These feature spaces give an extendable way to compare different aspects of leadership in the absence of a single, unambiguous leadership definition.
In all of the analysis below, we compare rank orderings, defined as the sorted positions of nodes, from best to worst, on some measure function. (Footnote 3: We use a placeholder symbol to stand in for unspecified arguments, e.g., parameters.)
2.4.1 PageRank
PageRank [6] is a standard method for approximating eigenvector centrality in networks, designed to measure the importance of a node by the number and the importance of the other nodes linking to it. In a network where a link represents a following relationship between nodes, PageRank measures how many other nodes follow a given node, how many followers those nodes have in turn, and so on. Thus, it fits well with our definition of a leader.
PageRank returns a weight vector with one entry per node, summing to 1. We calculate PageRank for each static graph within the dynamic graph sequence of the pre-coordination interval, which yields a sequence of PageRank vectors, one per graph.
While the definition of a time step in the original time series domain and in the ‘following’ network may be different, the pre-coordination interval is precisely defined in both domains. We aggregate the rankings over the entire pre-coordination interval and produce one rank value per individual for the entire interval.
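The sketch below shows one way to compute PageRank on each graph of an interval and aggregate the per-graph rankings into a single order by mean rank position. It is a plain power iteration written for illustration; the damping factor of 0.85, the fixed iteration count, and the function names are assumptions, and edges are taken to point from follower to followed.

```python
def pagerank(nodes, edges, damping=0.85, iters=100):
    """Power-iteration PageRank; edges are (follower, followed) pairs."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for a, b in edges:
        out[a].append(b)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Nodes without out-edges spread their weight uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for target in out[v]:
                new[target] += damping * rank[v] / len(out[v])
        rank = new
    return rank

def mean_rank_order(nodes, graph_sequence):
    """Order nodes by mean rank position over a non-empty graph sequence."""
    totals = {v: 0.0 for v in nodes}
    for edges in graph_sequence:
        pr = pagerank(nodes, edges)
        ordered = sorted(nodes, key=lambda v: -pr[v])
        for position, v in enumerate(ordered, start=1):
            totals[v] += position
    return sorted(nodes, key=lambda v: totals[v] / len(graph_sequence))

nodes = ["a", "b", "c"]
sequence = [{("b", "a")}, {("b", "a"), ("c", "a")}, {("b", "a"), ("c", "b")}]
order = mean_rank_order(nodes, sequence)
```

Averaging rank positions rather than raw PageRank weights keeps graphs of different density on a comparable scale.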
2.4.2 Velocity Convex Hull
The velocity convex hull measures the frequency with which the discrete time series derivative associated with a node is outside the bounds of the population’s discrete derivative distribution (including the node itself) in the previous time step. In aggregate, a high rank on this measure indicates which node moves first in the group. In the case of spatiotemporal trajectories, this measure corresponds to how often an individual’s velocity at a given time step is outside the range of velocities that were present in the group at the previous time step.
The convex hull can be computed on arbitrary dimensions of a multidimensional time series, or their derivatives, jointly or independently. The convex hull function returns a surface represented as lines between points in the input data, which encompass all other points. Because we look at velocity jointly in the one-dimensional case, we can directly use the max and min.
Let the velocity matrix hold each individual’s velocity at each time step of the time series dataset. For a given individual and time step, we define the following indicator function:
(4) 
For each time step, we output a rank-order vector with one entry per individual.
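For the one-dimensional case described above, the indicator reduces to a comparison against the previous step's minimum and maximum. The following is a small sketch under that reading; the function names and the convention that the first time step is never flagged are assumptions.

```python
def velocity_hull_flags(velocity):
    """velocity[t][i] is the speed of individual i at time step t.

    Returns flags[t][i] = 1 when the value at step t lies outside the
    [min, max] range of the whole population at step t - 1, else 0.
    The first step has no predecessor and is never flagged.
    """
    flags = [[0] * len(velocity[0])]
    for t in range(1, len(velocity)):
        lo, hi = min(velocity[t - 1]), max(velocity[t - 1])
        flags.append([1 if (v < lo or v > hi) else 0 for v in velocity[t]])
    return flags

def rank_by_frequency(flags):
    """Order individuals by how often they were flagged, most often first."""
    n = len(flags[0])
    counts = [sum(row[i] for row in flags) for i in range(n)]
    return sorted(range(n), key=lambda i: -counts[i])

speeds = [[0, 0, 0], [1, 0, 0], [2, 1, 0], [2, 2, 1]]
hull_flags = velocity_hull_flags(speeds)
vch_order = rank_by_frequency(hull_flags)
```

In this toy example individual 0 accelerates before anyone else twice, so it is flagged at steps 1 and 2 and ranked first.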
2.4.3 Position Convex Hull
The position convex hull is analogous to velocity, except that our indicator function measures an individual’s position relative to the convex hull containing the population at the previous time step. Rather than look at velocity of initiation, this measure captures an individual’s frequency of moving outside the geometric boundaries of the group, and close to the average heading of the group (e.g. in ‘front’ of the group).
We compute the convex hull of the population’s positions at each time step, and also introduce the heading vector of each individual and the population heading vector. We use a containment function to denote standard ‘B contains A’ spatial queries between two geometry objects, and an angle function to measure the angle between two vectors.
Using these definitions, we define the position convex hull indicator function for a given individual and time step:
(5) 
For each time step, we output a rank-order vector with one entry per individual.
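One possible reading of the position convex hull indicator is sketched below for two-dimensional positions: an individual is flagged when its new position lies outside the hull of the previous positions and its heading is close to the population's mean heading. The angle threshold `max_angle`, the use of the mean displacement as the population heading, and all function names are assumptions, since the exact form of Equation 5 is not reproduced here.

```python
import math

def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_hull(hull, p):
    """True when p is inside or on the boundary of a CCW convex polygon."""
    if len(hull) < 3:
        return False
    k = len(hull)
    return all(cross(hull[e], hull[(e + 1) % k], p) >= 0 for e in range(k))

def angle_between(a, b):
    na, nb = math.hypot(*a), math.hypot(*b)
    if na == 0 or nb == 0:
        return math.pi  # a stationary individual has no heading
    c = (a[0] * b[0] + a[1] * b[1]) / (na * nb)
    return math.acos(max(-1.0, min(1.0, c)))

def position_hull_flags(prev_pos, cur_pos, max_angle=math.pi / 4):
    """Flag individuals that leave the previous hull along the group heading."""
    hull = convex_hull(prev_pos)
    headings = [(c[0] - p[0], c[1] - p[1]) for p, c in zip(prev_pos, cur_pos)]
    n = len(headings)
    group = (sum(h[0] for h in headings) / n, sum(h[1] for h in headings) / n)
    return [1 if (not inside_hull(hull, c)) and angle_between(h, group) <= max_angle else 0
            for c, h in zip(cur_pos, headings)]

before = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
after = [(0.1, 0.0), (2.0, 0.0), (1.0, 1.0), (0.1, 1.0)]
pch_flags = position_hull_flags(before, after)
```

Only the second individual steps outside the previous unit square in the direction the group is drifting, so it alone is flagged.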
2.5 Leadership comparison
In Section 2.3.1 we described how we detect coordination events. We now describe how we apply the measures described in Section 2.4 on the collection of detected coordination events to determine leader identity.
2.5.1 Leadership support
Recall that our framework discovers a set of coordination events. We aggregate rankings within a single coordination event: for each node, we calculate the mean rank over all time steps in the event’s pre-coordination interval. We then rank nodes by this value. Note that, by design, the mean rank is affected in proportion to each value’s distance from the mean (e.g., by outlier values), unlike the mode (i.e., the node’s most likely ranking). In practice, ranking has considerable noise in the local ordering, so ranking by expectation is more robust.
We also define a ‘global’ rank order by this same procedure, combining all rank-order vectors over all coordination events and computing the mean rank ordering. We define one such ‘global’ rank ordering each for PageRank, velocity convex hull, and position convex hull.
For a given node, Leadership Support is defined relative to a particular measure (e.g., PageRank) as the fraction of coordination events in which that node is ranked first:
(6) 
To identify the global leader in our framework, we take the node with maximum PageRank support over all nodes.
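Support and the resulting global leader can be computed directly from the per-event rank orders. The sketch below assumes each event's ranking is a list of node identifiers ordered from best to worst; the function names are illustrative.

```python
from collections import Counter

def leadership_support(event_rankings):
    """Fraction of events in which each node is ranked first.

    event_rankings: non-empty list of rank orders (best first), one per event.
    Nodes that are never ranked first are omitted (support of zero).
    """
    firsts = Counter(ranking[0] for ranking in event_rankings)
    total = len(event_rankings)
    return {node: count / total for node, count in firsts.items()}

def global_leader(event_rankings):
    """Node with the maximum support over all events."""
    support = leadership_support(event_rankings)
    return max(support, key=support.get)

rankings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"], ["a", "b", "c"]]
support = leadership_support(rankings)
leader = global_leader(rankings)
```

Here node "a" is ranked first in three of four events, giving it a support of 0.75.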
2.5.2 Comparing rank orders
We use the Kendall rank correlation coefficient to compare local and global rank orders. This measure provides a similarity between two rankings according to their ordinal agreement over all pairs of items (e.g., one item is “below” another in both lists).
To compare global and local rank orders, we use the mean Kendall rank correlation over all coordination events against the global:
(7) 
For example, applying Equation 7 to the velocity convex hull compares its local and global rank orders.
Similarly, we compute the mean Kendall correlation between local rankings associated with different measures (e.g. velocity convex hull, position convex hull):
(8) 
Equation 7 formalizes our intuition that leaders consistently move outside of the spatial extent of the population (position convex hull) or the distribution of velocity over the population (velocity convex hull). By comparing the global and local rank orderings, we measure how stable the global ranking is over time.
Equation 8 measures the relationship between higher-order graph structure (centrality) and simple time series features. Using this measure, we can better understand the high-level aspects of initiating coordination. For example, we see whether changing velocity (velocity convex hull) or position within the group (position convex hull) is correlated with network rank position.
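Both comparisons reduce to averaging a Kendall rank correlation over coordination events. The sketch below implements the tie-free form of the coefficient on two orderings of the same items and averages it against a global ordering; the function names are illustrative, and rank orders are assumed to be permutations without ties.

```python
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Kendall rank correlation between two orderings of the same items."""
    pos_a = {v: k for k, v in enumerate(order_a)}
    pos_b = {v: k for k, v in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        agreement = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if agreement > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 1.0

def mean_tau_against_global(local_orders, global_order):
    """Mean correlation of each event's local order with the global order."""
    taus = [kendall_tau(order, global_order) for order in local_orders]
    return sum(taus) / len(taus)

global_order = ["a", "b", "c"]
local_orders = [["a", "b", "c"], ["a", "c", "b"]]
stability = mean_tau_against_global(local_orders, global_order)
```

The same `kendall_tau` function can be applied to two local orders from different measures on the same event, which is the comparison made in Equation 8.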
2.6 Framework discussion
2.6.1 Local vs. Global Matching
Our proposed framework uses local alignment on time series subsequences, rather than global alignment on the full time series. Figure 5 presents a motivation for this choice. Suppose we intend to match sparse ‘following’ events represented as the pair of spikes with relatively low magnitude at the end of the red and blue time series. In Figure 5(a), the time series is shifted to match one of the two patterns, depending on the cost. This forces a mismatch of the ‘following’ event. Similarly, Figure 5(b) has a low cost matching by shifting the entire time series at a constant rate. By matching only local subsequences, we can recover both of these ‘following’ events.
2.6.2 Parameter selection
Although the proposed framework has four parameters, the dynamic time warping band and the overlapping window shift size are minor parameters that primarily trade off computation against the sampling rate of the time series process. For a given window size, the DTW subprocedure runs in time proportional to the product of the window size and the warping band, so a band as wide as the window yields the traditional quadratic runtime. Although the warping band reduces computation, it also enforces domain knowledge of an appropriate time lag. The cost of the framework is multiplicative in the number of individual pairs, the number of time windows, and the cost of DTW. The total cost can therefore be reduced by increasing the window shift size or decreasing the warping band.
The parameters defining the dynamic network time scale and the threshold of coordination events are the major parameters of the framework. We tune the window size according to the TWIN heuristic [36] on network density. TWIN is an information-theoretic heuristic which discovers natural time scales in dynamic network data. For the coordination threshold, we explore different percentiles of the density distribution.
3 Evaluation Datasets
We evaluate our framework on four synthetic trajectory models and three real datasets.
3.1 Simulation models
We develop synthetic trajectory models which capture several hypotheses of movement with respect to leadership. These models are not intended to accurately reproduce trajectories in our domain, nor are they exhaustive of different properties of interest. Comparing real data against model simulations provides an interpretable characterization of these datasets. We propose four synthetic models: dictatorship, hierarchical, linear-threshold, and fixed-destination (random) models. For evaluation, we attempt to identify the top-ranked individual, which we know as a ground-truth label from running the simulation.
3.1.1 Dictatorship Model
In the Dictatorship Model (or ‘DM’), we fix a single leader who initiates movement from the initial positions of the population, which are randomly sampled from a circle geometry. At the start of the pre-coordination interval, the leader moves in a fixed direction at a fixed velocity. Every other individual samples a uniformly random lag. After waiting this time, the individual follows the leader at a fixed velocity (with sampled Gaussian noise in the heading). After a fixed duration of coordinated movement over the entire population, post-coordination begins and individuals decrease velocity at random until stopping. To produce multiple coordination events, this procedure is repeated after a sufficient waiting time, starting from the stopping positions of the previous coordination event.
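A stripped-down version of a single Dictatorship Model event might look like the sketch below. It is a simplification under several assumptions that are not taken from the paper: initial positions are drawn uniformly from a unit disc, the leader is individual 0 and moves along the x-axis, followers start after an integer lag between 1 and `max_lag`, and the post-coordination slowdown is omitted. All parameter names and default values are hypothetical.

```python
import math
import random

def simulate_dictatorship(n, steps, max_lag, speed=1.0, heading_sd=0.1, seed=0):
    """Simulate one coordination event; returns (trajectory, lags).

    trajectory[t][i] is the (x, y) position of individual i after t steps.
    """
    rng = random.Random(seed)
    pos = []
    for _ in range(n):
        # Uniform sample from the unit disc.
        r, a = math.sqrt(rng.random()), rng.uniform(0.0, 2.0 * math.pi)
        pos.append((r * math.cos(a), r * math.sin(a)))
    lags = [0] + [rng.randint(1, max_lag) for _ in range(n - 1)]
    trajectory = [list(pos)]
    for t in range(steps):
        nxt = []
        for i, (x, y) in enumerate(pos):
            if t >= lags[i]:
                # The leader keeps an exact heading; followers copy it with noise.
                heading = 0.0 if i == 0 else rng.gauss(0.0, heading_sd)
                x += speed * math.cos(heading)
                y += speed * math.sin(heading)
            nxt.append((x, y))
        pos = nxt
        trajectory.append(list(pos))
    return trajectory, lags

trajectory, lags = simulate_dictatorship(n=5, steps=20, max_lag=5)
```

Because the true leader is known by construction, trajectories of this kind can serve as ground truth when checking which individual a ranking measure places first.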
3.1.2 Hierarchical Model
The Hierarchical Model (or ‘HM’) is a variation of the Dictatorship Model, where we fix a sequence of individuals (n=4), each of whom follows the previous individual in the sequence after a fixed lag. The remaining individuals in the population each follow exactly one of these high-ranking individuals, allocated in decreasing proportion per rank.
3.1.3 Linear Threshold Model
The Linear Threshold Model (or ‘LT’) [19] adapts the Dictatorship Model by initiating individual movement as a linear threshold over a network of the nearest neighbors at the current time step. There is still one individual who attempts to initiate movement. After the initial step, the model is parameterized by the number of nearest neighbors to query and the proportion of these neighbors required to initiate movement (i.e., to “become infected”). Once “infected”, the individual follows the leader as in the previous models. Each individual also has a fixed initial probability of moving. We explore the parameter space over combinations of the two model parameters, and for convenient notation we refer to each Linear Threshold Model by its parameter values.
3.1.4 Random model
In the random model, there is no ‘following’ relationship. At the start of the pre-coordination interval, each individual starts moving to a fixed position. Velocity and heading error are sampled from fixed distributions; therefore, the initial positions yield some spurious following relationships. However, the density of the following network is generally insufficient for flagging a coordination event. When coordination events do occur, the PageRank values and PageRank support are typically low.
3.2 Synthetic trajectory simulation
For each of the above models, we generate a trial of synthetic data consisting of a fixed population of individuals and 12,000 total time steps, containing several separate coordination events. Each coordination event has pre-coordination and coordination intervals of equal length. Following the coordination interval is a post-coordination gap before the start of the next coordination event. We generate 100 trials for each of the above models.
3.3 Real datasets
We demonstrate the utility of our framework on three real-world datasets from two different domains. First, we look at biological trajectory datasets derived from GPS and cameras. Next, we look at fifteen years of stock closing-price data from the NASDAQ index.
3.3.1 Baboon trajectories
In this dataset, high-resolution GPS collars track 26 individuals of a troop of olive baboons (Papio anubis) living in the wild at Mpala Research Centre, Kenya [8, 34]. The data consist of latitude-longitude location pairs for each individual at one observation per second. We analyze the subset of individuals whose collars remained functional over a common multi-day period (419,095 time steps). The task is to automatically detect periods of coordinated group movement and to identify the initiator(s) of these periods, as well as to classify the type of leadership mechanism employed.
3.3.2 Fish schools trajectories
In this dataset, the movements of a school of golden shiners (Notemigonus crysoleucas) are recorded by video in order to study information propagation over the visual fields of fish [35]. Each school contains trained, labeled individuals who are able to lead the school to feeding sites over separate leadership events. Individual trajectories were identified by automated tracking from video images, and trained fish were identifiable in the videos by colored tags. Here, periods of coordinated activity had already been identified by experimenters, so the task is to correctly identify trained fish by leadership ranking.
3.3.3 Stock closing-price time series
We collected daily closing price data for stocks listed on NASDAQ, using Yahoo! Finance (Footnote 4: http://finance.yahoo.com/). These time series run from January 2000 to January 2016 (4169 time steps). We remove symbols with a large amount of missing data and keep the remaining symbols in our dataset. Our analysis focuses on discovering different intervals of coordination, and the leaders and sectors involved in these coordinated events.
4 Results
4.1 Identifying leaders
In each simulation, we have the label of the true leader (and select the first individual as ‘leader’ in the random model). For each of the 100 simulation trials, our method identifies the ‘leader’ as the individual with maximum support, according to Equation 6. This is the individual ranked first most frequently over the coordination events. For each model, we set the threshold at the mean of the density. We report the ‘precision’ as the fraction of correctly identified leaders.
Table 2 reports precision over all synthetic model simulations. PageRank performs best at identifying the leader of the group across all (non-random) simulation models. This follows our intuition that there is a higher-order structure to leadership, where directed paths through the network are meaningful.
Table 2 shows that Velocity Convex Hull (VCH) correctly identifies leadership in the Dictatorship Model (DM) and Hierarchical Model (HM) simulations. This is because the leader moves first by the design of each model, which is sufficient for a consistent first ranking. In the Linear Threshold Model (LT), the leader only starts moving in the pre-coordination interval after ‘infection’; velocity is therefore more uniform across the population and thus poor at ranking. Position Convex Hull (PCH) performs fairly well at recovering the leader. In all models, the leader emerges at the front of the group after some time. How the group is organized prior to this emergence determines the ‘noise’ in this ranking.
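Table 2: Precision of leader identification over all synthetic model simulations.
Simulation  PageRank  VCH   PCH
DM          1         1     0.84
HM          1         1     0.26
LT          1         0     1
LT          1         0     0.99
LT          0.93      0.01  1
LT          1         0     0.97
LT          0.97      0.03  0.95
LT          0.89      0.03  0.79
LT          0.78      0.04  0.61
LT          0.86      0.04  0.8
LT          0.61      0     0.48
Random      0.02      0.04  0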
We also test our framework’s ability to distinguish heterogeneous leadership. Figure 6 illustrates a slight variation of the Dictatorship Model (DM) in which a different individual leads each interval, with the leader’s index matching the interval’s index (i.e., along the diagonal). We correctly identify the leaders of all intervals using the PageRank rank ordering. On an aggregated static network, by contrast, all individuals are indistinguishable by PageRank.
4.2 Identifying leadership hierarchy
We now infer the top-ranked leaders according to support (Equation 6). Recall that in our proposed Hierarchical Model (HM), the identities of ranks 1 to 4 are fixed; we compare these to the top-4 individuals by support under the different leadership rankings. We report the same precision as in Table 2 for each individual rank. Table 3 shows that PageRank performs well even deeper in the ranking. Recall that ‘followers’ in the remainder of the population are allocated to high-ranked individuals in proportion to rank. This causes spurious directed edges ‘up’ the hierarchy, between both leaders and followers, and the results demonstrate that the aggregate ranking is robust to these noisy edges. We also observe that while the Velocity Convex Hull (VCH) and Position Convex Hull (PCH) recover the top-ranked individual to some degree, they fail completely for lower ranks.
Rank  PageRank  VCH  PCH 

1  1  1  0.26 
2  0.93  0  0 
3  0.91  0  0 
4  0.46  0  0 
4.3 Case Study: trained leaders in fish schools
We identify the top-ranked leaders by support on the fish school trajectory dataset (see Section 3.3.2), where we have the labels of ‘trained’ individuals who should lead the school to feeding sites. Table 4 reports precision over trials. As in the simulation models, PageRank performs best overall, again suggesting that following is better captured by a network representation than by individual trajectory features.
Ranking  PageRank  VCH  PCH 

Top-ranked support  0.79  0.67  0.67
Top-4 ranked support  0.70  0.57  0.47
4.4 Case study: finding leaders of stock market events
We apply our leadership framework to stock market closing price data for NASDAQ-listed stocks (see Section 3.3.3). In this context, leadership measures the extent to which a stock increases or decreases in value before a large group of other stocks (i.e., a coordinated group). We apply the framework without any special consideration of the domain, only to validate that we can discover known events.
Figure 7 shows the network density of the inferred ‘following’ network over time, where we discover coordination events with the threshold at the 75th percentile of density. Pre-coordination and coordination intervals are shown in red and green, respectively. We find significant economic events such as the 2000 tech collapse and 9/11. More interestingly, we discover significant events that are captured in the network density signal but not necessarily in the NASDAQ index. For example, we discover a technical econometric event where the TED Spread (a surrogate of national credit risk) begins fluctuating in July 2007, and a small market failure in August 2011. For the discovered coordination event of the 2000 collapse, the top-ranked companies are primarily in IT and semiconductors, matching our intuition, and the top 10 includes large companies such as ARM Holdings, eBay, and SanDisk.
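The thresholding step can be sketched as below. This is a simplified illustration under stated assumptions: it only marks maximal runs of timesteps whose density exceeds a percentile threshold and does not separate pre-coordination from coordination intervals as the full framework does. The helper names `percentile` and `intervals_above` and the toy density series are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in [0, 100])."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def intervals_above(density, threshold):
    """Maximal runs (start, end), inclusive, where density > threshold."""
    runs, start = [], None
    for t, d in enumerate(density):
        if d > threshold and start is None:
            start = t                      # a run begins
        elif d <= threshold and start is not None:
            runs.append((start, t - 1))    # the run ended at the previous step
            start = None
    if start is not None:                  # the run reaches the end of the series
        runs.append((start, len(density) - 1))
    return runs

# Toy density series of the inferred following network (made-up values).
density = [0.1, 0.2, 0.9, 0.8, 0.1, 0.1, 0.7, 0.1]
threshold = percentile(density, 75)
events = intervals_above(density, threshold)
```

A percentile threshold adapts to the overall density level of each dataset, which is one reason a relative cutoff may be preferable to a fixed absolute value.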
4.5 Leadership model classification
Recall that we proposed several leadership rankings (Section 2.4) and presented the Kendall rank correlation to compare them (Section 2.5.2). We perform model classification on each simulation trial using all proposed features: the rank correlations between rankings and the PageRank maximum support. A classifier takes these features and produces a leadership model label. We use 10-fold cross-validation with Random Forests [16] on 1200 total trials across all models. Table 5 reports the classification results for each simulation model. For the Linear Threshold model (LT), we combine different parameter settings under the same label. All models achieve a high F-score.
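As a reminder of how a rank-correlation feature is computed, the sketch below evaluates the Kendall rank correlation between two rankings of the same individuals. It is a minimal version that assumes no tied ranks (the tau-a form); the paper's exact variant is defined in Section 2.5.2. The function name `kendall_tau` and the example rankings are hypothetical.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall tau-a between two rankings given as {individual: rank position}.

    Assumes both rankings cover the same individuals and contain no ties.
    Returns a value in [-1, 1]: 1 for identical orderings, -1 for reversed.
    """
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1   # the pair is ordered the same way in both rankings
        elif sign < 0:
            discordant += 1   # the pair is ordered in opposite ways
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs

# Two rankings of four individuals that differ only in the order of A and B.
global_rank = {"A": 1, "B": 2, "C": 3, "D": 4}
local_rank = {"A": 2, "B": 1, "C": 3, "D": 4}
tau = kendall_tau(global_rank, local_rank)
```

Here five of the six pairs agree and one disagrees, so the correlation is (5 - 1) / 6. Values like this, computed for each pair of rankings, would form part of the feature vector passed to the classifier.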
Figure 8 visualizes subspaces of the full feature space. Figure 8 (top left) plots the rank correlation between the global and local Velocity Convex Hull (VCH) rankings against the maximum support over all individuals for each trial. This figure supports our observations in Table 2: the Linear Threshold model has low rank correlation (consistency), while the Hierarchical Model (HM) and Dictatorship Model (DM) have high maximum support and correlation (HM has the higher rank correlation of the two because more ranks are explicitly fixed).
A key aspect of our simulation modeling is that we can characterize real datasets according to how they map into these feature spaces, compared to synthetic models. We compute each rank correlation over high-confidence baboon events, thresholded on a percentile of density and labeled “Baboon (High Confidence)” in Figure 8. We observe that within different subspaces, the baboon ranking is similar to Random or Linear Threshold, and has low maximum support for the global-local rank correlation features. We also plot the baboon rank correlation for the ground-truthed running example presented in Section 2.1. This key example, labeled “Baboon (Example)”, has high rank correlation on both cross-feature axes. This suggests that, in aggregate, baboon leadership is heterogeneous and context-driven (similar to the simulated case shown in Figure 6). This analysis provides a strategy for hypothesis testing and generation on contrasting timescales and subspaces.
Model  Precision  Recall  F-score

DM  1  1  1 
HM  0.95  1  0.97 
LT  0.99  0.99  0.99 
Random  0.94  0.91  0.92 
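The F-scores in Table 5 are consistent with the usual harmonic mean of precision and recall. The check below assumes that the reported F-score is this F1 measure (the text does not state the weighting), and it uses the rounded precision and recall values from the table, so agreement is only expected to two decimals. The function name `f_score` is hypothetical.

```python
def f_score(precision, recall):
    """F1 score: harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from Table 5.
table5 = {
    "DM": (1.0, 1.0),
    "HM": (0.95, 1.0),
    "LT": (0.99, 0.99),
    "Random": (0.94, 0.91),
}
recomputed = {model: round(f_score(p, r), 2) for model, (p, r) in table5.items()}
```

Under this assumption the recomputed values match the F-score column of Table 5 for all four models.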
5 Conclusions
This paper proposes a concrete, simple yet powerful, general framework for (1) identifying periods of coordinated group behavior, (2) identifying leaders of these events, and (3) classifying the type of leadership process at play. We validate the accuracy of our framework in performing all three of these tasks using simulated data. We further show that the framework can provide insights on real-world data, including data on collective animal movement and the economy. The methodology presented here is highly general and is likely to be applicable to a wide variety of domains where coordination across many agents is observed. In addition, our framework is highly flexible and can easily be extended to incorporate other models of leadership or other features used in model classification, depending on the details of the system being analyzed.
References
 [1] M. Andersson, J. Gudmundsson, P. Laube, and T. Wolle. Reporting leaders and followers among trajectories of moving point objects. GeoInformatica, 12(4):497–528, 2008.
 [2] E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts. Everyone’s an Influencer: Quantifying Influence on Twitter. WSDM’11, 65–74. ACM, 2011.
 [3] R. Bonanni, S. Cafazzo, P. Valsecchi, and E. Natoli. Effect of affiliative and agonistic relationships on leadership behaviour in free-ranging dogs. Anim Behav, 79(5):981–991, 2010.
 [4] E. E. Boydston, T. L. Morelli, and K. E. Holekamp. Sex differences in territorial behavior exhibited by the spotted hyena (Hyaenidae, Crocuta crocuta). Ethology, 107(5):369–385, 2001.
 [5] L. J. Brent, D. W. Franks, E. A. Foster, K. C. Balcomb, M. A. Cant, and D. P. Croft. Ecological knowledge, leadership, and the evolution of menopause in killer whales. Current Biol., 25(6):746–750, 2015.
 [6] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. WWW’98, 107–117, 1998.
 [7] I. D. Couzin, J. Krause, N. R. Franks, and S. A. Levin. Effective leadership and decision-making in animal groups on the move. Nature, 433(7025):513–516, 2005.
 [8] M. C. Crofoot, R. W. Kays, and M. Wikelski. Data from: Shared decision-making drives collective movement in wild baboons. 2015.
 [9] J. R. G. Dyer, A. Johansson, D. Helbing, I. D. Couzin, and J. Krause. Leadership, consensus decision making and collective behaviour in humans. Philos Trans R Soc Lond B Biol Sci, 364(1518):781–789, 2009.
 [10] I. C. Gilby, Z. P. Machanda, D. C. Mjungu, J. Rosen, M. N. Muller, A. E. Pusey, and R. W. Wrangham. ‘Impact hunters’ catalyse cooperative hunting in two wild chimpanzee communities. Philos Trans R Soc Lond B Biol Sci, 370(1683), 2015.
 [11] L. Glowacki and C. von Rueden. Leadership solves collective action problems in small-scale societies. Philos Trans R Soc Lond B Biol Sci, 370(1683), 2015.
 [12] A. Goyal, F. Bonchi, and L. V. Lakshmanan. Discovering leaders from community actions. CIKM’08, 499–508, 2008.
 [13] A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities in social networks. WSDM’10, 241–250. ACM, 2010.
 [14] X. He and D. Kempe. Stability of Influence Maximization. arXiv e-prints, 2015.
 [15] R. Heinsohn and C. Packer. Complex cooperative strategies in group-territorial African lions. Science, 269(5228):1260–1262, 1995.
 [16] T. K. Ho. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell., 20(8):832–844, 1998.
 [17] P. L. Hooper, H. S. Kaplan, and J. L. Boone. A theory of leadership in human cooperative groups. J Theor Biol, 265(4):633–646, 2010.
 [18] A. Java, P. Kolari, T. Finin, and T. Oates. Modeling the spread of influence on the blogosphere. WWW’06, 22–26, 2006.
 [19] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. KDD’03, 137–146, 2003.
 [20] M. B. Kjærgaard, H. Blunck, M. Wüstenberg, K. Grønbæk, M. Wirz, D. Roggen, and G. Tröster. Time-lag method for detecting following and leadership behavior of pedestrians from mobile sensing data. Proc. IEEE PerCom, 56–64, 2013.
 [21] Z. Li, F. Wu, and M. C. Crofoot. Mining Following Relationships in Movement Data. ICDM’13, 458–467, 2013.
 [22] D. Lusseau and L. Conradt. The emergence of unshared consensus decisions in bottlenose dolphins. Behav Ecol Sociobiol, 63(7):1067–1077, 2009.
 [23] R. Mares, A. J. Young, and T. H. Clutton-Brock. Individual contributions to territory defence in a cooperative breeder: weighing up the benefits and costs. Philos Trans R Soc Lond B Biol Sci, 279(1744):3989–3995, 2012.
 [24] R. O’Gorman, J. Henrich, and M. Van Vugt. Constraining free riding in public goods games: designated solitary punishers can sustain human cooperation. Philos Trans R Soc Lond B Biol Sci, 276(1655):323–329, 2009.
 [25] D. A. M. P. Weinstein. Leadership behaviour in sawfly larvae Perga dorsalis (Hymenoptera: Pergidae). Oikos, 79(3):450–455, 1997.
 [26] H. Pham and C. Shahabi. Spatial Influence - Measuring Followship in the Real World. ICDE’16, 2016.
 [27] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Sig. Process., 26(1):43–49, 1978.
 [28] M. Shokoohi-Yekta, J. Wang, and E. Keogh. On the Non-Trivial Generalization of Dynamic Time Warping to the Multi-Dimensional Case. SDM’15, 289–297.
 [29] J. E. Smith, J. R. Estrada, H. R. Richards, S. E. Dawes, K. Mitsos, and K. E. Holekamp. Collective movements, leadership and consensus costs at reunions in spotted hyaenas. Anim Behav, 105:187–200, 2015.
 [30] J. E. Smith, S. Gavrilets, M. B. Mulder, P. L. Hooper, C. E. Mouden, D. Nettle, C. Hauert, K. Hill, S. Perry, A. E. Pusey, M. van Vugt, and E. A. Smith. Leadership in mammalian societies: Emergence, distribution, power, and payoff. Trends Ecol Evol, 31(1):54–66, 2016.
 [31] F. Solera, S. Calderara, and R. Cucchiara. Learning to identify leaders in crowd. CVPR’15 Workshops, 43–48, 2015.
 [32] J. C. Stewart and J. P. Scott. Lack of correlation between leadership and dominance relationships in a herd of goats. J Comp Physiol Psych, 40(4):255–264, 1947.
 [33] A. Strandburg-Peshkin, D. R. Farine, I. D. Couzin, and M. C. Crofoot. Shared decision-making drives collective movement in wild baboons. Science, 348(6241):1358–1361, 2015.
 [34] A. Strandburg-Peshkin, D. R. Farine, I. D. Couzin, and M. C. Crofoot. Shared decision-making drives collective movement in wild baboons. Science, 348(6241):1358–1361, 2015.
 [35] A. Strandburg-Peshkin, C. R. Twomey, N. W. F. Bode, A. B. Kao, Y. Katz, C. C. Ioannou, S. B. Rosenthal, C. J. Torney, H. S. Wu, S. A. Levin, and I. D. Couzin. Visual sensory networks and effective information transfer in animal groups. Current Biology, 23(17):R709–R711, 2016.
 [36] R. Sulo, T. Berger-Wolf, and R. Grossman. Meaningful selection of temporal resolution for dynamic networks. MLG’10, 127–136. ACM, 2010.
 [37] J. Sun and J. Tang. A Survey of Models and Algorithms for Social Influence Analysis, 177–214. 2011.
 [38] C. Tan, J. Tang, J. Sun, Q. Lin, and F. Wang. Social Action Tracking via Noise Tolerant Time-varying Factor Graphs. KDD’10, 1049–1058, 2010.
 [39] M. Van Vugt. Evolutionary origins of leadership and followership. Pers Soc Psychol Rev, 10(4):354–371, 2006.