FLICA: A Framework for Leader Identification in Coordinated Activity

03/04/2016 ∙ by Chainarong Amornbunchornvej, et al. ∙ University of California-Davis, Max Planck Society, Princeton University, University of Illinois at Chicago

Leadership is an important aspect of social organization that affects the processes of group formation, coordination, and decision-making in human societies, as well as in the social system of many other animal species. The ability to identify leaders based on their behavior and the subsequent reactions of others opens opportunities to explore how group decisions are made. Understanding who exerts influence provides key insights into the structure of social organizations. In this paper, we propose a simple yet powerful leadership inference framework extracting group coordination periods and determining leadership based on the activity of individuals within a group. We are able to not only identify a leader or leaders but also classify the type of leadership model that is consistent with observed patterns of group decision-making. The framework performs well in differentiating a variety of leadership models (e.g. dictatorship, linear hierarchy, or local influence). We propose five simple features that can be used to categorize characteristics of each leadership model, and thus make model classification possible. The proposed approach automatically (1) identifies periods of coordinated group activity, (2) determines the identities of leaders, and (3) classifies the likely mechanism by which the group coordination occurred. We demonstrate our framework on both simulated and real-world data: GPS tracks of a baboon troop and video-tracking of fish schools, as well as stock market closing price data of the NASDAQ index. The results of our leadership model are consistent with ground-truthed biological data and the framework finds many known events in financial data which are not otherwise reflected in the aggregate NASDAQ index. Our approach is easily generalizable to any coordinated activity data from interacting entities.

1 Introduction

Leadership is an important aspect of the social organization, formation, and decision-making of groups of people in online and offline communities, as well as other social animals. Understanding the dynamics of emerging leadership allows researchers to gain insights into how social species make decisions. Until recently, it has been difficult if not impossible to pinpoint the identity of a leader from available observational data without explicit additional information. However, the availability of data from physical proximity sensors, GPS, and the web opens up the possibility of measuring leadership in online activities, face-to-face interactions, animal populations, and aggregate social processes such as economic activity. This paper presents an automated method for unsupervised identification of leader identity in the context of successful initiation of coordinated activities among groups of individuals. The method uses only the data on the time series of individual activities, with no additional information. The proposed approach automatically determines (1) when a group decision was made, (2) the identity of the leader, and (3) the mechanism by which the group agreed to follow the leader.

Figure 1: A high-level overview of the proposed framework

Previous work over several domains defines leadership according to physical movement, in public spaces [31], location based social networks [26], physical association patterns [22], and other physical trajectories [1]. Leadership has also been studied in online social networks [12], where user actions are imitated over the network topology. Much of this work has focused on identifying leaders from dyadic interactions, but little work has focused on measuring leadership in coordinated group activity, which occurs in group decision-making and collaborative systems. Under this view of leadership, a leader is simply the individual who successfully initiates the coordinated activity of a group, followed by other individuals. Moreover, most of the previous work does not explicitly focus on the time when the decision is made and leadership is manifested, (i.e. the period of a group’s transition to the coordinated activity). Finally, many of the previous approaches assume a particular model of leadership, such as influence maximization, whereas here we present a framework to differentiate between alternate models of leadership.

1.1 Our Contributions

We propose a general, scientifically grounded, unsupervised, modular, and extendable framework with few assumptions for identifying individuals who lead a group to a state of coordinated activity. Our framework is capable of:

  • Detecting events of coordinated activity: discovering time intervals of coordination in group activity data and the transition periods of decision-making which lead to that coordination;

  • Identifying leaders: identifying the initiators of this coordinated behavior, the individuals who succeeded in leading the group to coordination; and

  • Classifying the group leadership model: characterizing the type of the group’s transition behavior to coordination according to interpretable, dynamic models (e.g. hierarchical, dictatorial).

We demonstrate the framework’s ability to analyze leadership in coordinated activity on synthetic and real datasets over several domains. We use synthetic simulated data to validate every aspect of the framework. We use two biological datasets (GPS tracks of a baboon troop and video-tracking of fish schools) as well as stock market closing price data of the NASDAQ index. The results are consistent with ground-truthed biological data, and the framework finds many known events in financial data which are not otherwise reflected in the aggregate NASDAQ index. Our approach is easily generalizable to any coordinated activity data from interacting entities.

1.2 Related Work

Coordinating patterns of individual activity is a challenge that all social organisms face, and diverse strategies, from democratic to dictatorial, have emerged to allow members of groups to reach consensus. Leadership (defined as non-random, differential influence [30]) plays a key role in organizing the collective (i.e. group) behaviors of social organisms ranging from humans [9] to hyaenas [29] to hymenoptera [25]. It potentiates complex patterns of cooperation and conflict (e.g., lions [15], hyaenas [4], meerkats [23], chimpanzees [10], humans [11]), organizes group movements (fish [7], humans [9], dogs [3]), and may prevent free-riding [24, 17].

Substantial interest currently exists in identifying individuals who act as leaders and determining how they influence the behavior of others in their social network. Most previous computational work creates global, static leadership rankings over the entirety of the input data [12, 38, 31, 18, 2]. This assumes that leadership relationships are global and fixed across time. However, the important initiators of group activity are not necessarily the individuals found at the top of their group’s social dominance hierarchy [32, 39, 5, 33]. Our framework explicitly identifies heterogeneous, dynamic leadership rankings by identifying local time intervals of leadership and measuring rank stability over time.

While domain-driven leadership models typically measure pairwise dyadic dominance or following interactions in the absence of an explicit network structure [1, 26, 21], most of the work in machine learning is on explicit, known network topologies [37]. Our framework generalizes to either an explicit network or hidden implicit dynamic network topologies inferred over multidimensional time series data. Furthermore, reporting only dyadic leadership relationships does not incorporate conditional dependencies over the group (e.g. interactions are assumed to be independent) [20]. To mitigate this, previous work on leadership incorporates high-level network measures [18], including PageRank and HITS, or cascade size [2]. Our framework can use any ranking function, such as PageRank, and can be extended to any high-level measure on our inferred ‘following’ network.

From a social network perspective, leaders can be characterized as influential individuals who have many followers that imitate the leader’s actions [12], and thus are able to successfully take a group from one behavioral state to another. Significant attention has been paid to the problem of influence maximization (IM), i.e. how individuals in a specific community are able to maximize their impact on the behavior of the community as a whole [19, 13]. Recently, these IM models, as well as more general domain-driven definitions of leadership, have been shown to have considerable instability in the presence of the noise often found in real-world datasets [14]. Our framework is general: it allows multiple leadership models to be tested and requires no parametric assumptions for the definition of following.

The ability to identify leaders based on their behavior and the subsequent reactions of others opens opportunities to explore how an individual affects group behavior and how group decisions are made. However, because leadership can occur in a multitude of contexts and take diverse forms, any generalized framework for identifying leaders or testing models about the underlying decision-making process that leads to group consensus must allow for domain-specific behavioral features. Further, a framework for testing among different models of leadership and consensus-building would pave the way for a more generalized understanding of collective behaviors and how they shape disparate social systems. Unfortunately, no such framework for leadership model classification exists. Here, we tackle this problem for the first time, focusing on developing de novo methods for both leadership identification and leadership model classification.

Symbol definitions:

  • Mean of the signed index difference of an optimal warping path

  • Density of graph G

  • A coordination event, and the set of all coordination events identified by the framework

  • A rank order of some measure on a coordination event

  • The ‘global’ rank order over all coordination events

  • Kendall rank correlation comparison over rank orders

  • The support of an individual (fraction of intervals in which the individual is ranked first)

Model parameters:

  • Time-series window size

  • Overlapping window shift size

  • Dynamic Time Warping (DTW) warping band

  • Density-coordination threshold

Table 1: Table of symbol definitions used throughout this paper.

2 Methods

Our proposed framework measures ‘following’ relationships in multidimensional time series of arbitrary cardinality, and constructs a network model to rank the leaders before and at the time of coordination. Table 1 summarizes all symbols and parameters introduced below. (Footnote 1: Matrices, vectors, and sets are denoted by capital letters, individual scalars by lowercase letters, and parameters by Greek letters.)

Figure 1 gives a high-level overview of our proposed framework. The framework takes as input a collection of multidimensional time series data. The framework first (a) computes time-series measures appropriate for ‘following’ relationships (e.g. Dynamic Time Warping, see Section 2.2.1) over sliding windows, and (b) uses these associations to construct a sequence of directed ‘following’ graphs. We (c) model time intervals of coordination using the density over time of the graph sequence. For these intervals, we (d) apply our set of leadership ranking measures (e.g. PageRank) to capture different aspects of leadership. Finally, we output these ordered rankings.

2.1 A working example

(a) t=50
(b) t=100
(c) t=250
Figure 2: PageRank (top) and density (middle) of the ‘following’ network over time for an event of movement initiation in baboons by individual ID3. (Bottom) The locations of individuals at three different time steps (t = 50, 100, 250), with the ‘following’ network, and PageRank indicated by node size.

Figure 2 presents a key example and a brief introduction to our framework, on real GPS trajectory data of olive baboons (Papio anubis). This event was validated with video taken on-site (see Section 3.3.1). Figures 2(a)-2(c) show the GPS locations of baboons at three different time steps (t = 50, 100, 250). These figures also show the directed ‘following’ network, and the PageRank of individuals at that time step (by node size scaling).

The middle plot shows the density of the ‘following’ network over the duration of the entire event. The dotted red lines denote different intervals of this event, based on network density over time. The increase in density corresponds to the transition from uncoordinated to coordinated movement, and the interval of high density between the dotted lines corresponds to coordinated group movement. The top plot presents the PageRank of individuals in the ‘following’ network over a coordination event.

Figures 2(a)-2(b) show the initiation of movement of the group by ID3 (Black). Figure 2(c) shows the ‘following’ network once it is in the coordination interval. Individual ID3 has the largest weight in the first two snapshots, and the PageRank of individual ID1 (Blue) surpasses ID3 only after the network is ‘coordinated’ (i.e. moving together). If we measure the leadership ranking only after network density is high, we miss that ID3 ‘built’ the network in the pre-coordination interval (to the left of the first dotted red line).

2.2 Time series analysis

A multidimensional time series is a tuple of time-ordered sequences of observations, all of the same length; the number of sequences in the tuple is its cardinality:

(1)

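Our input dataset contains one multidimensional time series of fixed cardinality per entity (e.g. user, individual), each representing the activity of that entity. The total size of the dataset is the product of the number of entities, the cardinality, and the series length. In the case of typical geospatial trajectories, the cardinality is two, for latitude and longitude.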

2.2.1 Time series measures and sliding windows

Our framework constructs a directed association network by measuring following interactions between time series. The definition of the ‘following’ relationship is the atomic unit which determines our network topology, and the subsequent leadership analysis.

We focus on Dynamic Time Warping (DTW) [27]. However, any appropriate local measure of time-lagged similarity may be used. DTW is an optimal elastic matching between sequences, computed using dynamic programming, and is regarded as “remarkably hard to beat as a time series distance measure, across a host of domain applications, and a host of tasks; including clustering, classification and similarity search” [28]. Figure 3 (Left) shows two trajectories where shifting one ahead in time produces a better match to the other, as illustrated by the warping path in Figure 3 (Right). For the multidimensional generalization of DTW, we use the ‘dependent’ variant [28]. To compute the distance between multidimensional observations at a cell of the dynamic programming matrix, this variant simply uses the Euclidean distance over the dimensions (Footnote 2: We use ‘*’ subscript notation in matrices to indicate slicing in the dimension(s).):

(2)

In practice, DTW uses a warping band that constrains the difference in time between matched observations. Aside from reducing computation below the quadratic cost of unconstrained DTW, this enforces domain knowledge of what constitutes a ‘coherent’ match for our ‘following’ relationships of interest. We discuss parameter selection in Section 2.6.2.
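The banded, multidimensional matching described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this paper: the function name `dtw_path`, the parameter name `band`, and the choice of a fixed Sakoe-Chiba style band over equal-length subsequences are all assumptions made for the example.

```python
import math

def dtw_path(x, y, band):
    """Dependent multidimensional DTW with a fixed warping band (a sketch).

    x, y: equal-length lists of observations, each a tuple of floats.
    band: maximum allowed |i - j| between matched indices (assumed name).
    Returns (total_cost, path), where path is a list of (i, j) index pairs.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay infinite.
        for j in range(max(1, i - band), min(m, i + band) + 1):
            d = math.dist(x[i - 1], y[j - 1])  # Euclidean over all dimensions
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Backtrack from the last cell to recover the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

With `band=0` the match is forced onto the diagonal, while a wider band lets a delayed copy of a series align at zero cost; the filled region, and therefore the work, grows with the band width.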

2.2.2 Associations in time series sliding windows

Figure 3: (Left) Toy time series in which one series follows the other. (Right) The optimal warping path on the dynamic programming matrix, which shifts one series forward in time onto the other.

Dynamic time warping is typically applied as a global matching measure. However, we make a Markovian assumption to identify ‘following’ on time series subsequences, which in aggregate may not correspond to the global optimal warping solution (see Section 2.6.1 for discussion).

For a pair of time series, a window size, and a sliding-window step size (see Section 2.6.2), we calculate DTW on the time series subsequences defined by these overlapping windows. Each window starts one step size after the previous one and spans one window size. For each window, DTW outputs an optimal warping path on the corresponding pair of subsequences. We represent this path as a sequence of index pairs (see Figure 3, Right), and compute the mean of the signed index difference over this sequence of index pairs:

(3)

This function measures the extent of warping between two time series. For time series that cannot be warped onto one another, its value is zero. When the value is positive, one series follows the other, as shown in Figure 3, and when it is negative, the direction of following is reversed. This function is bounded by [-1, 1].
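As a small illustration of Equation 3, the score can be computed directly from a warping path. The sketch below assumes the path is a list of `(i, j)` index pairs and takes the sign of `j - i`; the function name `follow_score` and this sign convention are assumptions for the example, so which series counts as the follower depends on the order in which the pair is passed to DTW.

```python
def follow_score(path):
    """Mean sign of (j - i) over the index pairs (i, j) of a warping path."""
    def sign(v):
        return (v > 0) - (v < 0)
    return sum(sign(j - i) for i, j in path) / len(path)
```

A purely diagonal path gives 0, a path that lies entirely on one side of the diagonal gives +1 or -1, and mixed paths fall in between, which is why the value stays within [-1, 1].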

2.3 Dynamic association network: inference and analysis

We construct a time-varying, directed association network on the multidimensional time series dataset, with one network per sliding-window step. Each network has one node per entity, i.e. one node per time series. For each pair of time series, we compute Equation 3 on every valid window within the time series length, which yields a sequence of edge sets. A non-zero value defines a directed edge, at that window’s time step, between the nodes associated with the two time series. Although here we compute the pairwise association network between all time series pairs, our framework trivially generalizes to the case where an explicit network is given, where we measure ‘following’ relationships only for the time series pairs associated with adjacent nodes in that network.
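The construction of the dynamic network can be sketched as a loop over windows and pairs. In this illustration the pairwise measure is passed in as a callable `score`, so any time-lagged similarity can be plugged in. The names `sliding_windows`, `following_networks`, and `density`, the edge direction (follower pointing at leader), and the density denominator (number of unordered pairs, since each pair contributes at most one edge here) are assumptions for the example.

```python
def sliding_windows(length, window, shift):
    """(start, end) index pairs of overlapping windows that fit in the series."""
    return [(s, s + window) for s in range(0, length - window + 1, shift)]

def following_networks(series, window, shift, score):
    """Build one directed edge set per window.

    series: dict mapping node id -> list of observations (equal lengths).
    score:  callable(sub_a, sub_b) -> value in [-1, 1]; a positive value is
            taken to mean that b follows a (an assumed convention).
    Returns a list of sets of (follower, leader) edges.
    """
    nodes = sorted(series)
    length = len(series[nodes[0]])
    graphs = []
    for start, end in sliding_windows(length, window, shift):
        edges = set()
        for a_idx, a in enumerate(nodes):
            for b in nodes[a_idx + 1:]:
                s = score(series[a][start:end], series[b][start:end])
                if s > 0:
                    edges.add((b, a))  # b follows a
                elif s < 0:
                    edges.add((a, b))  # a follows b
        graphs.append(edges)
    return graphs

def density(edges, n):
    """Fraction of the n*(n-1)/2 node pairs that carry a following edge."""
    return len(edges) / (n * (n - 1) / 2)
```

Restricting the inner loop to pairs that are adjacent in a given network would cover the explicit-network case mentioned above.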

2.3.1 Detecting intervals of coordination

Figure 4: A coordination event is a pair of intervals. We define the pre-coordination interval and the coordination interval using a threshold on the network density time series.

Recall that our framework identifies time intervals of coordinated activity and measures the leader who initiated that coordination. Once the group’s actions are already coordinated, we are ‘too late’ to observe the initiation. Our dynamic ‘following’ network captures pairwise following activity. Therefore, network density is a simple measure of group coordination with the fewest assumptions on the structure of following.

Figure 4 illustrates the definition of a coordination event as a pair of time intervals. We apply a threshold to the network density time series. We set the threshold adaptively, according to the distribution of density values, by taking as the threshold value either the mean, median, or another percentile of that distribution. A contiguous time interval above the threshold defines a coordination interval, and the preceding interval below it is a pre-coordination interval. For the pre-coordination interval, the trend of network density is assumed to be monotonically nondecreasing. Therefore, we determine the beginning of the pre-coordination interval as the first time step prior to the coordination interval where the discrete derivative (i.e. the difference between consecutive density values) is zero.

Together, these intervals form one coordination event, represented by a 3-tuple of time indices: the start of the pre-coordination interval, the boundary shared by the two intervals, and the end of the coordination interval. The collection of coordination events forms a set, and the total intervals of distinct events do not overlap in time. For the remainder of our framework, we measure leadership only on the events in this set. To reduce the number of intervals which might be generated near the threshold, we apply a greedy merging of nearby coordination intervals (taking the range from the warping band window).
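The thresholding step can be sketched as a single pass over the density series. This is an illustration under stated assumptions: the function name `coordination_events` is hypothetical, the end index is exclusive, the pre-coordination start is found by walking back while density is strictly increasing (stopping at the first non-positive difference), and the greedy merging of nearby intervals is omitted.

```python
def coordination_events(density, threshold):
    """Return (pre_start, coord_start, coord_end) index triples.

    density:   list of network densities, one per window.
    threshold: density level above which the group counts as coordinated.
    """
    events, t, n = [], 0, len(density)
    last_end = 0  # do not let a pre-coordination interval overlap the previous event
    while t < n:
        if density[t] > threshold:
            start = t
            while t < n and density[t] > threshold:
                t += 1
            pre = start
            # Walk back until the discrete difference is no longer positive.
            while pre > last_end and density[pre] - density[pre - 1] > 0:
                pre -= 1
            events.append((pre, start, t))
            last_end = t
        else:
            t += 1
    return events
```

An adaptive threshold in the spirit of the text could be obtained with, for example, `sum(density) / len(density)` for the mean.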

2.4 Leadership ranking

In this section, we propose several methods for measuring different aspects of leadership by comparing higher-order network features against individual time series features. These feature spaces give an extendable way to compare different aspects of leadership in the absence of a single, unambiguous leadership definition.

In all of the analysis below, we compare rank orderings, i.e. the sorted positions, from best to worst, of nodes under some measure function. (Footnote 3: We use ‘’ to denote placeholders, e.g. for parameters.)

2.4.1 PageRank

PageRank [6] is a standard method for approximating eigenvector centrality in networks, designed to measure the importance of a node by the number and the importance of the other nodes linking to it. In a network where a link represents a following relationship between nodes, PageRank measures how many other nodes follow a given node, how many followers those nodes have, and so on. Thus, it fits well with our definition of a leader.

PageRank returns a weight vector with one entry per node, summing to 1. We calculate PageRank for each static graph within the dynamic graph sequence of the pre-coordination interval, which yields a sequence of PageRank vectors.

While the definition of a time step in the original time series domain and the ‘following’ network may be different, the pre-coordination interval is precisely defined in both domains. We aggregate the rankings over the entire pre-coordination interval and produce one rank value per individual for the entire interval.
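To make the ranking step concrete, the sketch below computes PageRank by power iteration on a single ‘following’ graph whose edges point from follower to leader. The damping factor of 0.85, the fixed iteration count, and the uniform redistribution of weight from nodes with no outgoing edges are conventional choices assumed for this example, not settings reported in this paper.

```python
def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; edges are (follower, leader) pairs."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for follower, leader in edges:
        out[follower].append(leader)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Weight held by nodes that follow no one is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        nxt = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out[v]:
                nxt[w] += damping * rank[v] / len(out[v])
        rank = nxt
    return rank
```

Applying this function to every graph in the pre-coordination interval gives the sequence of weight vectors that is then aggregated into one rank per individual.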

2.4.2 Velocity Convex Hull

The velocity convex hull measures the frequency with which the discrete time series derivative associated with a node is outside the bounds of the population’s discrete derivative distribution (including that node) in the previous time step. In aggregate, a high rank on this measure indicates which node moves first in the group. In the case of spatiotemporal trajectories, this measure corresponds to how often an individual’s velocity at a given time step is outside the range of velocities that were present in the group at the previous time step.

The convex hull can be computed on arbitrary dimensions of a multidimensional time series, or their derivatives, jointly or independently. The convex hull function returns a bounding surface represented as line segments between points in the input data, which encloses all other points. Because we look at velocity jointly, as a one-dimensional quantity, we can directly use its maximum and minimum.

Let the velocity matrix hold one row per individual and one column per time step, computed from the time series dataset. For a given individual and time step, we define the following indicator function:

(4)

For each time step, we output a rank-order vector with one entry per individual.
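In the one-dimensional case, the indicator reduces to a range check against the previous time step. The sketch below assumes that `velocity` is a list of rows, one per individual, and that the frequency is taken over a half-open range of steps; the function names are hypothetical.

```python
def velocity_hull_indicator(velocity, i, t):
    """1 if individual i's velocity at step t lies outside the range of all
    velocities (including i's own) at step t - 1, else 0."""
    previous = [row[t - 1] for row in velocity]
    v = velocity[i][t]
    return int(v < min(previous) or v > max(previous))

def velocity_hull_frequency(velocity, start, end):
    """Per-individual fraction of steps in [start, end) spent outside the range."""
    steps = range(max(start, 1), end)  # step 0 has no previous step
    return [sum(velocity_hull_indicator(velocity, i, t) for t in steps) / len(steps)
            for i in range(len(velocity))]
```

Sorting individuals by this frequency, from highest to lowest, gives a rank order in which an individual who repeatedly accelerates before the rest of the group appears first.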

2.4.3 Position Convex Hull

The position convex hull is analogous to velocity, except that our indicator function measures an individual’s position relative to the convex hull containing the population at the previous time step. Rather than look at velocity of initiation, this measure captures an individual’s frequency of moving outside the geometric boundaries of the group, and close to the average heading of the group (e.g. in ‘front’ of the group).

We compute the convex hull of the population’s positions at each time step, and also introduce the heading vector of each individual and the population heading vector. We define a ‘contains’ function to denote standard ‘B contains A’ spatial queries between two geometry objects, and an angle function to measure the angle between two vectors.

Using these definitions, we define the position convex hull indicator function for individual at time :

(5)

For each time step, we output a rank-order vector with one entry per individual.

2.5 Leadership comparison

In Section 2.3.1 we described how we detect coordination events. We now describe how we apply the measures described in Section 2.4 on the collection of detected coordination events to determine leader identity.

2.5.1 Leadership support

Recall that the framework discovers a set of coordination events. We aggregate rankings across a single coordination event: for each node, we calculate the mean rank over all time steps in the event’s pre-coordination interval. We then rank nodes by this value. Note that, by design, the mean rank is affected in proportion to the distance of values from the mean (e.g. outlier values), unlike the mode (i.e. the node’s most likely ranking). In practice, ranking has considerable noise in the local ordering, so ranking by expectation is more robust.

We also define a ‘global’ rank order by this same procedure, combining all rank-order vectors over all coordination events and computing the mean rank ordering. We define one such ‘global’ rank ordering for each measure: PageRank, velocity convex hull, and position convex hull.

For a given node and a particular measure (e.g. PageRank), Leadership Support is defined as the fraction of coordination intervals in which that node is ranked first:

(6)

To identify the global leader in our framework, we take the node with maximum PageRank support over all nodes.
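The aggregation in this section can be sketched in two steps: pick the first-ranked individual of each event by mean rank, then compute support as the fraction of events each individual wins. The function names and the dictionary-based inputs below are assumptions for illustration.

```python
from collections import Counter

def rank_positions(scores):
    """Map each individual to its rank position (1 = highest score)."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {ind: pos for pos, ind in enumerate(order, start=1)}

def event_leader(steps):
    """steps: list of dicts (individual -> score), one per time step of the
    pre-coordination interval. Returns the individual with the best mean rank."""
    ranks = [rank_positions(s) for s in steps]
    mean_rank = {ind: sum(r[ind] for r in ranks) / len(ranks) for ind in steps[0]}
    return min(mean_rank, key=mean_rank.get)

def leadership_support(first_ranked):
    """first_ranked: the top-ranked individual of each coordination event.
    Returns individual -> fraction of events in which it was ranked first."""
    counts = Counter(first_ranked)
    return {ind: c / len(first_ranked) for ind, c in counts.items()}
```

The global leader under this sketch is then `max(support, key=support.get)` for the support dictionary computed from PageRank-based rankings.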

2.5.2 Comparing rank-orders

We use the Kendall rank correlation coefficient to compare local and global rank orders. This measure provides a similarity between two rankings according to their ordinal agreement over all pairs of items (e.g. one item is “below” another in both lists).

To compare global and local rank orders, we use the mean Kendall rank correlation over all coordination events against the global:

(7)

For example, applying Equation 7 to the velocity convex hull compares its local and global rank orders.

Similarly, we compute the mean Kendall correlation between local rankings associated with different measures (e.g. velocity convex hull, position convex hull):

(8)

Equation 7 formalizes our intuition that leaders consistently move outside of the spatial extent of the population (position convex hull), or outside the distribution of velocity over the population (velocity convex hull). By comparing the global and local rank orderings, we measure how stable the global ranking is over time.

Equation 8 measures the relationship between higher-order graph structure (centrality) and simple time series features. Using this measure, we can gain a better understanding of the high-level aspects of initiating coordination. For example, we can see whether changing velocity, or changing position within the group, is correlated with network rank position.
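Both comparisons rest on the same pairwise computation. The sketch below implements the tie-free form of the Kendall coefficient (often called tau-a) on two orderings of the same items, and averages it over local orderings against a global one. The function names are hypothetical, and the tie-free form is assumed because rank orders here are permutations.

```python
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Kendall rank correlation between two orderings of the same items."""
    pos_a = {item: k for k, item in enumerate(order_a)}
    pos_b = {item: k for k, item in enumerate(order_b)}
    concordant = discordant = 0
    for u, v in combinations(order_a, 2):
        # The pair agrees if u and v appear in the same relative order in both.
        if (pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

def mean_tau_to_global(local_orders, global_order):
    """Mean correlation of per-event rank orders against the global order."""
    taus = [kendall_tau(order, global_order) for order in local_orders]
    return sum(taus) / len(taus)
```

Passing the local orderings of two different measures, event by event, to `kendall_tau` and averaging the results gives the cross-measure comparison in the same way.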

2.6 Framework discussion

2.6.1 Local vs. Global Matching

(a) Pattern order alignment
(b) Global shift alignment
Figure 5: Dynamic Time Warping global vs. local example

Our proposed framework uses local alignment on time series subsequences, rather than global alignment on the full time series. Figure 5 presents a motivation for this choice. Suppose we intend to match sparse ‘following’ events represented as the pair of spikes with relatively low magnitude at the end of the red and blue time series. In Figure 5(a), the time series is shifted to match one of the two patterns, depending on the cost. This forces a mismatch of the ‘following’ event. Similarly, Figure 5(b) has a low cost matching by shifting the entire time series at a constant rate. By matching only local subsequences, we can recover both of these ‘following’ events.

2.6.2 Parameter selection

Although the proposed framework has four parameters, the dynamic time warping band and the overlapping window shift size are minor parameters that primarily trade off computation against the sampling rate of the time series process. For a given window size, the cost of the DTW sub-procedure scales with the product of the window size and the warping band, so a band as wide as the window yields the traditional quadratic runtime. Although the warping band reduces computation, it also enforces domain knowledge of an appropriate time lag. The cost of the framework is multiplicative in the number of individual pairs, the number of time windows, and the cost of DTW. The total cost is therefore the product of these three terms, which can be reduced by increasing the window shift size (fewer windows) or decreasing the warping band (cheaper DTW).

The parameters defining the dynamic network time-scale and the threshold of coordination events are the major parameters of the framework. We tune the time-scale parameter according to the TWIN heuristic [36] on network density. TWIN is an information-theoretic heuristic which discovers natural time scales in dynamic network data. For the threshold, we explore different percentiles of the density distribution.

3 Evaluation Datasets

We evaluate our framework on four synthetic trajectory models and three real datasets.

3.1 Simulation models

We develop synthetic trajectory models which capture several hypotheses of movement with respect to leadership. These models are not intended to accurately reproduce trajectories in our domain, nor are they exhaustive of different properties of interest. Comparing real data against model simulations provides interpretable characterization of these datasets. We propose four synthetic models: dictatorship, hierarchical, linear-threshold, and fixed-destination (random) models. For evaluation, we attempt to identify the top-ranked individual, which we know as a ground-truth label from running the simulation.

3.1.1 Dictatorship Model

In the Dictatorship Model (or ‘DM’), we fix a single leader who initiates movement from initial positions of the population, randomly sampled from a circle geometry. At the start of the pre-coordination interval, the leader moves in a fixed direction and velocity. Every other individual samples a uniformly random lag. After waiting this time, the individual follows the leader at a fixed velocity (with sampled Gaussian noise in the heading). After a fixed duration of coordinated movement over the entire population, post-coordination begins and individuals decrease velocity at random, until stopping. To produce multiple coordination events, this procedure is repeated after a sufficient waiting time, starting from the stopping positions of the previous coordination event.
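A single coordination event of this model can be sketched with the standard library alone. The sketch makes several simplifying assumptions that are not specified in the text: initial positions are sampled uniformly inside a disc, the leader is individual 0 and moves along the positive x-axis, followers move in that same direction with Gaussian heading noise once their lag has elapsed, and the post-coordination slowdown is omitted. All parameter names and default values are hypothetical.

```python
import math
import random

def simulate_dictatorship(n=20, steps=200, max_lag=30, speed=1.0,
                          heading_sd=0.1, radius=10.0, seed=0):
    """Return (tracks, lags): tracks[i] is a list of (x, y) per time step."""
    rng = random.Random(seed)
    tracks = []
    for _ in range(n):
        # Uniform sample inside a disc of the given radius.
        r = radius * math.sqrt(rng.random())
        a = rng.uniform(0.0, 2.0 * math.pi)
        tracks.append([(r * math.cos(a), r * math.sin(a))])
    # Individual 0 is the leader (lag 0); everyone else waits a random lag.
    lags = [0] + [rng.randint(1, max_lag) for _ in range(n - 1)]
    for t in range(1, steps):
        for i in range(n):
            x, y = tracks[i][-1]
            if t > lags[i]:
                heading = 0.0 if i == 0 else rng.gauss(0.0, heading_sd)
                x += speed * math.cos(heading)
                y += speed * math.sin(heading)
            tracks[i].append((x, y))
    return tracks, lags
```

Because the true leader and the lags are known, output of this kind can serve as ground truth when checking whether a ranking method recovers the initiator.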

3.1.2 Hierarchical Model

The Hierarchical Model (or ‘HM’) is a variation of the Dictatorship Model, where we fix a sequence of individuals (n=4) to follow the previous individual in the sequence, after a fixed lag. The remainder of individuals in the population follow exactly one of these high-ranking individuals, allocated in decreasing proportion per rank.

3.1.3 Linear Threshold Model

The Linear Threshold Model (or ‘LT’) [19] adapts the Dictatorship Model by initiating individual movement as a linear threshold over a network of the k-nearest neighbors at the current time step. There is still one individual who attempts to initiate movement. After the initial step, the model is parameterized by the number of nearest neighbors to query and the proportion of these neighbors required to initiate movement (i.e. to “become infected”). Once “infected”, the individual follows the leader as in the previous models. Each individual also has an initial probability of moving. We explore the parameter space over combinations of the two model parameters and, for convenient notation, refer to the Linear Threshold Model under a given parameter setting by a shorthand label.
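The activation rule of a linear threshold model over nearest neighbors can be sketched as a single predicate. The names `lt_should_move`, `k`, and `fraction`, and the use of a greater-than-or-equal comparison against the required proportion, are assumptions for this illustration.

```python
import math

def lt_should_move(i, positions, moving, k, fraction):
    """True if at least `fraction` of individual i's k nearest neighbors move.

    positions: list of (x, y) tuples at the current time step.
    moving:    list of booleans, True for individuals already moving.
    """
    others = [j for j in range(len(positions)) if j != i]
    others.sort(key=lambda j: math.dist(positions[i], positions[j]))
    nearest = others[:k]
    return sum(moving[j] for j in nearest) >= fraction * k
```

Evaluating this predicate for every stationary individual at each time step lets movement spread outward from the initiator, with larger values of `fraction` making the spread harder to start.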

3.1.4 Random model

In the random model, there is no ‘following’ relationship. At the start of the pre-coordination interval, each individual starts moving to a fixed position. Velocity and heading error are sampled from fixed distributions; therefore, the initial positions yield some spurious following relationships. However, the density in the following network is generally insufficient for flagging a coordination event. When coordination events do occur, the PageRank values and PageRank support are typically low.

3.2 Synthetic trajectory simulation

For each of the above models, we generate a trial of synthetic data consisting of individuals and 12,000 total time-steps, with separate coordination events. Each coordination event has pre-coordination and coordination intervals of time-steps each. Following the coordination interval is another time steps of a post-coordination gap, before the start of the next coordination event. We generate trials for each of the above models.

3.3 Real datasets

We demonstrate the utility of our framework on three real-world datasets from two different domains. First, we look at biological trajectory datasets derived from GPS and cameras. Next, we look at fifteen years of stock closing-price data from the NASDAQ index.

3.3.1 Baboon trajectories

In this dataset, high-resolution GPS collars track 26 individuals of a troop of olive baboons (Papio anubis) living in the wild in Mpala Research Centre, Kenya [8, 34]. The data consists of latitude-longitude location pairs for each individual at one observation per second. We analyze a subset of individuals whose collars remained functional for a day period (419,095 time steps). The task is to automatically detect periods of coordinated group movement and to identify the initiator(s) of these periods, as well as to classify the type of leadership mechanism employed.

3.3.2 Fish schools trajectories

In this dataset, the movements of a school of golden shiners (Notemigonus crysoleucas) were recorded by video in order to study information propagation over the visual fields of fish [35]. Within each school, trained individuals were able to lead the school to feeding sites over separate leadership events. Individual trajectories were extracted by automated tracking from video images, and trained fish were identifiable in the videos by colored tags. Here, periods of coordinated activity had already been identified by the experimenters, so the task is to correctly identify trained fish by leadership ranking.

Each population contains fish, with trained, labeled fish.

3.3.3 Stock closing-price time series

We collected daily closing price data for stocks listed in NASDAQ, using Yahoo! Finance (http://finance.yahoo.com/). These time series are from January 2000 to January 2016 (4169 time-steps). We remove symbols with a large amount of missing data, leaving a total of symbols in our dataset. Our analysis focuses on discovering different intervals of coordination, and the leaders and sectors involved in these coordinated events.
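The filtering step can be sketched as below. This is a minimal illustration rather than our preprocessing code: the function name, the toy price series, and in particular the cutoff `max_missing_fraction` are assumptions, since the text above does not state what counts as a large amount of missing data.

```python
def drop_sparse_symbols(prices, max_missing_fraction):
    """Keep symbols whose share of missing closing prices is at most the cutoff.

    prices maps a ticker symbol to a list of daily closes, with None
    marking a day for which no closing price is available.
    """
    kept = {}
    for symbol, series in prices.items():
        missing = sum(1 for p in series if p is None)
        if series and missing / len(series) <= max_missing_fraction:
            kept[symbol] = series
    return kept


# Toy data with made-up tickers; the 0.25 cutoff is a hypothetical choice.
prices = {
    "AAA": [10.0, 10.5, None, 11.0],
    "BBB": [None, None, None, 5.0],
}
kept = drop_sparse_symbols(prices, max_missing_fraction=0.25)
```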

4 Results

4.1 Identifying leaders

In each simulation, we have the label of the true leader (in the random model, we designate the first individual as the ‘leader’). For each of the 100 simulation trials, our method identifies the ‘leader’ as the individual with maximum support, according to Equation 6. This is the individual ranked first most frequently over the coordination events. For each model, we set the threshold at the mean of the density . We report the ‘precision’ as the fraction of correctly identified leaders over these trials.
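The evaluation above can be sketched in a few lines. Equation 6 is not reproduced in this section, so the sketch assumes that support is simply the fraction of coordination events in which an individual is ranked first; the function names and the toy trials are illustrative only.

```python
from collections import Counter


def max_support_leader(first_ranked):
    """Return (leader, support) given the top-ranked individual per event.

    Assumption: support is the fraction of events in which the
    individual is ranked first.
    """
    counts = Counter(first_ranked)
    leader, hits = counts.most_common(1)[0]
    return leader, hits / len(first_ranked)


def precision(trials, true_leaders):
    """Fraction of trials whose maximum-support individual is the true leader."""
    correct = sum(
        max_support_leader(events)[0] == truth
        for events, truth in zip(trials, true_leaders)
    )
    return correct / len(trials)


# Three toy trials, each listing the first-ranked individual per event.
trials = [["a", "a", "b"], ["c", "b", "b"], ["a", "c", "c"]]
true_leaders = ["a", "b", "a"]
score = precision(trials, true_leaders)  # two of the three trials are correct
```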

Table 2 reports precision over all synthetic model simulations. PageRank is the most consistent at identifying the leader of the group across the (non-random) simulation models, with precision between 0.61 and 1. This follows our intuition that there is a higher-order structure to leadership where directed paths through the network are meaningful.
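To illustrate why directed paths matter, the following is a minimal power-iteration sketch of PageRank on a small following network. It is not our implementation: the edge direction (from a follower to the individual it follows), the damping factor of 0.85, and the uniform redistribution of rank from nodes without out-edges are assumptions made for this example.

```python
def pagerank(edges, nodes, damping=0.85, iters=100):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs."""
    n = len(nodes)
    out = {v: [] for v in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes with no out-edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for dst in out[v]:
                new[dst] += damping * rank[v] / len(out[v])
        rank = new
    return rank


# c follows b, b follows a, and d follows a directly.
nodes = ["a", "b", "c", "d"]
edges = [("c", "b"), ("b", "a"), ("d", "a")]
rank = pagerank(edges, nodes)
leader = max(rank, key=rank.get)
```

In this toy network, individual c never follows a directly, yet its rank still flows to a through b, which is the kind of higher-order structure that a purely pairwise count would miss.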

Table 2 shows that Velocity Convex Hull (VCH) correctly identifies leadership in Dictatorship Model (DM) and Hierarchical Model (HM) simulations. This is because the leader moves first by the design of each model, which is sufficient for a consistent first ranking. In the Linear Threshold Model (LT), the leader only starts moving in the pre-coordination interval after ‘infection’, so velocity is more uniform across the population and therefore a poor basis for ranking. Position Convex Hull (PCH) performs fairly well at recovering the leader in most models, although its precision drops to 0.26 in HM. In all models, the leader emerges at the front of the group after some time. How the group is organized before the leader emerges determines the ‘noise’ in this ranking.

Simulation PageRank VCH PCH
DM 1 1 0.84
HM 1 1 0.26
LT 1 0 1
LT 1 0 0.99
LT 0.93 0.01 1
LT 1 0 0.97
LT 0.97 0.03 0.95
LT 0.89 0.03 0.79
LT 0.78 0.04 0.61
LT 0.86 0.04 0.8
LT 0.61 0 0.48
Random 0.02 0.04 0
Table 2: Precision of leadership identification on simulation models
Figure 6: Synthetic experiment (on Dictatorship Model) with a changing leader, where the individual along the diagonal is the leader for event when . We correctly identify all local leaders by PageRank. In the static, aggregated network individuals are indistinguishable by PageRank.

We also test our framework’s ability to distinguish heterogeneous leadership. Figure 6 illustrates a slight variation of the Dictatorship Model (DM), where the individual is the leader for interval when (e.g. along the diagonal). We correctly identify leaders of all intervals using PageRank rank ordering. On an aggregated static network, all individuals are indistinguishable by PageRank.

4.2 Identifying leadership hierarchy

We now infer the top-k leaders according to support (Equation 6). Recall that in our proposed Hierarchical Model (HM), we have fixed identities of ranks 1 to 4, which we compare to the top-4 individuals by support under each leadership ranking. We report the same precision as in Table 2 for each individual rank. Table 3 shows that PageRank performs well even deeper in the ranking, with precision of at least 0.91 through rank 3 and 0.46 at rank 4. Recall that ‘followers’ in the remainder of the population are allocated to high-ranked individuals in proportion to rank. This causes spurious directed edges ‘up’ the hierarchy, between both leaders and followers, and the results show that the aggregate ranking is robust to these noisy edges. We also observe that while the Velocity Convex Hull (VCH) and Position Convex Hull (PCH) had some success in recovering the top-ranked individual, they fail completely for lower ranks.

Rank PageRank VCH PCH
1 1 1 0.26
2 0.93 0 0
3 0.91 0 0
4 0.46 0 0
Table 3: Precision on leadership hierarchy identification (on HM simulation)
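The per-rank precision reported in Table 3 can be sketched as follows. The sketch assumes that a rank counts as correct only when the inferred individual at that position exactly matches the fixed identity at the same position; the function name and the toy rankings are illustrative and are not taken from our simulations.

```python
def precision_by_rank(inferred, truth, k):
    """Per-rank precision: share of trials where position r matches the truth."""
    return [
        sum(ranking[r] == truth[r] for ranking in inferred) / len(inferred)
        for r in range(k)
    ]


# Fixed identities for ranks 1 to 4, and the inferred top-4 in four toy trials.
truth = ["a", "b", "c", "d"]
inferred = [
    ["a", "b", "c", "d"],
    ["a", "b", "d", "c"],
    ["a", "c", "b", "d"],
    ["a", "b", "c", "e"],
]
scores = precision_by_rank(inferred, truth, k=4)
```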

4.3 Case Study: trained leaders in fish schools

We identify the top-k leaders by support on the fish school trajectory dataset (see Section 3.3.2), where we have the labels of ‘trained’ individuals who should lead the school to feeding sites. Table 4 reports precision over trials. As in the simulation models, PageRank performs best overall, again suggesting that following is better captured by a network representation than by individual trajectory features.

Ranking PageRank VCH PCH
Top ranked support 0.79 0.67 0.67
Top-4 ranked support 0.70 0.57 0.47
Table 4: Leader identification in fish

4.4 Case study: finding leaders of stock market events

We apply our leadership framework to stock market closing price data of the NASDAQ index (see Section 3.3.3). In this context, ‘leadership’ measures the extent to which a stock increases or decreases in value before a large group of other stocks (i.e. a coordinated group) does. We apply the framework without any special consideration of the domain, only to validate that we can discover known events.

Figure 7: (Top) NASDAQ ‘following’ network density and (Bottom) NASDAQ index value. The framework detects many known events in financial data (labeled above). Many of these events are not reflected in the aggregate NASDAQ index.

Figure 7 shows the network density of the inferred ‘following’ network over time, where we discover coordination events with the threshold set at the 75th percentile of density. Pre-coordination and coordination intervals are shown in red and green, respectively. We find significant economic events such as the 2000 tech collapse and 9/11. More interestingly, we discover significant events that are captured in the network density signal but not necessarily in the NASDAQ index. For example, we discover a technical econometric event in which the TED Spread (a surrogate of national credit risk) begins fluctuating in July 2007, and a small market failure in August 2011. For the discovered coordination event of the 2000 collapse, the top-ranked companies are primarily in IT and semiconductors, matching our intuition; the top 10 includes large companies such as ARM Holdings, eBay, and SanDisk.
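The percentile thresholding of the density signal can be sketched as below. This covers only the thresholding step: how the density series is computed and how pre-coordination intervals are delimited are defined earlier in the paper and are not reproduced here. The linear-interpolation percentile, the function names, and the toy density values are assumptions made for illustration.

```python
def percentile(values, q):
    """Linear-interpolation percentile of a non-empty list, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def coordination_intervals(density, q=75):
    """Return half-open (start, end) runs where density exceeds its q-th percentile."""
    cutoff = percentile(density, q)
    intervals, start = [], None
    for t, d in enumerate(density):
        if d > cutoff and start is None:
            start = t
        elif d <= cutoff and start is not None:
            intervals.append((start, t))
            start = None
    if start is not None:
        intervals.append((start, len(density)))
    return intervals


# Toy density series with two bursts of high network density.
density = [0.1, 0.1, 0.7, 0.9, 0.2, 0.1, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1]
events = coordination_intervals(density)
```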

4.5 Leadership model classification

Figure 8: Comparison of feature spaces of leadership model classifications on simulations and real data

Recall that we proposed several leadership rankings (Section 2.4) and presented Kendall rank correlation to compare them (Section 2.5.2). We perform model classification on each simulation trial using all proposed features derived from the rank correlations: , , and and the PageRank maximum support. A classifier takes these features and produces a leadership model label. We use 10-fold cross-validation with Random Forests [16], over 1200 total trials across all models. Table 5 reports the classification results for each simulation model. For the Linear Threshold model (LT), we combine the different parameter settings under the same label. All models attain an F-score of at least 0.92.
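As a reminder of what a rank-correlation feature measures, the following sketch computes a Kendall rank correlation between two rankings of the same individuals. It assumes the tau-a variant with no ties, which may differ in detail from the definition in Section 2.5.2; the names `global_rank` and `local_rank` and their values are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a between two rankings given as dicts of item -> rank."""
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        # A pair is concordant when both rankings order it the same way.
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs


# Two rankings that disagree only on the order of b and c.
global_rank = {"a": 1, "b": 2, "c": 3, "d": 4}
local_rank = {"a": 1, "b": 3, "c": 2, "d": 4}
tau = kendall_tau(global_rank, local_rank)
```

A value near 1 indicates that the two rankings order individuals consistently, while a value near 0 indicates little agreement.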

Figure 8 visualizes sub-spaces in the full feature-space. Figure 8 (Top-Left) shows (the rank correlation between global and local velocity convex hull (VCH) ranking) against the maximum support over all individuals for this trial. This figure supports our observations in Table 2 that the Linear Threshold model has low rank correlation (consistency) while the Hierarchical (HM) and Dictatorship Model (DM) have high maximum support and correlation (HM has higher rank correlation between the two because more ranks are explicitly fixed).

A key aspect of our simulation modeling is that we can characterize real datasets according to how they map into these feature-spaces, compared to synthetic models. We compute each rank correlation over high-confidence baboon events, labeled “Baboon (High Confidence)” in Figure 8, thresholded at the th percentile of density. We observe that within different sub-spaces, the baboon ranking is similar to Random or Linear Threshold, and has low maximum support for global-local rank correlation features. We also plot the baboon rank correlation for the ground-truthed running example (as presented in Section 2.1). This key example, labeled “Baboon (Example)”, has high rank correlation on both cross-feature axes. This suggests that, in aggregate, baboon leadership is heterogeneous and context-driven (similar to the simulated case shown in Figure 6). This analysis provides a strategy for hypothesis testing and generation on contrasting time-scales and sub-spaces.

Model Precision Recall F score
DM 1 1 1
HM 0.95 1 0.97
LT 0.99 0.99 0.99
Random 0.94 0.91 0.92
Table 5: Random forest classification of synthetic leadership models using rank correlation features and PageRank maximum support
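The F score column can be cross-checked against the precision and recall columns. The sketch below assumes that the F score is the harmonic mean of precision and recall (F1) and recomputes it from the rounded values in Table 5; the function name is illustrative.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as printed in Table 5.
table5 = {
    "DM": (1.0, 1.0),
    "HM": (0.95, 1.0),
    "LT": (0.99, 0.99),
    "Random": (0.94, 0.91),
}
recomputed = {m: round(f_score(p, r), 2) for m, (p, r) in table5.items()}
```

Under this assumption, the recomputed values agree with the reported column to two decimal places.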

5 Conclusions

The work presented in this paper proposes a concrete, simple yet powerful, general framework for (1) identifying periods of coordinated group behavior, (2) identifying leaders of these events, and (3) classifying the type of leadership process at play. We validate the accuracy of our framework in performing all three of these tasks using simulated data. We further show that the framework can provide insights on real-world data, including data on collective animal movement and the economy. The methodology presented here is highly general and is likely to be applicable to a wide variety of domains where coordination across many agents is observed. In addition, our framework is highly flexible, and can easily be extended to incorporate other models of leadership or other features used in model classification, depending on the details of the system being analyzed.

References

  • [1] M. Andersson, J. Gudmundsson, P. Laube, and T. Wolle. Reporting leaders and followers among trajectories of moving point objects. GeoInformatica, 12(4):497–528, 2008.
  • [2] E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts. Everyone’s an Influencer: Quantifying Influence on Twitter. WSDM’11, 65–74. ACM, 2011.
  • [3] R. Bonanni, S. Cafazzo, P. Valsecchi, and E. Natoli. Effect of affiliative and agonistic relationships on leadership behaviour in free-ranging dogs. Anim Behav, 79(5):981–991, 2010.
  • [4] E. E. Boydston, T. L. Morelli, and K. E. Holekamp. Sex differences in territorial behavior exhibited by the spotted hyena (Hyaenidae, Crocuta crocuta). Ethology, 107(5):369–385, 2001.
  • [5] L. J. Brent, D. W. Franks, E. A. Foster, K. C. Balcomb, M. A. Cant, and D. P. Croft. Ecological knowledge, leadership, and the evolution of menopause in killer whales. Current Biol., 25(6):746 – 750, 2015.
  • [6] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. WWW’98, 107–117, 1998.
  • [7] I. D. Couzin, J. Krause, N. R. Franks, and S. A. Levin. Effective leadership and decision-making in animal groups on the move. Nature, 433(7025):513–516, 2005.
  • [8] M. C. Crofoot, R. W. Kays, and M. Wikelski. Data from: Shared decision-making drives collective movement in wild baboons. 2015.
  • [9] J. R. G. Dyer, A. Johansson, D. Helbing, I. D. Couzin, and J. Krause. Leadership, consensus decision making and collective behaviour in humans. Philos Trans R Soc Lond B Biol Sci, 364(1518):781–789, 2009.
  • [10] I. C. Gilby, Z. P. Machanda, D. C. Mjungu, J. Rosen, M. N. Muller, A. E. Pusey, and R. W. Wrangham. ‘Impact hunters’ catalyse cooperative hunting in two wild chimpanzee communities. Philos Trans R Soc Lond B Biol Sci, 370(1683), 2015.
  • [11] L. Glowacki and C. von Rueden. Leadership solves collective action problems in small-scale societies. Philos Trans R Soc Lond B Biol Sci, 370(1683), 2015.
  • [12] A. Goyal, F. Bonchi, and L. V. Lakshmanan. Discovering leaders from community actions. CIKM’08, 499–508, 2008.
  • [13] A. Goyal, F. Bonchi, and L. V. Lakshmanan. Learning influence probabilities in social networks. WSDM’10, 241–250. ACM, 2010.
  • [14] X. He and D. Kempe. Stability of Influence Maximization. ArXiv e-prints, 2015.
  • [15] R. Heinsohn and C. Packer. Complex cooperative strategies in group-territorial african lions. Science, 269(5228):1260–1262, 1995.
  • [16] T. K. Ho. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell., 20(8):832–844, 1998.
  • [17] P. L. Hooper, H. S. Kaplan, and J. L. Boone. A theory of leadership in human cooperative groups. J Theor Biol, 265(4):633–646, 2010.
  • [18] A. Java, P. Kolari, T. Finin, and T. Oates. Modeling the spread of influence on the blogosphere. WWW’06, 22–26, 2006.
  • [19] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. KDD’03, 137–146, 2003.
  • [20] M. B. Kjærgaard, H. Blunck, M. Wüstenberg, K. Grønbæk, M. Wirz, D. Roggen, and G. Tröster. Time-lag method for detecting following and leadership behavior of pedestrians from mobile sensing data. Proc. IEEE PerCom, 56–64, 2013.
  • [21] Z. Li, F. Wu, and M. C. Crofoot. Mining Following Relationships in Movement Data. ICDM’13, 458–467, 2013.
  • [22] D. Lusseau and L. Conradt. The emergence of unshared consensus decisions in bottlenose dolphins. Behav Ecol Sociobiol, 63(7):1067–1077, 2009.
  • [23] R. Mares, A. J. Young, and T. H. Clutton-Brock. Individual contributions to territory defence in a cooperative breeder: weighing up the benefits and costs. Proc R Soc B, 279(1744):3989–3995, 2012.
  • [24] R. O’Gorman, J. Henrich, and M. Van Vugt. Constraining free riding in public goods games: designated solitary punishers can sustain human cooperation. Proc R Soc B, 276(1655):323–329, 2009.
  • [25] P. Weinstein and D. A. Maelzer. Leadership behaviour in sawfly larvae Perga dorsalis (Hymenoptera: Pergidae). Oikos, 79(3):450–455, 1997.
  • [26] H. Pham and C. Shahabi. Spatial Influence - Measuring Followship in the Real World. ICDE’16, 2016.
  • [27] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans Sig. Process., 26(1):43–49, 1978.
  • [28] M. Shokoohi-Yekta, J. Wang, and E. Keogh. On the Non-Trivial Generalization of Dynamic Time Warping to the Multi-Dimensional Case. SDM’15, 289–297.
  • [29] J. E. Smith, J. R. Estrada, H. R. Richards, S. E. Dawes, K. Mitsos, and K. E. Holekamp. Collective movements, leadership and consensus costs at reunions in spotted hyaenas. Anim Behav, 105:187–200, 2015.
  • [30] J. E. Smith, S. Gavrilets, M. B. Mulder, P. L. Hooper, C. E. Mouden, D. Nettle, C. Hauert, K. Hill, S. Perry, A. E. Pusey, M. van Vugt, and E. A. Smith. Leadership in mammalian societies: Emergence, distribution, power, and payoff. Trends Ecol Evol, 31(1):54–66, 2016.
  • [31] F. Solera, S. Calderara, and R. Cucchiara. Learning to identify leaders in crowd. CVPR’15 Workshops, 43–48, 2015.
  • [32] J. C. Stewart and J. P. Scott. Lack of correlation between leadership and dominance relationships in a herd of goats. J Comp Physiol Psych, 40(4):255–264, 1947.
  • [33] A. Strandburg-Peshkin, D. R. Farine, I. D. Couzin, and M. C. Crofoot. Shared decision-making drives collective movement in wild baboons. Science, 348(6241):1358–1361, 2015.
  • [34] A. Strandburg-Peshkin, D. R. Farine, I. D. Couzin, and M. C. Crofoot. Shared decision-making drives collective movement in wild baboons. Science, 348(6241):1358–1361, 2015.
  • [35] A. Strandburg-Peshkin, C. R. Twomey, N. W. F. Bode, A. B. Kao, Y. Katz, C. C. Ioannou, S. B. Rosenthal, C. J. Torney, H. S. Wu, S. A. Levin, and I. D. Couzin. Visual sensory networks and effective information transfer in animal groups. Current Biology, 23(17):R709–R711, 2013.
  • [36] R. Sulo, T. Berger-Wolf, and R. Grossman. Meaningful selection of temporal resolution for dynamic networks. MLG’10, 127–136. ACM, 2010.
  • [37] J. Sun and J. Tang. A Survey of Models and Algorithms for Social Influence Analysis, 177–214. 2011.
  • [38] C. Tan, J. Tang, J. Sun, Q. Lin, and F. Wang. Social Action Tracking via Noise Tolerant Time-varying Factor Graphs. KDD’10, 1049–1058, 2010.
  • [39] M. Van Vugt. Evolutionary origins of leadership and followership. Pers Soc Psychol Rev, 10(4):354–371, 2006.