Fine-Grained Retrieval of Sports Plays using Tree-Based Alignment of Trajectories

10/06/2017 ∙ by Long Sha, et al. ∙ QUT ∙ California Institute of Technology ∙ STATS

We propose a novel method for effective retrieval of multi-agent spatiotemporal tracking data. Retrieval of spatiotemporal tracking data poses several unique challenges compared to conventional text-based retrieval settings. Most notably, the data is fine-grained, meaning that the specific location of agents is important in describing behavior. Additionally, the data often contains tracks of multiple agents (e.g., multiple players in a sports game), which generally leads to a permutational alignment problem when performing relevance estimation. Because agents frequently swap positions, it is difficult to maintain the correspondence of agents, which makes pairwise comparison problematic for multi-agent spatiotemporal data. To address this issue, we propose a tree-based method to estimate the relevance between multi-agent spatiotemporal tracks. It uses a hierarchical structure to perform multi-agent data alignment and partitioning in a coarse-to-fine fashion. We validate our approach via user studies with domain experts. Our results show that, compared to current state-of-the-art methods, our method boosts performance in retrieving similar sports plays, especially in interactive situations where the user selects a subset of trajectories.




1. Introduction

Research into “exemplar-based” and “sketch-based” approaches to image retrieval has recently surged (Bui and Collomosse, 2015; Yu et al., 2016; Zhang et al., 2016). The recent popularity of these fine-grained retrieval methods is due to the inadequacies of current “text-based” or “keyword” query-based methods. As a picture tells a thousand words, using examples or sketches that capture the fine-grained attributes the user is interested in has been shown to be superior to text-based searches.

Figure 1. In this paper we focus on retrieving fine-grained multi-agent data in basketball using a tree-based alignment method. We focus on two types of retrieval tasks: (a) Given a four-second example play (blue is offensive team, green is defense and red is the ball – the small circle on each trajectory shows the end point) we use that as the input query and retrieve all plays that look similar to that query; and (b) Given the same play, the user selects only the trajectories of interest and the retrieval is based on that chosen subset.

We study the setting of fine-grained retrieval of multi-agent spatiotemporal data such as sports plays. A depiction of the “exemplar search” problem is shown in Figure 1(a). Given an input play of a specific length (say 4 seconds), the input query is compared to the entire database of 4-second plays and a ranked list of the most similar plays is retrieved. Alternatively, the user can select only the players that they are interested in (Figure 1(b)). The system then retrieves a ranked list of similar plays based only on the selected trajectories. Previous work showed that users much preferred the exemplar-based and sketch-based methods over a conventional keyword-based retrieval system (Sha et al., 2016). Most crucially, the retrieval system allowed users to retrieve fine-grained plays in a matter of seconds, instead of the days that this can take in practice in sports domains.

One major challenge in relevance estimation for multi-agent trajectories is that of alignment. Although the raw multi-agent data is low-dimensional compared to an image-based representation, its inherent problem is misalignment due to the constant swapping of player positions over the course of a play (i.e., the permutation problem). One approach to circumventing this issue is a preprocessing step that pre-aligns the multi-agent data to a template, which allows for quick local trajectory comparison.

Figure 2. (a) Given an initial ordering with a color corresponding to each player in a team, we show that a player’s position across a quarter of a game is quite random. (b) But if we align or permute the ordering at the frame level to this template, we can (c) discover the hidden structure of the team. In this plot we show the alignment to a single template, but we will show later that using a tree of templates is more effective.

In this paper, we propose an improved multi-agent data alignment method which gives improved fine-grained retrieval performance. Our proposed method uses a hierarchical structure to perform multi-agent data alignment and partitioning in a coarse-to-fine fashion. Our approach can be easily integrated into existing spatiotemporal retrieval pipelines. We validate our approach over a wide range of retrieval tasks. Our user study results demonstrate significant benefits of our method over previous relevance estimation methods for sports play retrieval.

The rest of the paper is organized as follows. In Section 2 we describe the importance of aligning multi-agent data and why using the raw data is preferred. In Section 3, we explain our proposed tree-based multi-agent data alignment method, and Section 4 describes how this is implemented within a retrieval framework. Section 5 shows our results, and Section 6 gives the relevant related work. We conclude with a summary and discuss future work. [1]

[1] A demo video of our work can be viewed in

2. Measuring Similarity via the Alignment of Raw Multi-Agent Data

For effective retrieval to take place, we need an accurate and efficient similarity measure between multi-agent inputs. As shown in Figure 2(a), if we look at the raw positional data of a single team in basketball (i.e., 5 players) across a quarter of a match, given an initial ordering, we can see that this ordering of the positional data contains little team structure, as players constantly switch positions. To counter this issue, we could exhaustively compare the pairwise distance between each player in the input query and a candidate play in the database, which is manageable for one team ($5! = 120$ orderings). But if we include the other team, the exhaustive approach becomes prohibitive. [2]

[2] In basketball, there are 5 players per team. To compare the offensive team trajectories (i.e., the team with the ball), $5! = 120$ permutations or comparisons are required. To include the defensive team, we square this: $120^2 = 14{,}400$. We then add the ball trajectory comparison, which yields 14,401 comparisons. For other team sports such as soccer, which have a higher number of players (i.e., 10 outfield players per team), the number of permutations is far larger still: $(10!)^2 \approx 1.3 \times 10^{13}$.
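The counting argument can be checked with a few lines of Python. This is only an illustrative sketch: the function name `exhaustive_comparisons` is hypothetical, and it assumes, as in the basketball case, that both teams are permuted independently and the ball adds one further comparison.

```python
from math import factorial


def exhaustive_comparisons(players_per_team: int, include_ball: bool = True) -> int:
    """Count the comparisons needed when every ordering of both teams is tried."""
    per_team = factorial(players_per_team)   # orderings of one team
    both_teams = per_team ** 2               # offense and defense permuted independently
    return both_teams + (1 if include_ball else 0)


print(factorial(5))               # 120
print(exhaustive_comparisons(5))  # 14401
```

Because the count grows with the square of a factorial, an exhaustive search quickly stops being practical as the number of players per team increases.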

A solution to bypass the alignment issue is to use hand-crafted features (Li and Chellappa, 2010; Stracuzzi et al., 2011; Wang et al., 2004; Wei et al., 2014). Alternatively, a more intuitive approach would be to compare plays as images as it is a visual data source and it has been employed previously for multi-agent data (Yue et al., 2014; Zheng et al., 2016; Miller et al., 2014).

To employ an image-based approach, we can do the following. Say we have a play across a window of time, such as four seconds, and the player and ball information is captured at 25 frames per second. We first quantize the court into a grid of cells measured in feet (each one being a pixel). For a basketball court, this would result in an RGB image, with each channel assigned to one group of agents (offense = blue, defense = green, ball = red). In addition to being high-dimensional, this representation is lossy: if we wanted to reconstruct the original signal, this would be problematic as we have thrown away the temporal structure (i.e., we would not know which location each player was at in each frame). To maintain the original temporal structure, we could add another channel, which would result in an extremely sparse and even higher-dimensional input signal. For a retrieval system, this is highly undesirable, as it would require us to store this high-dimensional data in addition to the raw data.

However, this approach is unnecessary when one considers that the original input data used to create the image is already very compact and can be described via a dense matrix of spatial positions. For example, given the $(x, y)$ location of all 10 players and of the ball, we can represent each frame as a $22$-dimensional vector ($11$ agents $\times$ $2$ coordinates). Across $F$ frames, the multi-agent behavior can be represented as a $22 \times F$ matrix, which for a four-second play at 25 frames per second ($F = 100$) has a dimensionality of $22 \times 100 = 2{,}200$. To utilize the raw data, a solution is to align the raw positional data to a role template. This method was proposed by Lucey et al. (Lucey et al., 2013), and it dynamically assigns a unique role to each agent in each frame according to a single template.
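As a rough illustration of how compact the raw representation is, the sketch below stacks per-frame positions into a matrix. It assumes two coordinates per agent (so 22 values per frame for 10 players plus the ball) and a four-second play at 25 frames per second; the helper names `frame_vector` and `play_matrix` and the synthetic play are made up for this example.

```python
FPS = 25          # tracking rate stated in the text
SECONDS = 4       # length of the example play
N_PLAYERS = 10


def frame_vector(players, ball):
    """Flatten ten (x, y) player positions and one ball position into one vector."""
    vec = []
    for x, y in list(players) + [ball]:
        vec.extend((x, y))
    return vec


def play_matrix(frames):
    """Stack frame vectors column-wise: result[d][t] is coordinate d at frame t."""
    columns = [frame_vector(players, ball) for players, ball in frames]
    return [list(row) for row in zip(*columns)]


# Synthetic play in which nobody moves (illustrative data only).
still_frame = ([(float(i), 0.0) for i in range(N_PLAYERS)], (47.0, 25.0))
matrix = play_matrix([still_frame] * (FPS * SECONDS))
print(len(matrix), len(matrix[0]))  # 22 100
```

The last two rows of the matrix hold the ball coordinates, and each pair of rows before that holds one player, so any reordering of agents is simply a reordering of row pairs.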

Figure 2(b) shows the formation template of role-based alignment and (c) shows the aligned player positions once this method has been applied. The last plot clearly shows that some type of team structure is obtained. In terms of retrieval, this means that once the permutation matrix has been applied - only a single comparison between trajectories needs to be made. Additionally, only the permutation matrix needs to be stored and not a high-dimensional representation like an image.

Even though it is effective, as can be seen in Figure 2(c), the role-based method is suboptimal as it only picks up the coarse structure of a team. In this figure, we see thin strips across the court which do not coincide with any meaningful interpretation of the game of basketball. A more meaningful representation would pick up the typical defensive and offensive structures. In the next section, we show how this can be done, which yields better retrieval performance.

3. Tree-Based Alignment

We now describe our main technical contribution. For effective retrieval using raw multi-agent data, accurate alignment is required. At a high level, this means that we want to find the ordering of players in the input query that minimizes the difference to the candidate play. Technically, this refers to finding the permutation matrix that minimizes the distance between all the agents in one team:

$$P^{*} = \arg\min_{P} \sum_{i=1}^{N} \lVert (P X)_{i} - \hat{X}_{i} \rVert_{2} \qquad (1)$$

where $X$ is the matrix of the spatial positions of the $N$ agents of one team in the input query with an initial agent order (i.e., that order is fixed across that window of time), $\hat{X}$ is the matrix of spatial positions of agents ordered according to a pre-aligned template for a candidate play in the database, $(\cdot)_{i}$ denotes the row belonging to the $i$-th agent, and the permutation matrix $P$ indicates the correspondence of agents between $X$ and $\hat{X}$.

In terms of pre-aligning the data in the database, a similar approach is used, but instead of just applying the permutation matrix to the input query, we find the permutation matrix between every play in the database and the gold-standard template, $T$. The gold-standard template can be either hand-crafted by a domain expert (Lucey et al., 2013) or learnt in a data-driven manner using the EM algorithm (Bialkowski et al., 2014b).

Figure 3. The reconstruction cost in Equation 4 at each layer. We set the minimum depth to 6 since it shows as the elbow point in this plot.

The above approach has yielded reasonable performance, but it assumes that the observed behavior is linear (i.e., a single state). In complex scenarios such as a team sport like basketball, it is more reasonable to assume that the behaviors are non-linear and consist of many states. As such, it makes intuitive sense that a superior approach is to learn a separate template for each of these states.

As the various game-states are not explicit, we can use hierarchical clustering to discover these latent states to provide better alignment. However, this presents a “chicken-or-the-egg” problem, as we can not cluster the multi-agent data without it being aligned first. As such, we can apply a coarse-to-fine approach where we first align the data to a coarse template, and then using this initial alignment we can partition the data into finer states which provide templates which allow us to find a better alignment.

A feasible method to do this would be to use a tree approach, which iteratively executes two steps: alignment and data partitioning. The ultimate goal is to find a set of states/templates that can reasonably reconstruct the complex multi-agent behaviors, and align each data point with the corresponding states/template.

Figure 4. A leaf node example and the distribution heat map of each agent. The top left image shows the centroid of this leaf node, while the others show the agent distribution heat maps within this node.
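
procedure TemplateLearn($\mathcal{X}$)
    Initialize the template $T$ with a randomly selected example
    while the template change exceeds the tolerance and the iteration limit is not reached do
        for each sample $X_j$ in dataset $\mathcal{X}$ do
            Calculate the cost $C(m, n)$ for each agent-agent pair
            Compute $P_j$ using the Hungarian algorithm
            Align the example: $X_j \leftarrow P_j X_j$
        end for
        Update the template $T$ by averaging the aligned examples
        Compute the difference between the new and the previous template
    end while
    return $T$
end procedure

Algorithm 1. Template learning algorithm.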

3.1. Step 1: Alignment

To start with, let us focus on the alignment step. The goal of our multi-agent alignment is to compute a permutation matrix for each example which minimizes the distance between this example and the template of this state.

In each state, the template contains a canonical spatial ordering of agents. Our learning method utilizes a general EM approach, which learns the template in an unsupervised way. The template learning process in a given class is shown in Algorithm 1.

Once the template $T$ is obtained, the permutation matrix for a given example $X_{j}$ in this state can be computed:

$$P_{j}^{*} = \arg\min_{P} \sum_{m=1}^{N} \lVert T_{m} - (P X_{j})_{m} \rVert_{2} \qquad (2)$$

This is a linear assignment problem and can be solved by computing a cost matrix for agent matching. Given the raw tracking data of $N$ agents, let us denote by $C$ an $N \times N$ cost matrix that contains the Euclidean distance between each agent-agent location pair. The entry $C(m, n)$ indicates the distance between agent $m$ in the template and agent $n$ in the given example, which is the cost of this agent-agent assignment:

$$C(m, n) = \lVert T_{m} - (X_{j})_{n} \rVert_{2} \qquad (3)$$

Once the cost matrix is computed, the Hungarian algorithm (Kuhn, 1955) is used to find the permutation matrix that minimizes the overall assignment cost.
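The alignment step and the template learning loop of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each example is a single frame given as a list of $(x, y)$ tuples, the function names (`cost_matrix`, `best_assignment`, `align`, `learn_template`) and the `max_iter` and `tol` parameters are hypothetical, and an exhaustive search over permutations stands in for the Hungarian algorithm (it finds the same minimum-cost assignment, but is only practical for a handful of agents).

```python
import math
import random
from itertools import permutations


def cost_matrix(template, example):
    """C[m][n]: Euclidean distance between template agent m and example agent n."""
    return [[math.dist(t, e) for e in example] for t in template]


def best_assignment(cost):
    """Return perm where perm[m] is the example agent placed in template slot m.

    Exhaustive search (assumption: few agents); a Hungarian solver would
    return an assignment with the same total cost in polynomial time.
    """
    n = len(cost)
    return min(permutations(range(n)),
               key=lambda p: sum(cost[m][p[m]] for m in range(n)))


def align(template, example):
    """Reorder the agents of `example` to match the template ordering."""
    perm = best_assignment(cost_matrix(template, example))
    return [example[n] for n in perm]


def learn_template(dataset, max_iter=20, tol=1e-6, seed=0):
    """Alternate between aligning all examples and averaging them."""
    rng = random.Random(seed)
    template = list(rng.choice(dataset))       # random example as initial template
    for _ in range(max_iter):
        aligned = [align(template, ex) for ex in dataset]
        new_template = [
            (sum(ex[m][0] for ex in aligned) / len(aligned),
             sum(ex[m][1] for ex in aligned) / len(aligned))
            for m in range(len(template))
        ]
        change = max(math.dist(a, b) for a, b in zip(template, new_template))
        template = new_template
        if change < tol:                       # template stopped moving
            break
    return template


# Toy data: the same three-agent formation under every possible ordering.
formation = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
shuffled = [list(p) for p in permutations(formation)]
learned = learn_template(shuffled)
print(all(align(learned, ex) == learned for ex in shuffled))  # True
```

In this toy case every example collapses onto the same ordering once it is aligned, which is the behavior the template is meant to produce on real tracking data.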

Figure 5. The player distribution after our tree-based alignment. Our method reveals more detailed formation at each side of the court.

3.2. Step 2: Data Partitioning

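Since the alignment has been solved, partitioning the aligned data (i.e., clustering) into distinct states can be performed:

$$\mathcal{C}^{*} = \arg\min_{\mathcal{C}} \sum_{k=1}^{K} \sum_{X_{j} \in C_{k}} \lVert P_{j} X_{j} - T_{k} \rVert_{F}^{2} \qquad (4)$$

where $\mathcal{C} = \{C_{1}, \dots, C_{K}\}$ is a set of clusters, $C_{k}$ represents the $k$-th data cluster, and $T_{k}$ is the mean (template) of the aligned examples in $C_{k}$. The clustering operation splits the data into more specific states and enables finer alignment to occur.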

As with any clustering problem, we need to define the number of clusters to use. To determine this number, the specific application needs to be considered. In our case, we are interested in play retrieval, which means we need to balance two things: i) the total number of clusters should be small while still retaining high within-cluster similarity, and ii) the number of plays in each cluster should be high enough that we have enough plays to retrieve, but small enough that responsive retrieval can occur. To aid with this balancing act, we use an additional term, similar to the idea of Silhouette analysis (Rousseeuw, 1987), to constrain the number of clusters in each node of our tree:

$$S = \frac{1}{M} \sum_{j=1}^{M} \frac{b_{j} - a_{j}}{\max(a_{j}, b_{j})}, \quad a_{j} = \lVert X_{j} - \mu_{c(j)} \rVert_{2}, \quad b_{j} = \lVert X_{j} - \mu_{n(j)} \rVert_{2} \qquad (5)$$

where $M$ is the number of examples in the node, $\mu_{c(j)}$ represents the mean of the cluster that example $j$ belongs to, and $\mu_{n(j)}$ indicates the mean of the closest neighboring cluster of example $j$. Equation 5 measures the dissimilarity between neighboring clusters and how tightly the data is grouped within each cluster. When the number of clusters becomes too large, the similarity between neighboring clusters increases and $S$ decreases as well. Thus, we want to maximize $S$ to obtain the most discriminative clusters.

For the data partitioning in each node, we attempt $K$-means clustering with $K$ ranging from 2 to 10, and for each $K$ we compute the score $S$. The $K$ that provides the maximal $S$ is selected to split the data in the current node.
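The selection of the number of clusters can be sketched as below. Everything here is an illustrative assumption rather than the paper's exact procedure: the score is a simplified silhouette that compares the distance to an example's own cluster mean with the distance to the nearest other cluster mean, the $K$-means routine is a bare-bones Lloyd iteration with random initialization, and the names `assign`, `kmeans`, `separation_score`, and `choose_k` are hypothetical.

```python
import math
import random


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iterations; points is a list of equal-length tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous center.
            new_centers.append(
                tuple(sum(v) / len(members) for v in zip(*members))
                if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def separation_score(points, centers, labels):
    """Simplified silhouette: own-center distance vs nearest other center."""
    total = 0.0
    for p, lab in zip(points, labels):
        a = math.dist(p, centers[lab])
        b = min(math.dist(p, c) for i, c in enumerate(centers) if i != lab)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


def choose_k(points, k_values=range(2, 11)):
    """Try each K and keep the clustering with the highest score."""
    best = None
    for k in k_values:
        if k > len(points):
            break
        centers, labels = kmeans(points, k)
        score = separation_score(points, centers, labels)
        if best is None or score > best[0]:
            best = (score, k, centers, labels)
    return best


demo = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
tight = separation_score(demo, [(0.0, 0.5), (10.0, 0.5)], [0, 0, 1, 1])
loose = separation_score(demo, [(5.0, 0.0), (5.0, 1.0)], [0, 1, 0, 1])
print(tight > loose)  # True
```

Since Lloyd iterations depend on the initialization, a practical version would likely restart $K$-means several times per $K$ before comparing scores.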

3.3. Tree Growth

Since we can both align and cluster the multi-agent data, we can now learn the tree. To clarify the notation, we use the subscript and superscript to indicate the node index and layer index, respectively: $X_{n}^{l}$ indicates the data group in the $n$-th node of the $l$-th layer, $\mathcal{C}_{n}^{l}$ indicates the clusters found from that node, $T_{n}^{l}$ represents the template in that node, and $P_{n}^{l}$ is the permutation matrix computed by using that template. For every node, we first use Algorithm 1 to align the data that is assigned to this node and then apply the clustering technique to split the data into finer states.

It is worth noting that the templates in each node should also be aligned so that the consistency of the agent permutation is preserved. Thus, the new template $T_{n}^{l}$ of node $n$ at layer $l$ is aligned to its parent template in the previous layer $l-1$. The same process is then repeated for each node in our tree. Algorithm 2 summarizes the learning process of our tree-based method. During the learning process, the clusters and templates at each layer are stored for the alignment process.

procedure LearnTree($\mathcal{X}$)
    for each layer $l$ do
        for each node $n$ do
            learn template $T_{n}^{l}$ using Algorithm 1
            align $T_{n}^{l}$ to its parent template
            align data $X_{n}^{l}$ with $T_{n}^{l}$
        end for
        store the templates of layer $l$ in $\mathcal{T}$
        compute the reconstruction loss with Eq. 4
        terminate when the stop criterion is satisfied
        for each node $n$ do
            conduct K-means on $X_{n}^{l}$ with different $K$
            select the cluster set $\mathcal{C}_{n}^{l}$ that maximizes $S$
            partition $X_{n}^{l}$ to child nodes according to $\mathcal{C}_{n}^{l}$
        end for
        store the clusters of layer $l$ in $\mathcal{C}$
    end for
    return $\mathcal{T}$, $\mathcal{C}$
end procedure

Algorithm 2. Learning process of tree-based alignment.

There are two stopping criteria: 1) a pre-defined maximum number of examples in each leaf node, and 2) a pre-defined depth. From Equation 4, we know that the reconstruction cost would reach its minimum if we had an infinite number of states. Thus, we aim to find a minimum depth that provides a considerably low cost. We plot the overall cost of Equation 4 at each layer in Figure 3 and set the minimum depth to 6 layers, as it appears as the elbow point. A much deeper tree may be built for fast retrieval, but a 6-layer tree achieves reasonable performance for alignment purposes. Figure 4 shows an example of one leaf node. The top left image shows the centroid of the leaf node and the others show the distribution heat maps of the ball and each agent.

In terms of applying the tree-based alignment, given an input play, the player permutation is first aligned to the global template at the root node. The play then moves to a child node by finding the nearest neighbor, and the alignment is repeated. The aligned data in our tree-based method can be expressed as:

$$\tilde{X} = P^{L} P^{L-1} \cdots P^{1} X \qquad (6)$$

where $P^{l}$ represents the permutation matrix at layer $l$ and $L$ is the layer of the leaf node that is reached. Essentially, this composition of permutation matrices yields the optimal ordering of the multiple agents. Figure 5 shows the location distribution of each aligned agent across a quarter. In contrast to Figure 2(c), our method reveals more meaningful structure at each side of the court.
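The descent through the tree and the composition of per-layer permutations can be sketched as follows. This is a toy illustration under several assumptions: a hypothetical `Node` class holds one template per state, each formation is a list of $(x, y)$ tuples, exhaustive permutation search again stands in for the Hungarian algorithm, and the child is chosen by summed Euclidean distance to its template.

```python
import math
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Node:
    template: list                          # canonical agent positions of this state
    children: list = field(default_factory=list)
    key: str = ""                           # leaf identifier, usable as a hash key


def best_permutation(template, agents):
    """perm[m] is the index of the agent that best fills template slot m."""
    n = len(template)
    return min(permutations(range(n)),
               key=lambda p: sum(math.dist(template[m], agents[p[m]])
                                 for m in range(n)))


def apply(perm, items):
    return [items[i] for i in perm]


def formation_distance(a, b):
    return sum(math.dist(p, q) for p, q in zip(a, b))


def align_with_tree(root, agents):
    """Re-align at every layer; return aligned agents, composed permutation, leaf key."""
    total = list(range(len(agents)))        # total[m]: original index now in slot m
    node = root
    while True:
        perm = best_permutation(node.template, agents)
        agents = apply(perm, agents)
        total = apply(perm, total)          # compose with the earlier layers
        if not node.children:
            return agents, total, node.key
        node = min(node.children,
                   key=lambda c: formation_distance(c.template, agents))


# Two-agent toy tree with two leaf states.
leaf_a = Node(template=[(2.0, 8.0), (8.0, 2.0)], key="A")
leaf_b = Node(template=[(2.0, 2.0), (8.0, 8.0)], key="B")
root = Node(template=[(2.0, 5.0), (8.0, 5.0)], children=[leaf_a, leaf_b])

query = [(8.0, 8.0), (2.0, 2.0)]
aligned, total, key = align_with_tree(root, query)
print(aligned, total, key)  # [(2.0, 2.0), (8.0, 8.0)] [1, 0] B
```

Tracking the composed index list `total` means only one permutation per play has to be stored, regardless of how deep the tree is.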

Figure 6. The compressibility test results of using our tree-based alignment, role-based alignment and naive identity-based alignment. Our method provides the best compressibility.

3.4. Alignment Evaluation

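Since better alignment should result in a more compressible input feature, we can evaluate our tree-based alignment via clustering and principal component analysis (PCA). To make the clustering comparison fair, instead of using the clusters generated inherently by our approach, K-means clustering is applied to the output of both alignment methods. Given 100,000 frames, they are aligned with the role-based method and with our method separately. Then, we apply K-means clustering to inspect the average within-cluster error (WCE) for different values of $K$:

$$\mathrm{WCE} = \frac{1}{M} \sum_{k=1}^{K} \sum_{\tilde{X}_{j} \in C_{k}} \lVert \tilde{X}_{j} - \mu_{k} \rVert_{2} \qquad (7)$$

where we abuse notation and use $C_{k}$ to indicate the $k$-th cluster after K-means clustering, $\mu_{k}$ is its mean, $\tilde{X}_{j}$ is an aligned frame, and $M$ is the number of frames. PCA is used to inspect the variance explained by different eigenvectors, and we calculate the variance via

$$\mathrm{Var}(i) = \frac{\lambda_{i}}{\sum_{d=1}^{D} \lambda_{d}} \qquad (8)$$

where $\lambda_{i}$ is the $i$-th eigenvalue, which indicates the significance of the $i$-th eigenvector, and $D$ is the number of eigenvalues.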

Apart from the role-based alignment and our tree-based alignment, the naive identity-based alignment is also used as a baseline. The identity-based approach only compares the trajectories according to the players’ identities, which refer to an initial logical ordering (i.e., the player most like the point guard is ordered first, then the shooting guard, and so on to the center, with this ordering fixed). Figure 6 (left) shows the result of our clustering test and Figure 6 (right) shows the performance using PCA. Both results show that our alignment gives the best compressibility.

4. Fine-Grained Retrieval System

The prime motivation for obtaining better alignment of the raw multi-agent data is to achieve better fine-grained retrieval. As depicted in Figure 1, instead of typing a textual description of the play, users can select (a) an example or (b) a modified example. The idea was first proposed in (Sha et al., 2016), which utilized a simple hash table built by clustering only the ball trajectories. Although effective, that approach is not optimal since such hashing ignores the information of the players. In this section, we show that we obtain better fine-grained retrieval by utilizing our tree-based approach. We refer to the approach in (Sha et al., 2016) as the Baseline Method.

Figure 7. The overview of our retrieval system. The top part shows the pre-processing and the bottom part shows the retrieval process.
Figure 8. The retrieval process of our system. (a) The input query. (b) Tree-based hashing which aligns the query and computes the hash key. (c) Find candidates with the hash key. (d) Fetch plays from the database. (e) Rank and return all the results.
Figure 9. A retrieval example of using two methods. The top row shows the top-5 results returned by the baseline method (Sha et al., 2016) while the bottom row shows the top-5 results returned by our method.

The dataset used in this work is the SportVU basketball dataset. The dataset is captured by the STATS SportVU system (STATS, 2015), which generates location data for every player and the ball at 25 Hz, along with detailed logs for actions such as passes, shots, fouls, etc. The dataset is taken from 1300 games from the last two seasons of a professional basketball league. 1200 games are used as our database and queries are extracted from the remaining 100 games. The tracking data of each game is stored in a separate table and each row contains the information of one player at one frame: the time, team ID, player ID, action ID and the location of the player or the ball. Although our retrieval focuses on short plays, the database is still stored in its original form. The information about plays is saved in our hash table.

A block diagram of our retrieval system is given in Figure 7. It consists of two parts: i) pre-processing and ii) retrieval. In terms of pre-processing, we first extract all the plays with fixed lengths (i.e., continuous chunks of 1, 2, 3, 4 and 5 seconds). For each play, we then find the permutation matrix according to our tree, which gives the best ordering, and store that permutation matrix for that play. To enable fast play retrieval, we then associate each play with a hash key which is found using our alignment/clustering method described in the previous section. Both permutation matrices and hash keys are stored in our hash table. Our hashing method is similar to the concept of locality sensitive hashing (LSH) since similar plays are placed at the same address.

In terms of retrieval, given an input play we first align it to the tree of templates. We then return all the plays that are associated with the hash key (we call these the candidate plays), which corresponds to the leaf node of the template used to align the query at the lowest level. We then determine the similarity of the plays depending on which players were selected by the user for fine-grained interactive search, and display the ranked results to the user.

An example of the query process is shown in Figure 8. Given a 4-second input query (which has the ball and the two offensive players selected) (a), we then put input query into our tree (b). Within the tree, we traverse the path through the tree until a leaf-node is reached (which is “B” as depicted by the red path). Based on that hash-key “B”, all the plays with that hash-key are then retrieved from the hash table (c) and fetched from the database (d). Similarity between the aligned input query and the plays in the database with that hash-key are then computed - with only the trajectories selected contributing to the similarity score. Once a similarity score has been calculated, a ranked list of the plays in order of similarity (or smallest distance between the trajectories) is given (e). Some additional fixed weightings depending on the team-ID or recent plays can also be included once that initial ranked list has been generated.
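The lookup and ranking stages of this query process can be sketched as follows. The sketch assumes that the query and every stored play have already been aligned and assigned a hash key by the tree; the `PlayIndex` class, its method names, and the mean point-wise distance used for ranking are illustrative choices, not the system's actual interface.

```python
import math
from collections import defaultdict


def trajectory_distance(a, b):
    """Mean point-wise Euclidean distance between two equally long trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def play_distance(query, candidate, selected):
    """Only the agent slots selected by the user contribute to the score."""
    return sum(trajectory_distance(query[s], candidate[s]) for s in selected)


class PlayIndex:
    def __init__(self):
        self.buckets = defaultdict(list)   # hash key -> play ids
        self.plays = {}                    # play id -> aligned play (slot -> trajectory)

    def add(self, play_id, hash_key, aligned_play):
        self.buckets[hash_key].append(play_id)
        self.plays[play_id] = aligned_play

    def search(self, hash_key, aligned_query, selected, top_k=10):
        """Fetch the candidates in one bucket and rank them by distance."""
        candidates = self.buckets.get(hash_key, [])
        scored = sorted(
            (play_distance(aligned_query, self.plays[c], selected), c)
            for c in candidates)
        return [play_id for _, play_id in scored[:top_k]]


index = PlayIndex()
index.add("p1", "B", [[(0.0, 0.0), (1.0, 1.0)], [(5.0, 5.0), (6.0, 6.0)]])
index.add("p2", "B", [[(0.0, 1.0), (1.0, 2.0)], [(50.0, 50.0), (60.0, 60.0)]])
index.add("p3", "A", [[(0.0, 1.0), (1.0, 2.0)], [(5.0, 5.0), (6.0, 6.0)]])

query = [[(0.0, 1.0), (1.0, 2.0)], [(5.0, 5.0), (6.0, 6.0)]]
print(index.search("B", query, selected=[0]))     # ['p2', 'p1']
print(index.search("B", query, selected=[0, 1]))  # ['p1', 'p2']
```

The two calls show how the ranking changes with the selected subset: with only slot 0 selected, the play whose second trajectory is far away still ranks first.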

Figure 9 displays the top-5 results for the baseline method and our tree-based method given the input query. This example qualitatively highlights the benefit of our approach: the baseline method cannot find the corresponding players correctly due to the imperfect alignment, while our method maintains a high consistency between the results and the query. In the next section, we show the results of a user study which quantitatively shows that the tree-based method yields better interactive fine-grained retrieval.

5. User Experiments

5.1. Experiment Design

To show the benefit of our tree-based alignment method over the previous method described in (Sha et al., 2016), which we call the Baseline Method, we conducted a series of user studies which focused on the task of interactive fine-grained retrieval (i.e., where the user selects a subset of players within an example play). To enable a fair comparison, whilst also maintaining responsive retrieval times, we set the maximum number of plays within a leaf node to 2000, which generates a deeper tree with 314 individual leaf nodes/hash entries. Both the baseline method and the tree-based method utilized 314 clusters.

Figure 10. Eight queries that are used in our user study, where blue is the offense team, green is the defense team and red represents the ball. The highlighted trajectories show the selected players for the third retrieval setting.

Eight retrieval tasks were selected for our user study and we tested three different settings on each task.

  1. Retrieval conditioned on all the players and the ball

  2. Retrieval conditioned on the offense team and the ball

  3. Retrieval conditioned on two selected players and the ball

Because the baseline method requires the ball trajectory for its hashing function, the ball is included in each setting. A user study was conducted for each setting, and Figure 10 shows the eight queries for the third setting.

We evaluated the retrieval quality via an interleaved evaluation, where the top 10 results returned by the baseline method and our method were combined via the Team-Draft Interleaving method (Chapelle et al., 2012) into a single ranking (see Figure 11). The combined ranking and its query were then displayed in an online survey form so that users could view the results top-down and select the relevant plays.
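A generic version of team-draft interleaving can be sketched as below. It follows the usual description of the method (the team with fewer picks chooses next, ties are broken by a coin flip, and each team contributes its highest-ranked result not yet shown); the function name, the handling of an exhausted ranking, and the example play identifiers are assumptions made for illustration.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, seed=None):
    """Merge two rankings; return the combined list and each team's picks."""
    rng = random.Random(seed)
    interleaved, team_a, team_b = [], set(), set()
    seen = set()

    def next_unseen(ranking):
        return next((doc for doc in ranking if doc not in seen), None)

    while True:
        cand_a, cand_b = next_unseen(ranking_a), next_unseen(ranking_b)
        if cand_a is None and cand_b is None:
            break
        # The team with fewer picks goes next; ties are broken by a coin flip.
        a_turn = len(team_a) < len(team_b) or (
            len(team_a) == len(team_b) and rng.random() < 0.5)
        if cand_b is None or (a_turn and cand_a is not None):
            pick, team = cand_a, team_a
        else:
            pick, team = cand_b, team_b
        interleaved.append(pick)
        team.add(pick)
        seen.add(pick)
    return interleaved, team_a, team_b


baseline_top = ["play_1", "play_2", "play_3"]
tree_top = ["play_2", "play_4", "play_1"]
combined, picks_baseline, picks_tree = team_draft_interleave(
    baseline_top, tree_top, seed=7)
print(combined)
```

A result returned by both systems appears only once in the combined list and is credited to whichever team drafted it.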

5.2. Procedure

We recruited ten people with a strong basketball background to participate in our user studies. For each study, every person spent the first 5 minutes reading the instructions, which helped them understand the images in Figure 10 and how to select relevant plays. After that, half an hour was allocated for each participant to finish all eight questions. This procedure was repeated for each retrieval setting.

Each question contains one input query and interleaved retrieved results from two systems. If one result was returned by both systems, it would only be displayed once in our survey. Participants were asked to scan the results top-down and check the plays that they think are relevant to the input query.

5.3. Benchmark Results

Using the relevance feedback from the participants, we performed a benchmark comparison using the average precision and the expected reciprocal rank of the first result, which are two standard retrieval evaluation metrics (Chapelle et al., 2009; Manning and Raghavan, [n. d.]; Salton and McGill, 1986). Let $r_{i}$ denote the rank of the $i$-th relevant document; then the average precision of a ranking list can be computed by:

$$\mathrm{AP} = \frac{1}{R} \sum_{i=1}^{R} \mathrm{Prec}@r_{i} \qquad (9)$$

where $\mathrm{Prec}@r_{i}$ is the precision of the top $r_{i}$ items in the ranking list, and $R$ is the number of relevant items in the ranking list. The expected reciprocal rank is simply the inverse of the rank of the first relevant result, which can be computed by:

$$\mathrm{ERR} = \frac{1}{r_{1}} \qquad (10)$$
The expected reciprocal rank is more sensitive to the efficiency of finding the first result while average precision is more recall-focused. For our user study, we computed both of them on the two ranking lists embedded in the interleaved ranking, and over the pooling of both top-10 results.
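Both metrics are straightforward to compute from a list of relevance judgments in rank order. The sketch below is illustrative: the function names and the example judgments are made up, and the average is taken over the relevant items that appear in the list, as in the definition above.

```python
def average_precision(relevance):
    """relevance: booleans in rank order (True means judged relevant)."""
    hits, total = 0, 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank        # precision at this relevant item's rank
    return total / hits if hits else 0.0


def reciprocal_rank(relevance):
    """Inverse rank of the first relevant result, or 0.0 if there is none."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


judgments = [False, True, True, False, True]   # hypothetical user feedback
print(round(average_precision(judgments), 4))  # 0.5889
print(reciprocal_rank(judgments))              # 0.5
```

Here the relevant items sit at ranks 2, 3 and 5, so the average precision is $(1/2 + 2/3 + 3/5)/3$ and the reciprocal rank is $1/2$.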

Table 1 shows the average precision in the three different retrieval settings. In each setting, the top two rows show the mean average precision aggregated across all ten users for each method. Our method has higher precision than the baseline method in all three settings, and the improvement becomes larger as the queries become more specific (i.e., fewer agents are involved). The “Win / Lose” rows at the bottom of each setting indicate how many individual participants obtained a higher average precision with our method. Our tree-based approach wins for most users.

Table 2 compares the expected reciprocal rank and has the same structure as Table 1. Note that since both methods may have the first relevant result at the same rank, a draw can occur. Thus, the two numbers in each cell of the “Win / Lose” rows may not sum to 10. Similarly, our method outperforms the baseline method, especially in the third setting (S3). This shows that using one template cannot find the corresponding agents correctly when queries become very specific. The bar charts in Figure 12 highlight the overall “Win / Lose” counts in Tables 1 and 2.

Figure 11. Depicting an interleaving of two rankings.

| Setting | Method | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Overall |
|---|---|---|---|---|---|---|---|---|---|---|
| S1 | Baseline Method | 0.12 | 0.07 | 0.08 | 0.25 | 0.15 | 0.25 | 0.24 | 0.37 | 0.19 |
| S1 | Tree-Based | 0.57 | 0.67 | 0.50 | 0.34 | 0.72 | 0.58 | 0.39 | 0.42 | 0.52 |
| S1 | Win / Lose | 10 / 0 | 10 / 0 | 10 / 0 | 6 / 4 | 10 / 0 | 10 / 0 | 5 / 5 | 5 / 5 | 66 / 14 |
| S2 | Baseline Method | 0.12 | 0.09 | 0.08 | 0.18 | 0.27 | 0.36 | 0.21 | 0.37 | 0.21 |
| S2 | Tree-Based | 0.77 | 0.70 | 0.81 | 0.57 | 0.73 | 0.58 | 0.50 | 0.47 | 0.65 |
| S2 | Win / Lose | 10 / 0 | 10 / 0 | 10 / 0 | 10 / 0 | 8 / 2 | 8 / 2 | 9 / 1 | 8 / 2 | 73 / 7 |
| S3 | Baseline Method | 0.04 | 0.07 | 0.12 | 0.08 | 0.23 | 0.09 | 0.21 | 0.15 | 0.12 |
| S3 | Tree-Based | 0.87 | 0.78 | 0.68 | 0.71 | 0.70 | 0.73 | 0.56 | 0.58 | 0.70 |
| S3 | Win / Lose | 10 / 0 | 10 / 0 | 10 / 0 | 10 / 0 | 9 / 1 | 10 / 0 | 6 / 4 | 10 / 0 | 75 / 5 |

Table 1. This table compares the mean average precision aggregated across all users for each query (Q1-Q8) with three different retrieval settings (S1-S3). The “Win / Lose” rows show the number of users for whom the tree-based approach achieved a higher average precision.

| Setting | Method | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Overall |
|---|---|---|---|---|---|---|---|---|---|---|
| S1 | Baseline Method | 0.5 | 0.44 | 0.33 | 0.58 | 0.43 | 0.82 | 0.66 | 0.75 | 0.56 |
| S1 | Tree-Based | 0.75 | 0.94 | 0.55 | 0.49 | 0.94 | 1 | 0.73 | 0.91 | 0.79 |
| S1 | Win / Lose | 5 / 2 | 9 / 0 | 8 / 2 | 3 / 5 | 7 / 0 | 2 / 0 | 5 / 3 | 3 / 1 | 42 / 19 |
| S2 | Baseline Method | 0.51 | 0.27 | 0.45 | 0.82 | 0.86 | 0.9 | 0.71 | 0.80 | 0.66 |
| S2 | Tree-Based | 1 | 0.83 | 1 | 0.86 | 1 | 0.9 | 0.85 | 0.84 | 0.91 |
| S2 | Win / Lose | 5 / 0 | 8 / 0 | 10 / 0 | 2 / 0 | 2 / 0 | 2 / 0 | 4 / 1 | 3 / 1 | 36 / 2 |
| S3 | Baseline Method | 0.2 | 0.36 | 0.48 | 0.37 | 0.63 | 0.32 | 0.75 | 0.54 | 0.46 |
| S3 | Tree-Based | 1 | 1 | 0.76 | 1 | 0.95 | 1 | 0.71 | 0.9 | 0.92 |
| S3 | Win / Lose | 10 / 0 | 8 / 0 | 6 / 1 | 9 / 0 | 7 / 1 | 9 / 0 | 0 / 4 | 6 / 2 | 55 / 8 |

Table 2. This table compares the expected reciprocal rank aggregated across all users for each query (Q1-Q8) with three different settings (S1-S3). The “Win / Lose” rows show the number of users for whom the tree-based approach found a relevant result earlier in the ranking.

6. Related Work

Information retrieval has a long research history in computer science (Schütze, 2008; Chowdhury, 2010), and the majority of previous studies focus on tokenized query formats. Although tokenized queries have been widely applied to both text and multimedia data (Kim and Xing, 2013; Xia et al., 2013; Zhuang and Hoi, 2011), some research has shown that free-form or “ad-hoc” queries can be significantly more user-friendly (Manning and Raghavan, [n. d.]; Salton and McGill, 1986). One popular free-form query type in recent image retrieval research is the exemplar-based or sketch-based query (Bui and Collomosse, 2015; Yu et al., 2016; Zhang et al., 2016). Such a query format lets users express a query more intuitively and precisely.

In the sports analytics domain, most work focuses on evaluating and comparing player performance (Franks et al., 2015; Chen and Joachims, 2016), analyzing broadcast video (Liu et al., 2005), and discovering behavior patterns and styles (Miller et al., 2014; Yue et al., 2014; Wei et al., 2016). As in other domains, the conventional approach to sports data retrieval still uses a directory/taxonomy paradigm (Wall, 2011) to categorize sports plays (McQueen et al., 2014; Chen et al., 2014; Bialkowski et al., 2014a). Now that multi-agent spatiotemporal data is widely collected, sketch-based or exemplar-based queries can be a more effective and user-friendly solution. The first formal spatiotemporal query paradigm was proposed by Sha et al. (Sha et al., 2016), who developed an interface that accepts exemplar-based or sketch-based queries for sports play retrieval. They reported that this query format is much more effective and user-friendly than conventional text-based retrieval.

From a technical perspective, the primary challenge in retrieving multi-agent spatiotemporal data is how to compare samples effectively. Although similarity measures have been well investigated for trajectories and time series (Chen et al., 2007; Toohey and Duckham, 2015; Eichmann and Zgraggen, 2015), most of them focus on single trajectories rather than multi-agent ones. The seminal work on comparing multi-agent data is the “role-based” representation (Lucey et al., 2013; Wei et al., 2013). It uses a formation template to order the agents so that corresponding agents can be found between two samples. However, this method is suboptimal because a single template cannot capture fine-grained behaviors.
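Template-based ordering can be pictured as an assignment problem: each agent is matched to one role in the template so that the total distance is as small as possible. The sketch below illustrates that idea only and is not the implementation from the cited role-based work. The function name, the use of one representative point per agent, and the example coordinates are all assumptions, and an exhaustive search over permutations stands in for the Hungarian method (Kuhn, 1955), which is practical only for a handful of agents.

```python
from itertools import permutations
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def align_to_template(positions: Sequence[Point],
                      template: Sequence[Point]) -> List[int]:
    """Return `order` such that positions[order[k]] fills template role k.

    Brute force over all permutations (n! candidates), used here only to
    keep the example dependency-free.
    """
    if len(positions) != len(template):
        raise ValueError("need exactly one agent per template role")
    n = len(template)

    def total_cost(perm: Tuple[int, ...]) -> float:
        # Sum of distances between each role and the agent assigned to it.
        return sum(dist(positions[perm[k]], template[k]) for k in range(n))

    return list(min(permutations(range(n)), key=total_cost))


# Hypothetical three-role template and one unordered sample.
template = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
sample = [(9.0, 1.0), (1.0, 9.0), (1.0, 1.0)]
order = align_to_template(sample, template)  # agent 2 -> role 0, and so on
aligned = [sample[i] for i in order]
```

Once two samples are ordered against the same template, agent k in one can be compared directly with agent k in the other. The limitation noted above follows from this setup: every sample is forced through the same single template regardless of the play being run.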

Figure 12. These bar charts indicate the overall win rate of our method with each metric and retrieval setting. They show that our method outperforms the baseline in all situations.

Apart from the permutation alignment of multi-agent data, we still need an effective similarity measure between individual pairs of trajectories. Trajectory comparison studies fall into two main categories. The first focuses on elastic measures that address shifting and warping in both the time and space domains (Listgarten et al., 2005; Chen et al., 2007; Keogh and Ratanamahatana, 2005; Keogh and Pazzani, 2000). The second focuses on finding the most similar or dissimilar points between two trajectories, which improves robustness (Lou et al., 2002; Junejo et al., 2004). We use Euclidean distance in our work because the experimental results in (Sha et al., 2016) show that it is still the most effective metric for trajectory comparison in sports.
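The contrast between a rigid and an elastic measure can be seen in a small example. The following is a sketch under stated assumptions, not code from the paper: trajectories are lists of (x, y) points, the Euclidean variant requires equal length and is taken as the L2 norm over all coordinates, and dynamic time warping is shown in its basic unconstrained form as one representative elastic measure.

```python
from math import dist, sqrt
from typing import Sequence, Tuple

Point = Tuple[float, float]


def euclidean_trajectory_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """L2 distance between two equal-length trajectories, compared frame by frame."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of points")
    return sqrt(sum(dist(p, q) ** 2 for p, q in zip(a, b)))


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Basic dynamic time warping cost with no window constraint."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# The same straight run, with the second player starting one frame late.
run = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
late_run = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
rigid = euclidean_trajectory_distance(run, late_run)
elastic = dtw_distance(run, late_run)
```

The frame-by-frame measure penalizes the one-frame delay at every step, while the elastic measure absorbs it by warping time. Which behavior is preferable depends on whether timing is part of what makes two plays similar.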

In any modern retrieval system, fast indexing is required to search a large database quickly. Hash tables are one of the most popular approaches for this purpose (Blanco et al., 2015; Zhang et al., 2015; Liu et al., 2017). A hash function is normally designed for a specific domain and application, but in general it aims to reduce search time. Similar to (Sha et al., 2016), our method uses the concept of locality-sensitive hashing (LSH) (Indyk and Motwani, 1998; Gionis et al., 1999), which is designed to place similar samples at the same address. Such methods have been applied in other settings where similarity measurement or ranking is required (Setty et al., 2017; Liu et al., 2017).
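As a rough illustration of the LSH idea, the sketch below hashes feature vectors with random hyperplanes so that vectors pointing in similar directions tend to share a bucket. This is one common LSH family chosen for brevity; it is an assumption and not necessarily the hash function used in our system or in (Sha et al., 2016). The class name, parameters, and item identifiers are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class SignLSH:
    """Random-hyperplane LSH: the key is the sign pattern of a few projections."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded so that keys are reproducible
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)

    def key(self, vec: Sequence[float]) -> Tuple[int, ...]:
        # One bit per hyperplane: 1 if the vector lies on its non-negative side.
        return tuple(int(sum(w * x for w, x in zip(plane, vec)) >= 0.0)
                     for plane in self.planes)

    def add(self, item_id: str, vec: Sequence[float]) -> None:
        self.buckets[self.key(vec)].append(item_id)

    def candidates(self, vec: Sequence[float]) -> List[str]:
        """Items sharing the query's bucket; these would then be re-ranked exactly."""
        return list(self.buckets.get(self.key(vec), []))


index = SignLSH(dim=4)
index.add("play_a", [1.0, 2.0, 3.0, 4.0])
index.add("play_b", [-1.0, -2.0, -3.0, -4.0])
found = index.candidates([2.0, 4.0, 6.0, 8.0])  # same direction as play_a
```

Only the candidates in the matching bucket need an exact distance computation, which is where the time saving comes from. The trade-off is that a near neighbor can fall into an adjacent bucket and be missed, so practical systems usually query several hash tables.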

7. Conclusion, Discussion and Future Work

In this paper, we presented a new tree-based alignment method that enables effective similarity measurement between multi-agent trajectory data. Based on this method, we also presented a retrieval system tailored towards accurate and efficient sports play retrieval. A compressibility experiment showed that our tree-based method outperforms the state-of-the-art alignment method, and our full-stack retrieval system demonstrated its effectiveness in a user study in which our approach achieved higher precision than the baseline method. From a relevance estimation standpoint, even though our alignment has improved the similarity measure, the choice of distance metric can still be improved. For instance, the concept of “inverse document frequency” (Salton and McGill, 1986) in information retrieval indicates that tokens that appear frequently across documents are less indicative of relevance than rarer ones. A similar idea could be incorporated into the trajectory distance measure. More generally, machine learning techniques can be applied to learn a better distance metric given appropriate training data. Apart from retrieval, our tree-based alignment could be applied to other important data mining tasks. Since a playbook is inherently learned by our tree, it can be used in game summarization and team/player characterization, where those game states are required. Beyond sports, our method could be applied to a wide range of domains that involve multi-agent spatiotemporal data. One example is crowd behavior analysis in the surveillance domain.


  • Bialkowski et al. (2014a) Alina Bialkowski, Patrick Lucey, Peter Carr, Yisong Yue, and Iain Matthews. 2014a. Win at home and draw away: Automatic formation analysis highlighting the differences in home and away team behaviors. In Proceedings of 8th Annual MIT Sloan Sports Analytics Conference.
  • Bialkowski et al. (2014b) Alina Bialkowski, Patrick Lucey, Peter Carr, Yisong Yue, Sridha Sridharan, and Iain Matthews. 2014b. Large-scale analysis of soccer matches using spatiotemporal tracking data. In Data Mining (ICDM), 2014 IEEE International Conference on. IEEE, 725–730.
  • Blanco et al. (2015) Roi Blanco, Giuseppe Ottaviano, and Edgar Meij. 2015. Fast and space-efficient entity linking for queries. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. ACM, 179–188.
  • Bui and Collomosse (2015) Tu Bui and John Collomosse. 2015. Scalable sketch-based image retrieval using color gradient features. In Proceedings of the IEEE International Conference on Computer Vision Workshops. 1–8.
  • Chapelle et al. (2012) Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. 2012. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems (TOIS) 30, 1 (2012), 6.
  • Chapelle et al. (2009) Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM conference on Information and knowledge management. ACM, 621–630.
  • Chen et al. (2014) Sheng Chen, Zhongyuan Feng, Qingkai Lu, Behrooz Mahasseni, Trevor Fiez, Alan Fern, and Sinisa Todorovic. 2014. Play type recognition in real-world football video. In Applications of Computer Vision (WACV), 2014 IEEE Winter Conference on. IEEE, 652–659.
  • Chen and Joachims (2016) Shuo Chen and Thorsten Joachims. 2016. Modeling intransitivity in matchup and comparison data. In Proceedings of the ninth acm international conference on web search and data mining (WSDM). ACM, 227–236.
  • Chen et al. (2007) Yueguo Chen, Mario A Nascimento, Beng Chin Ooi, and Anthony KH Tung. 2007. Spade: On shape-based pattern detection in streaming time series. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on. IEEE, 786–795.
  • Chowdhury (2010) Gobinda G Chowdhury. 2010. Introduction to modern information retrieval. Facet publishing.
  • Eichmann and Zgraggen (2015) Philipp Eichmann and Emanuel Zgraggen. 2015. Evaluating subjective accuracy in time series pattern-matching using human-annotated rankings. In Proceedings of the 20th International Conference on Intelligent User Interfaces. ACM, 28–37.
  • Franks et al. (2015) Alexander Franks, Andrew Miller, Luke Bornn, and Kirk Goldsberry. 2015. Counterpoints: Advanced defensive metrics for nba basketball. In 9th Annual MIT Sloan Sports Analytics Conference, Boston, MA.
  • Gionis et al. (1999) Aristides Gionis, Piotr Indyk, Rajeev Motwani, et al. 1999. Similarity search in high dimensions via hashing. In VLDB, Vol. 99. 518–529.
  • Indyk and Motwani (1998) Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality. In Proceedings of the thirtieth annual ACM symposium on Theory of computing. ACM, 604–613.
  • Junejo et al. (2004) Imran N Junejo, Omar Javed, and Mubarak Shah. 2004. Multi feature path modeling for video surveillance. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, Vol. 2. IEEE, 716–719.
  • Keogh and Ratanamahatana (2005) Eamonn Keogh and Chotirat Ann Ratanamahatana. 2005. Exact indexing of dynamic time warping. Knowledge and information systems 7, 3 (2005), 358–386.
  • Keogh and Pazzani (2000) Eamonn J Keogh and Michael J Pazzani. 2000. Scaling up dynamic time warping for datamining applications. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 285–289.
  • Kim and Xing (2013) Gunhee Kim and Eric P Xing. 2013. Time-sensitive web image ranking and retrieval via dynamic multi-task regression. In Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 163–172.
  • Kuhn (1955) Harold W Kuhn. 1955. The Hungarian method for the assignment problem. Naval research logistics quarterly 2, 1-2 (1955), 83–97.
  • Li and Chellappa (2010) Ruonan Li and Rama Chellappa. 2010. Group motion segmentation using a spatio-temporal driving force model. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2038–2045.
  • Listgarten et al. (2005) Jennifer Listgarten, Radford M Neal, Sam T Roweis, and Andrew Emili. 2005. Multiple alignment of continuous time series. In Advances in neural information processing systems. 817–824.
  • Liu et al. (2017) Huiwen Liu, Jiajie Xu, Kai Zheng, Chengfei Liu, Lan Du, and Xian Wu. 2017. Semantic-aware query processing for activity trajectories. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 283–292.
  • Liu et al. (2005) Tie-Yan Liu, Wei-Ying Ma, and Hong-Jiang Zhang. 2005. Effective feature extraction for play detection in american football video. In Multimedia Modelling Conference, 2005. MMM 2005. Proceedings of the 11th International. IEEE, 164–171.
  • Lou et al. (2002) Jiangung Lou, Qifeng Liu, Tieniu Tan, and Weiming Hu. 2002. Semantic interpretation of object activities in a surveillance system. In Pattern Recognition, 2002. Proceedings. 16th International Conference on, Vol. 3. IEEE, 777–780.
  • Lucey et al. (2013) Patrick Lucey, Alina Bialkowski, Peter Carr, Stuart Morgan, Iain Matthews, and Yaser Sheikh. 2013. Representing and discovering adversarial team behaviors using player roles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2706–2713.
  • Manning and Raghavan ([n. d.]) Christopher D Manning, Prabhakar Raghavan, and Hinrich Schütze. 2008. Introduction to Information Retrieval.
  • McQueen et al. (2014) Armand McQueen, Jenna Wiens, and John Guttag. 2014. Automatically recognizing on-ball screens. In 2014 MIT Sloan Sports Analytics Conference.
  • Miller et al. (2014) Andrew Miller, Luke Bornn, Ryan Adams, and Kirk Goldsberry. 2014. Factorized point process intensities: A spatial analysis of professional basketball. In International Conference on Machine Learning. 235–243.
  • Rousseeuw (1987) Peter J Rousseeuw. 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics 20 (1987), 53–65.
  • Salton and McGill (1986) Gerard Salton and Michael J McGill. 1986. Introduction to modern information retrieval. (1986).
  • Schütze (2008) Hinrich Schütze. 2008. Introduction to information retrieval. In Proceedings of the international communication of association for computing machinery conference.
  • Setty et al. (2017) Vinay Setty, Abhijit Anand, Arunav Mishra, and Avishek Anand. 2017. Modeling Event Importance for Ranking Daily News Events. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. ACM, 231–240.
  • Sha et al. (2016) Long Sha, Patrick Lucey, Yisong Yue, Peter Carr, Charlie Rohlf, and Iain Matthews. 2016. Chalkboarding: A new spatiotemporal query paradigm for sports play retrieval. In Proceedings of the 21st International Conference on Intelligent User Interfaces. ACM, 336–347.
  • STATS (2015) L STATS. 2015. SportVU. (2015).
  • Stracuzzi et al. (2011) David J Stracuzzi, Alan Fern, Kamal Ali, Robin Hess, Jervis Pinto, Nan Li, Tolga Konik, and Daniel G Shapiro. 2011. An application of transfer to american football: From observation of raw video to control in a simulated environment. AI Magazine 32, 2 (2011), 107–125.
  • Toohey and Duckham (2015) Kevin Toohey and Matt Duckham. 2015. Trajectory similarity measures. SIGSPATIAL Special 7, 1 (2015), 43–50.
  • Wall (2011) Aaron Wall. 2011. History of search engines: From 1945 to Google today. Atlantic Online (2011).
  • Wang et al. (2004) Jinjun Wang, Changsheng Xu, Engsiong Chng, Kongwah Wah, and Qi Tian. 2004. Automatic replay generation for soccer video broadcasting. In Proceedings of the 12th annual ACM international conference on Multimedia. ACM, 32–39.
  • Wei et al. (2016) Xinyu Wei, Patrick Lucey, Stuart Morgan, and Sridha Sridharan. 2016. Forecasting the Next Shot Location in Tennis Using Fine-Grained Spatiotemporal Tracking Data. IEEE Transactions on Knowledge and Data Engineering 28, 11 (2016), 2988–2997.
  • Wei et al. (2014) Xinyu Wei, Patrick Lucey, Stephen Vidas, Stuart Morgan, and Sridha Sridharan. 2014. Forecasting events using an augmented hidden conditional random field. In Asian Conference on Computer Vision. Springer, 569–582.
  • Wei et al. (2013) Xinyu Wei, Long Sha, Patrick Lucey, Stuart Morgan, and Sridha Sridharan. 2013. Large-scale analysis of formations in soccer. In Digital Image Computing: Techniques and Applications (DICTA), 2013 International Conference on. IEEE, 1–8.
  • Xia et al. (2013) Hao Xia, Pengcheng Wu, and Steven CH Hoi. 2013. Online multi-modal distance learning for scalable multimedia retrieval. In Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 455–464.
  • Yu et al. (2016) Qian Yu, Feng Liu, Yi-Zhe Song, Tao Xiang, Timothy M Hospedales, and Chen-Change Loy. 2016. Sketch me that shoe. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 799–807.
  • Yue et al. (2014) Yisong Yue, Patrick Lucey, Peter Carr, Alina Bialkowski, and Iain Matthews. 2014. Learning fine-grained spatial models for dynamic sports play prediction. In Data Mining (ICDM), 2014 IEEE International Conference on. IEEE, 670–679.
  • Zhang et al. (2016) Hua Zhang, Si Liu, Changqing Zhang, Wenqi Ren, Rui Wang, and Xiaochun Cao. 2016. Sketchnet: Sketch classification with web images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1105–1113.
  • Zhang et al. (2015) Shaoting Zhang, Ming Yang, Timothee Cour, Kai Yu, and Dimitris N Metaxas. 2015. Query specific rank fusion for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 37, 4 (2015), 803–815.
  • Zheng et al. (2016) Stephan Zheng, Yisong Yue, and Jennifer Hobbs. 2016. Generating Long-term Trajectories Using Deep Hierarchical Networks. In Advances in Neural Information Processing Systems. 1543–1551.
  • Zhuang and Hoi (2011) Jinfeng Zhuang and Steven CH Hoi. 2011. A two-view learning approach for image tag ranking. In Proceedings of the fourth ACM international conference on Web search and data mining. ACM, 625–634.