## 1 Introduction

We are interested in studying the ball passing patterns of soccer teams. Passing is one of the key parts in soccer, possessing valuable information about different playing styles from across the world. As illustrated in Figure 1, one team’s spatial passing record aggregated in a game consists of number of ball passing-receiving events on the soccer field. Each event corresponds to a pass observed from origin node to destination node , both embedded in the soccer field—a two-dimensional rectangle space . Passing data for all the teams in matches of the FIFA World Cup 2014 in Brazil are available.

With the recent development of optical tracking systems and video extraction software, team-based human activities in professional sports are now routinely monitored at high spatiotemporal resolution, which opens up new avenues for quantitative characterization of team strategies and performance enriched with spatiotemporal structures. Recent advances along these lines have been made in the context of professional basketball. Miller et al. (2014) provide a quantitative summary of shooting habits and efficiency of basketball players, based on spatial locations of shot attempts made by NBA players on the offensive half court. Franks et al. (2015) further identify the intent of defenders and quantify the effect they have on both shot frequency and efficiency from player and ball tracking data. Cervone et al. (2016) focus on modeling players’ decision making tendencies in various situational, spatiotemporal circumstances and predicting expected number of points the offense will score on a particular possession.

While basketball is a high scoring game with very frequent shooting attempts and relatively simple passing dynamics, soccer is very low scoring and much of the game involves intricate passing configurations, which occasionally lead to shot attempts. Soccer is more a game of space invasion that is mainly undertaken through passes. It is interesting to identify interpretable summary motifs representing a small set of passes that teams often employ. However, the current literature lacks such methodology — typically focusing on simple summary statistics of team passing that ignore spatial information.

Traditionally, team’s passing performance is summarized in one easy-to-calculate yet overly simplified statistic, e.g., the possession percentage as a measure of team dominance. Network graphs improve upon it by providing us a simple characteristic abstraction of team’s passing behavior. For example, Duch, Waitzman and Amaral (2010), Peña and Touchette (2012) and Cintia, Rinzivillo and Pappalardo (2015) investigate player passing networks in which nodes are players and directed edges are passes and zone passing networks in which nodes are divided regions of the soccer field and edges are cumulative number of ball displacements between pairs of regions. These articles reduce network topological structure into simple metrics, such as node degree, betweenness and closeness centralities, clustering coefficients, etc., therefore mostly focusing on providing high-level overviews of topological structures of a single passing network. Although these network descriptors offer valuable insights in evaluating different aspects of teamwork performance, statistical and generative modeling for the observed passing patterns of multiple teams (potentially under different conditions) would provide a more comprehensive understanding of the characteristics of team’s strategies, aiding the design, planning and selection of competitive soccer tactics at the team level.

There is a rich literature on statistical network models; see Goldenberg et al. (2010) and Schmidt and Morup (2013) for reviews. There has been an enormous emphasis in the literature on node community detection [Holland, Laskey and Leinhardt (1983), Nowicki and Snijders (2001), Airoldi et al. (2008)], especially for single, undirected, binary networks. In our motivating application of soccer passing analytics, partitioning links [Ahn, Bagrow and Lehmann (2010),Ball, Karrer and Newman (2011), Zhou (2015)] into latent passing combination groups is a more meaningful goal than clustering nodes into groups. Extensions of these methods to directed and weighted networks are straightforward, but the replicated aspect requires careful innovation. Besides flexibly capturing assortative and disassortative structures [Hoff (2008)] within each single network, it is important to exploit the co-occurrence information across multiple networks and extract archetypal motifs, which could serve as building blocks for network comparison and predictive modeling.

Team’s passing history is synthesized under the form of spatial networks [Barthélemy (2011)] where nodes and edges are embedded in a rectangle soccer field space (115 yards 74 yards with slight variations). This is another relevant characteristic of the soccer passing network. The and co-ordinates of origin-destination locations of a pass possess important information about its type (e.g., short passes, long passes, flick-on, or pull back) and directions (e.g., backwards, sideways, or forwards). The soccer field is typically divided into several zones, either own half/opposition half, defensive/middle/final third, left/right/centre, or more elaborate Guardiola positional grids. There is no consensus upon the best approach to this division. Different division results in explaining the strategic and tactical peculiarities of the team play at different spatial resolutions. Besides the network topology, it is crucial to take the spatial structure inherent in these types of networks into account, and accommodate potential multi-resolution behaviors.

### 1.1 Replicated spatial passing networks

We focus on the passing data from the 2014 FIFA World Cup in Brazil. national teams advanced to the final tournament and a total of 64 matches were played. For each match, every completed pass is logged with and coordinates for its point of origin and destination. Although most passes do not lead directly to goals, they do manifest the team playing style in collaboration, partly in response to the defenses being faced with shots on goal relatively rare. These dynamics potentially vary across teams and matches. Instead of analyzing single passing networks separately, we are focused on replicated passing networks, which can be considered as realizations from some distribution over the space of all possible passing networks. The concept of replicated networks was introduced in Durante, Dunson and Vogelstein (2017) motivated by neuroscience applications. To emphasize the replicated, spatial aspects, and the directional asymmetry of our special type of networks, we use the terminology replicated spatial passing networks.

As an initial attempt, we construct each of the team’s spatial networks by dividing up the field into a grid of tiles, , with each tile in this grid representing a node and the weighted edge given by the total number of passes going between the pair of nodes, aggregated from all the to matches that team played. Each network is naturally represented as a weighted adjacency matrix of size , where is the number of tiles. We evaluate the Bray-Curtis dissimilarity [Bray and Curtis (1957)

] between teams based on vectorization of adjacency matrices. To ensure that the same physical sample size assumption of Bray-Curtis statistics was met, we scale the cumulative number of passes by the ratio between the mean participating time (

minutes) and team’s actual total participating minutes in the whole tournament. Hence, the Bray-Curtis dissimilarity measure takes into account the higher rate of completed passes as part of the difference between teams.As shown in Figure 2, the Bray-Curtis dissimilarities between team passing networks are evaluated at three scales from coarse to fine. Under relatively coarse spatial resolution, substantial information on finer scales is discarded and team networks tend to behave more similarly to each other. On the contrary, finer spatial resolution is able to preserve high fidelity of the passing network but renders the sharing of statistical strength among multiple networks very difficult. Motivated by our replicated spatial passing network data and the pressing need for appropriately borrowing information across scales and replicates, we develop a Multiresolution Tensor (MrTensor) decomposition approach on a stack of multiresolution adjacency tensors, which can learn coherent coarse-to-fine subnetwork representations from fine-grained relational event data.

### 1.2 Relevant literature

There is an emerging literature on statistical modeling of replicated networks. Much of the literature deals with binary symmetric matrix representations of networks; see, for example, Durante, Dunson and Vogelstein (2017), Durante et al. (2017) and Wang et al. (2017). We instead consider the case in which fine-grained, directed and weighted spatial network data are available and the spatial locations of nodes play a vital role in data organization.

Passes from to can be viewed as dyadic events in product space . These data can potentially be viewed as spatial point patterns, with appropriate continuous process models developed. Modeling point patterns as spatial point processes [Møller and Waagepetersen (2007); Baddeley, Bárány and Schneider (2007)] in continuous space is conceptually simple, but often computationally cumbersome due to intractable integrals. To simplify computation, we instead take a fine-grained discretization of the space based on a multiresolution tiling scheme and focus on the underlying structures driving the global variations across replications, while sacrificing the ability of modeling subtle local variations within each cell of the division. Related discretization procedures were employed by Miller et al. (2014) and Franks et al. (2015) in modeling of NBA shot data.

In this application as well as in many other applications (e.g., brain connectomes), networks are spatially embedded and nodes are non-exchangeable, which hinders the utility of exchangeable graph models [Orbanz and Roy (2015); Caron and Fox (2017)] developed based on Aldous-Hoover or Kallenberg representation theorems. Exploiting the spatial information, a recursive division of the soccer field can naturally induce nested hierarchies within a single network and correspondences of nodes across replicates, that allow us to perform joint multiscale analysis of passing patterns in multiple networks.

In Section 2 we describe a binary encoding scheme and our MrTensor framework. To flexibly characterize the generative mechanism of replicated networks and reduce dimensionality, we postulate passing networks as a weighted combination of low-rank network motifs and introduce a nonnegative tensor decomposition model for multiresolution adjacency tensors in Section 3. In Section 4, we exploit sparsity in the data and propose an efficient optimization algorithm based on block coordinate descent procedures with adaptation of model dimensions. Section 5 presents the results for our analysis of real data.

## 2 Multiresolution Tensor Representation

### 2.1 Tensorial data structure

We divide the standardized rectangle soccer field uniformly into tiles and represent a pass observed in replicate in tensor indices format , with having levels, having levels, and . Comparing against the conventional adjacency matrices representation of network data, this multi-indices representation has the potential advantages of being more compact and informative; (i) it implicitly preserves the network connectivity information by storing only the link observed, (ii) it explicitly expresses the nodal attributes (e.g., tile coordinates), (iii) it is easily expandable to incorporate additional edge properties such as the type of the pass or replicated-level attributes such as the competition outcome. The whole indices list can be conveniently represented as a dimensional contingency table , or in other words, a way count valued tensor, with cells in total. The value in each cell denotes the number of occurrences. In our particular case, , and denotes a pass from origin tile to destination tile conducted by subject . The stacked adjacency matrices representation of multiple networks can be conveniently induced via unfolding the -way tensor of size into a -way tensor of size where multi-indices , , are the indices for the origin tile and destination tile, respectively.

### 2.2 Binary encoding scheme

Passing endeavors can be viewed as hierarchical resource allocation on the field, assigned by teams in possession with the objectives of maneuvering through the defense and creating better chances to score. Teams’ passing selections are arguably influenced heavily by different soccer philosophies of strategic planners at macroscopic spatial resolutions and perturbed by situational circumstances or observation noise at fine spatial resolutions. With this motivation, we model the spatial passing networks in a multiscale manner, with coarse-to-fine representations gradually informed by events on multiple spatial scales.

To access the multiscale occurrence information, we apply a recursive dyadic partitioning scheme uniformly on the soccer field rectangle along both the vertical and horizontal directions. So on each scale, a region is further split into four non-overlapping subregions of the same size. Letting , along each direction the spatial intervals

are treated as categorical variables taking values in

. This recursive dyadic partitioning procedure corresponds to a binary encoding scheme that converts a categorical variable into an bit binary code , more precisely, , . See Figure 3 for an illustrative example, the location of an event in cell is reparameterized as . Accordingly, the event can be located on three increasingly finer and finer scales via binary codes -red region, -green region, -purple region, respectively.We apply this binary encoding scheme to all the first physical modes in the original indices

, which specify the spatial locations of passes. This reparameterization converts the multivariate categorical variables into higher dimensional multivariate binary variables, thus create

auxiliary modes for each of the physical modes in the original tensor . To present it more concisely, in Table 1, we organize the resulting binary codes for spatial indices into a table , where the column vector stores information on scale across all the physical location modes ( with representing the coarsest scale, and representing the finest scale), and the row vector keeps the information in the th physical mode across all scales. Accordingly, the augmented indices list with subject mode can be lodged in a dimensional contingency table , having the same number of cells as .Virtual scale modes | |||||
---|---|---|---|---|---|

Physical location modes |
|||||

This binary reparameterization of multivariate categorical variables allows us to characterize multiscale occurrence of an event. Referring to Figure 4, we denote a pass occurs from tile to tile in the multi-indices format . It is then encoded on scales as with , , and . The binary codes of increasing lengths pertain to information observed on increasingly finer scales: (i) a pass from red rectangle zone to zone on scale , (ii) a pass from green rectangle zone to zone on scale , and (iii) a pass from purple rectangle zone to zone on scale .

Tree-based multiresolution methods are prevalent in signal and image processing [Willsky (2002)]. The binary reparameterization implicitly induces a multiresolution (MR) tree of depth , in which each internal node has children. For each network, the count of the number of occurrences at the leaves of the tree can be organized in a tensor with modes. Consequently, the counts at coarser scales on the MR tree obtained by summing “children” counts can be conveniently found by marginalizing out the tensor modes relevant with finer scales.

The probability of an event

viewed on scale can be represented aswhere can be interpreted as root proportion on the coarsest scale, can be interpreted as coarse-to-fine splitting proportions moving from scales to , .

Related to our work, Kolaczyk (1999)

proposes a recursive dyadic partition tree based Bayesian multiscale model for (discretized) intensity estimation in univariate inhomogeneous Poisson processes. However, the number of parameters grows much faster with scales in multivariate cases. With suitable multilinear structures accompanied with specific mode-wise constraints, our proposed multiresolution tensorial representation can be a more compact and parsimonious alternative to the tree-structured parameterization.

### 2.3 Multiresolution adjacency tensor

Treating each network in an unstructured form corresponds to the traditional operation of vectorization, which flattens the way tensor into a matrix, such that each subject network is represented by a column vector. Unfortunately, this operation throws away the multiscale topological structure and creates huge dimensionality relative to the number of subjects. As a result, the associated matrix factorizations are likely to be poorly estimated. On the contrary, the above binary reparameterization scheme leads to an operation of tensorization, which folds the lower-dimensional tensor (matrix or vector) into a higher-dimensional one.

Interested in the multiscale topological structure of passing networks, we propose the multiresolution adjacency tensor representation of multiple networks in which passing networks on scale are represented in the tensor indices format . With , we transform the way tensor into a way tensor of the size , by mapping the tensor indices as follows,

where , , , , , . The number of cells in does not change during this transformation. For subject , the weighted adjacency matrix on scale can be recovered via matricization [Kolda and Bader (2009)] of the tensor slice. Operating on the multi-indices, tensor element maps to matrix element , where

The elements in these adjacency matrices denote the edge weights. The edge weight on a coarser scale is an aggregation of its “children” edge weights on finer scales.

The idea of tensorization is proposed by Oseledets (2010) and Khoromskij (2011) in the context of quantized tensor networks. Accompanied by various tensor factorization techniques, the effectiveness of tensorization in reducing storage burden and accelerating large-scale computations has been demonstrated with a wide range of successful applications to data compression, computational quantum chemistry and finite element method. Built upon similar ideas of tensorization—“blessing of dimensionality”[Cichocki et al. (2015)], we focus on combating the challenge of high dimensionality and low sample size, and discovering latent structures with natural interpretations by taking advantage of the intrinsic multiway and multiscale structure in the data.

## 3 Poisson Block Term Decomposition Model

The MrTensor data representation framework introduced in Section 2 is compatible with many off-the-shelf tensor decomposition routines and opens the door to other customized probabilistic models. In our applications of interest, data sparsity arises as the primary technical challenge in modeling. For moderate to high-resolution, we end up with massively more cells than the number of observed passes (is ), so the overwhelming majority of the cell counts will be zero. Choosing , the number of cells in is , with of them non-zero (sparsity level: 93.87%). This sparsity issue is very common in analyzing multivariate categorical variables [Zhou et al. (2015)]. To combat this challenge, it is important to take advantage of a multilinear structure to build up the high dimensional tensor object with low-dimensional, and parsimonious matrices. On the other hand, the sparsity in the adjacency tensor also offers us an opportunity to save memory usage and running time, especially in applications with large-scale networks.

### 3.1 Modeling weighted adjacency tensors

Denote the th element of the count valued tensor as , where is a length indices vector. , , correspond to three scales (coarse-to-fine) and and correspond to the origin tile and destination tile, respectively, and is the index for replicates. To represent the intensity of each weighted passing network as a superposition of archetypal network motifs , we propose the following Poisson factorial model for the adjacency tensor,

where , , . is a probability tensor of the same size as , determines the prevalence of motif in passing network , , . Equivalently, this model can be expressed as,

that is, the adjacency tensor of passing network can be randomly partitioned into subnetworks represented by . Each subnetwork is constructed by distributing Poisson number of passes according to the probability tensor shared by all replicates. To ensure the model has greater flexibility in capturing structures and patterns inherent in the data, we set the number of motifs to be large () such that the set of network motifs which represent the passing networks are overcomplete [Lewicki and Sejnowski (2000)]. The degeneracy introduced by over-completeness can be resolved by incorporating additional constraints of sparsity.

### 3.2 Multiscale low-rank network motifs

In order to control the complexity in , one simple assumption is to constrain the probability tensor to be rank-one, i.e., , where denotes the outer product, are probability vectors, , , , . This yields a nonnegative Poisson CANDECOMP/PARAFAC decomposition (Poisson CPD) model [Chi and Kolda (2012)],

jointly applied on multiple adjacency tensors with shared factor matrices . Here is a shorthand notation for . However, the rank-one assumption on could be too restrictive in representing passing network motifs. Figure 5 shows several example motifs that are commonly seen passing combinations in soccer but are clearly not rank-one.

We relax this constraint by allowing to be low rank with the canonical polyadic decomposition structure,

Both and are constrained to be probability vectors. Each is a convex combination of rank-one components being consonant with the multiresolution network topological structures. To see this, denoting the two coarser representations of on scale and as and , we have

so the components are consistent across scales, , , and gradually adding more and more details to the representations on coarser scale through outer multiplication. This ensures our model finds coherent coarse-to-fine representations of low-rank motifs, which can serve as basic building blocks for secondary inference tasks such as team comparison and outcome prediction. Meanwhile, , are the feature matrices for the sender nodes and receiver nodes in each partitioned network on scale , where denotes the Khatri-Rao product.

In tensor notation, this model can be summarized as,

(3.1) | ||||

(3.2) |

All the parameters in equation (3.2) are constrained to be non-negative. We term the model as Poisson nonnegative CP-Block Term Decompositions (Poisson CP-BTD). The block term decomposition (BTD) [De Lathauwer (2008); De Lathauwer and Nion (2008)] refers to the decomposition of the higher-order tensor into a sum of rank block terms,

where denotes the mode- tensor-matrix product and denotes a diagonal tensor. The diagonal entry determines the excitation of template in motif . Our model can be viewed as a probabilistic extension of BTD in taking account of higher-order sparse count tensors. The nonnegative constraints allow for non-subtractive (part-based) representations of the network with natural interpretations [Lee and Seung (1999); Shashua and Hazan (2005)]. The notion of linear rank is therefore generalized to nonnegative rank [Cohen and Rothblum (1993)], so can be larger than the original data dimension.

## 4 Block Coordinate Descent Algorithm

In Section 3

we proposed a Poisson CP-BTD model for the multiresolution adjacency matrices. The dependency structure of the underlying intensity parameter is captured by the CP-BTD model and the random variations of the individual count, is described by the Poisson distribution. Maximizing the Poisson log-likelihood is equivalent to minimizing the (generalized) Kullback-Leibler (KL) divergence up to an additive constant,

(4.1) |

subject to the multilinear constraint on the underlying intensity parameters,

In order to remove scaling ambiguities, we impose both and to be probability vectors, , , , and , ,

. The maximum likelihood solution for this model can be found by an expectation maximization (EM) algorithm, detailed in Appendix

A. This algorithm has high consumption of memory as it requires storage of a intermediate matrix in the E-step of every iteration, where is the number of nonzero cells in . Alternatively, we develop a block nonlinear Gauss-Seidel (GS) algorithm [Grippo and Sciandrone (2000); Kim, He and Park (2014); Hansen, Plantenga and Kolda (2015)] for the Poisson CP-BTD model. In parallel with the alternating least square procedures in the BTD model which minimizes the Frobenius norm [De Lathauwer and Nion (2008)], the KL divergence minimization problem in Poisson BTD boils down to alternating Poisson regression (APR) [Chi and Kolda (2012)] steps. The algorithm is convergent with lower per-iteration cost and much greater memory efficiency.### 4.1 Nonlinear Gauss-Seidel method

Our optimization problem is defined as

(4.2) | ||||

(4.3) | ||||

(4.4) |

where

We solve problem (4.4) via an alternating approach between updating the factor score matrix and the mode-wise factor loading matrices composing the network motifs .

#### 4.1.1 Updating the factor score matrix

We define to be an matrix composed of the direct sum of column vectors . Specifically, , where is the direct sum, . The matrix representation of the network motif can be written as

(4.5) |

with each row a probability vector which corresponds to a motif. The matricization of the -way tensor along its last mode results in a two-dimensional matrix . The optimization problem can be written as

where is the vector of all ones, and is the Hadamard product between matrices. We further reduce memory usage and accelerate computation. First, note that most of the elements in matrix are zero, storing it as a sparse matrix in the indices format only requires memory, with the number of nonzero elements. Second, given , the objective function is separable with respect to the columns of , i.e., ; therefore, the columns of can be updated simultaneously. Third, denoting the subsets of indices , and , we have

(4.6) | ||||

(4.7) |

In the first RHS term of the equation (4.7), we have due to the simplex constraint on the rows of . Therefore, we only need to compute and store a submatrix of in which the columns correspond to nonzero elements in the vector , that is,

(4.8) |

The computations of equation (4.8) based on the Hadamond product of matrices are much cheaper than those of equation (4.5) based on the Khatri-Rao product. Minimizing

can be viewed as finding the maximum likelihood solution of a Poisson linear regression problem with identity link,

is a matrix, is a count-valued vector, and is the nonnegative regression coefficients. This problem is convex and the solver to this problem is introduced later in Section 4.2.#### 4.1.2 Updating the mode-wise factor loading matrices

Similarly, we unfold the -way tensor along its -th mode, which results in a two-dimensional matrix , . Letting the matrix with row sum , for convenience of computation later, we set , such that every column of is a probability vector. Again the corresponding covariate matrix can be written as using Khatri-Rao product, the optimization objective function is

(4.9) |

such that

However, the feasible set of the optimization problem in equation (4.9) is no longer convex, due to the norm equality constraint. Following Hansen, Plantenga and Kolda (2015), we set , and rewrite the objective function in equation (4.9) as

which is convex with respect to . After finding , we set and to ensure the simplex constraints on the columns of are satisfied. In addition, we let . This rescaling operation is also adopted by Chi and Kolda (2012) in their CP-APR algorithm.

Second, given , the objective function is also separable with respect to the rows of . Letting , denoting the subsets of indices , and , , we have

(4.10) | ||||

(4.11) |

In the first RHS term of equation (4.11), we have due to the simplex constraint on the rows of . Therefore, for each subproblem we only need to compute and store a submatrix of in which the columns correspond to nonzero elements in the vector , which can be calculated via

(4.12) |

Similarly, minimizing can also be viewed as finding the maximum likelihood solution of a Poisson linear regression problem with identity link, in which is a matrix, is a count vector, is the nonnegative regression coefficients.

The block nonlinear GS algorithm for maximum likelihood estimation of our Poisson CP-BTD model is summarized in Algorithm 1. The algorithm iterates between updating the tensor loading factor matrices and the factor usage; both steps boil down to a number of convex optimization subproblems. Additional regularizers can be added to promote special properties, such as sparsity or group-sparsity, but the resulting penalized maximum likelihood problem might not be convex. In Section 4.3 we propose a solver for sparse Poisson regression problems based on a Minorize-Maximization (MM) algorithm [Hunter and Lange (2004)], which iteratively operates on local convex surrogates and reaches a local optimum.

### 4.2 Poisson regression with identity link

In our Poisson CP-BTD model, the subproblems arising from the nonlinear GS procedures in Section 4.1 take the form of minimizing the negative log-likelihood of a special form of Poisson linear regression problem with column stochasticity constraints on the covariate matrix. The choice of Poisson model has considerable computational benefits over the Gaussian. To see this, denoting the observations , the covariate matrix , , and the non-zero subset , the corresponding covariate submatrix , are the unknown nonnegative regression coefficients, the objective function is written as,

Comments

There are no comments yet.