# A Schur transform for spatial stochastic processes

The variance, higher order moments, covariance, and joint moments or cumulants are shown to be special cases of a certain tensor in V^⊗ n defined in terms of a collection X_1,...,X_n of V-valued random variables, for an appropriate finite-dimensional real vector space V. A statistical transform is proposed from such collections--finite spatial stochastic processes--to numerical tuples using the Schur-Weyl decomposition of V^⊗ n. It is analogous to the Fourier transform, replacing the periodicity group Z, R, or U(1) with the permutation group S_n. As a test case, we apply the transform to one of the datasets used for benchmarking the Continuous Registration Challenge, the thoracic 4D Computed Tomography (CT) scans from the M.D. Anderson Cancer Center available for download from DIR-Lab. Further applications to morphometry and statistical shape analysis are suggested.

10/25/2018


## 1 Introduction

In engineering and the sciences there are many situations in which one measures the locations of points which are matched across a group of different objects. For example, anatomical landmarks in multi-atlas medical images, skeletal landmarks of specimens in the fossil record, or a time series of several point-motion trackers on a moving subject. What are the natural features of such data? Such features could be used to train predictive models, to feed back into control systems, or to simply re-present the data so that an investigator can discern patterns.

The geometry and dynamics of such data are normally quantified by measures like volumetric deformation, deformation tensor eigenvalues, kinetic energy, total displacement along paths of travel, path curvature, and the summary statistics of such quantities, as well as more inherently statistical measures like higher-order joint moments.

This article proposes a class of features arising from classical tensor algebra and representation theory that are natural generalizations of elementary statistics, like the covariance or joint moments or cumulants, and that are substantially coordinate-independent. We interpret the given data in the context of probability theory by positing that the locations of the n points on the objects are samples from the joint distribution of n different random spatial or vector variables X_1,...,X_n valued in a vector space V. The expected value of the tensor product of these variables is a tensor in the classical sense, an element of the tensor product space V^⊗n. It is at this point that the multivariate case shows itself to be much richer and much more challenging than the univariate case: The tensor product of 1-dimensional vector spaces is always another 1-dimensional vector space, so that tensor language can be entirely avoided in the theory of univariate statistics, while the n-fold tensor product of a k-dimensional vector space for k ≥ 2 is literally exponentially larger, of dimension k^n.

There is an additive decomposition of elements of V^⊗n called the Schur-Weyl decomposition (not to be confused with the multiplicative Schur decomposition of a matrix). The Schur-Weyl decomposition arises from the consideration of the possible symmetry types with respect to permutations of the n tensor factors. Or, what turns out to be equivalent, the possible symmetry types with respect to the linear transformations of V, the elements of the general linear group GL(V). These symmetry types in some sense interpolate between the fully-symmetric type (as in symmetric matrices) and the fully-antisymmetric type (as in antisymmetric matrices). The new features that we emphasize are the norms of the components of this decomposition, which we call Schur amplitudes, thought of as invariants for a restricted group, the orthogonal group. One could also consider any other invariants of these components.

We will recall the part of the classical representation theory of GL(V) and the permutation group that is required for the actual numerical calculation of the proposed features. As it turns out, the formula for the components of the Schur-Weyl decomposition is a convolution of exactly the same sort as the Fourier transform. The main difference is that the convolution kernel for the Fourier transform, interpreted as the character functions for the irreducible representations of the unitary group U(1) or the additive groups Z or R, is replaced with a convolution kernel given by the character functions for the irreducible representations of the symmetric group S_n, also known as the permutation group on n symbols.

After explaining the algorithm in detail, we present the results of the transform applied to 4DCT lung motion data as a test case. This very preliminary example is used to give possible interpretations of the Schur transform in applications.

## 2 Theory

Let V be a finite-dimensional real vector space and let X_1,...,X_n be V-valued random variables on a fixed probability space. We assume that the X_i are jointly continuous with joint probability density ρ on V × ⋯ × V. We use the notation S^l V for the l-th symmetric power of V.

The following definition is motivated by the need for a uniform treatment of the types of quantities that will be mentioned in Table 1.

###### Definition 2.1.

The covariance tensor, denoted Cov(X_1,...,X_n), is

$$\mathrm{Cov}(X_1,\dots,X_n) := \int_{V\times\cdots\times V} (v_1-\mu_1)\otimes\cdots\otimes(v_n-\mu_n)\,\rho(v_1,\dots,v_n)\,dv_1\cdots dv_n \;\in\; V^{\otimes n}$$

where μ_i := E[X_i].

The covariance tensor of type (l_1,...,l_n), denoted Cov^{(l_1,...,l_n)}(X_1,...,X_n), where each l_i is a positive integer, is

$$\mathrm{Cov}^{(l_1,\dots,l_n)}(X_1,\dots,X_n) := \mathrm{Cov}(\underbrace{X_1,\dots,X_1}_{l_1},\dots,\underbrace{X_n,\dots,X_n}_{l_n}) \;\in\; S^{l_1}V\otimes\cdots\otimes S^{l_n}V \subset V^{\otimes\sum_i l_i}$$

The covariance tensor is still defined, by the same formula, if each X_i is valued in an affine space for the vector space V rather than in V itself. This is because in this case, each difference v_i − μ_i is still defined as an element of V, since the difference of two points of an affine space lies in its underlying vector space. Also, the definition of the covariance tensor does not require a choice of inner product, a basis, or any other additional structure on V.

In some situations one has meaningful origin or reference points for each variable which are not necessarily the means μ_i. In this case it might be desirable to replace the means in Definition 2.1 with these reference points. The resulting tensor may be called non-central, by analogy with the non-central ordinary moments.

Several special cases of the covariance tensor are summarized in Table 1 and described as follows.

If dim V = 1, there is a canonical isomorphism V^⊗n ≅ R. If in addition n = 2, Cov(X_1, X_2) is the ordinary univariate covariance of X_1 and X_2. On the other hand, for X_1 = X_2 = X, Cov(X, X) is the variance of X. Cov^{(3)}(X), Cov^{(4)}(X), etc. are the higher central moments of X. If n > 2, Cov(X_1,...,X_n) is the joint central moment of the X_i.

If n = 2 and dim V ≥ 2, the matrix of Cov(X_1, X_2) with respect to a basis for V is the covariance matrix of X_1 and X_2, the matrix with entries E[(X_1 − μ_1)_a (X_2 − μ_2)_b].

If n = 2 and X_1 = X_2 = X, the matrix of Cov(X, X) with respect to a basis for V is the variance matrix of the single vector-valued variable X. This is also called the inertia tensor of X, especially when dim V = 3.

For general V we shall call Cov^{(3)}(X), Cov^{(4)}(X), etc. the spatial higher moments of X in S^3 V, S^4 V, etc. We have singled out the cases S^4 V with dim V = 2 and S^3 V with dim V = 3 because in both cases there is a classical numerical invariant, known as the j-invariant, which captures essential features of the geometry of the tensor and which is well-defined under arbitrary affine transformations of V. For an element of S^4 V with dim V = 2, regarded as a homogeneous quartic polynomial in 2 variables, the j-invariant describes the set of the 4 complex roots of the polynomial in the projective line up to projective transformations. For an element of S^3 V with dim V = 3, regarded as a homogeneous cubic polynomial in 3 variables, the j-invariant describes the algebraic isomorphism type of the elliptic curve described by the polynomial in the projective plane.
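These special cases can be checked numerically. The sketch below (numpy; the variables and the Monte Carlo setup are illustrative, not taken from the paper's implementation) estimates a third-order covariance tensor from samples and verifies that the n = 2 case reduces to the ordinary covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 10000, 3                      # sample size and spatial dimension

# Samples of three V-valued variables X1, X2, X3 (V = R^3), jointly dependent.
Z = rng.standard_normal((N, k))
X1 = Z + 0.1 * rng.standard_normal((N, k))
X2 = 2.0 * Z + rng.standard_normal((N, k))
X3 = rng.standard_normal((N, k))

def cov_tensor(*Xs):
    """Monte Carlo estimate of Cov(X1,...,Xn) in V^{(x)n} (normalized by N)."""
    centered = [X - X.mean(axis=0) for X in Xs]
    letters = 'abcdefg'[:len(centered)]
    spec = ','.join('j' + l for l in letters) + '->' + letters
    return np.einsum(spec, *centered) / len(Xs[0])

T = cov_tensor(X1, X2, X3)           # an element of V (x) V (x) V, shape (3, 3, 3)

# Special case n = 2: the covariance tensor is the k x k covariance matrix.
C = cov_tensor(X1, X2)
C_ref = (X1 - X1.mean(0)).T @ (X2 - X2.mean(0)) / N
assert np.allclose(C, C_ref)
```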

In general, Cov(X_1,...,X_n) and Cov^{(l_1,...,l_n)}(X_1,...,X_n) are not expressible in terms of matrix algebra operations like sum, product, transpose, inverse, and trace, and they are not among the familiar objects from probability theory or statistics (see e.g. [9] or [8]). Certain cases of this general covariance tensor are discussed in [5] in the fully tensorial context, but no systematic description by means of symmetries is attempted.

Let λ denote an arbitrary positive integer partition of the integer n (e.g. λ = (2, 1) for n = 3). Recall that there is a one-to-one correspondence between the set of such λ (also described by Young diagrams with n boxes or by conjugacy classes in S_n) and the isomorphism classes of irreducible representations of the permutation group S_n ([3] page 44). Thus λ will also be used to denote such an isomorphism class, and V_λ will denote an S_n representation in the isomorphism class λ. χ_λ denotes the character function of S_n associated with λ, that is, χ_λ(σ) is the trace of the linear map by which the element σ ∈ S_n acts on V_λ. Similarly there is a one-to-one correspondence between the set of λ and certain functors V ↦ S_λ(V), called Schur functors, where S_λ(V) is a certain representation of GL(V) ([3] page 75).

Each element σ ∈ S_n acts as a linear transformation of V^⊗n to permute the tensor factors. The behavior of the elements of V^⊗n under such transformations can be used to classify and decompose such elements. Frequently, a representation of a group has a unique decomposition as a direct sum of irreducible subrepresentations, but one is not always so lucky. As a representation of S_n, V^⊗n does not have a unique decomposition as a direct sum of irreducible subrepresentations as soon as dim V ≥ 2. The same is true with respect to the group GL(V) rather than S_n. However, the following classical theorem provides an alternative. For this theorem only, we assume that V is a finite-dimensional complex vector space.

###### Theorem 2.2.
1. The irreducible decomposition of V^⊗n as a representation of S_n × GL(V) is unique, written V^⊗n = ⊕_λ W_λ, where the W_λ are the irreducible subrepresentations and λ ranges over the positive integer partitions of n.

2. W_λ is the sum of all S_n-subrepresentations of V^⊗n isomorphic to V_λ.

3. W_λ ≅ V_λ ⊗ S_λ(V) as S_n × GL(V) representations.

4. The projection π_λ : V^⊗n → W_λ is

$$\pi_\lambda : t \mapsto \frac{\chi_\lambda(\mathrm{identity})}{n!}\sum_{\sigma\in S_n}\chi_\lambda(\sigma^{-1})\,\sigma(t)$$

where χ_λ denotes the character function of V_λ and σ(t) denotes the action of σ on t.

Theorem 2.2 records just a small part of a large general theory, the part which is needed here. Theorem 2.2 goes all the way back to Issai Schur’s 1901 dissertation [6] and his later article [7]. Hermann Weyl’s 1939 book [10], chapter IV, is a well-known exposition in English. See [1] page 81 for discussion and further references.

(Dimensions). It is no trivial matter to establish the number of degrees of freedom for each component of the decomposition in Theorem 2.2, the dimensions of the W_λ. Nevertheless these dimensions are classically known. The dimension of V_λ is provided by the hook-length formula in terms of the Young diagram of type λ. The dimension of S_λ(V) is given by the value of the character function for S_λ(V) on the identity element of GL(V), expressed in terms of the eigenvalues of an element of GL(V) by the Schur polynomial. Then dim W_λ = (dim V_λ)(dim S_λ(V)).
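As a concrete check on these dimension counts, the following sketch (plain Python; the partitions of n = 3 are hardcoded) computes dim V_λ by the hook-length formula and dim S_λ(V) by the hook-content evaluation of the Schur polynomial, and verifies that the dimensions of the W_λ sum to k^n.

```python
from math import factorial, prod

def cells(lam):
    """Cells (i, j) of the Young diagram of the partition lam."""
    return [(i, j) for i, row in enumerate(lam) for j in range(row)]

def hook(lam, i, j):
    """Hook length of cell (i, j): arm + leg + 1."""
    arm = lam[i] - j - 1
    leg = sum(1 for r in lam[i + 1:] if r > j)
    return arm + leg + 1

def dim_Vlam(lam):
    """Dimension of the S_n irreducible V_lambda (hook-length formula)."""
    n = sum(lam)
    return factorial(n) // prod(hook(lam, i, j) for i, j in cells(lam))

def dim_Slam(lam, k):
    """Dimension of the Schur functor S_lambda(V), dim V = k (hook-content formula)."""
    num = prod(k + j - i for i, j in cells(lam))
    den = prod(hook(lam, i, j) for i, j in cells(lam))
    return num // den

n, k = 3, 2
partitions = [(3,), (2, 1), (1, 1, 1)]
total = sum(dim_Vlam(l) * dim_Slam(l, k) for l in partitions)
assert total == k ** n    # 1*4 + 2*2 + 1*0 = 8 = 2^3
```

Note that the component for (1, 1, 1) contributes nothing when k = 2: the third exterior power of a 2-dimensional space vanishes.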

Surprisingly, it follows from the Frobenius formula ([3] page 49) that the values of the character functions for S_n are integers. This implies that the decomposition exists for real V, so we did not need to pass to the complexification after all. It also implies that in principle the projection by π_λ can be performed exactly by a computer, without special commutative ring calculations and without floating-point approximation.

The projection formula in Theorem 2.2 part 4 is essentially the formula for the Fourier transform, as clarified in Table 3.

We are now ready to define our main object of study, the Schur transform and some of its variants.

###### Definition 2.3.

Let T = Cov(X_1,...,X_n), for V-valued random vectors X_1,...,X_n.

1. The Schur transform of X_1,...,X_n is the tuple of components (π_λ(T))_λ in the Schur-Weyl decomposition of T.

2. Fix an inner product on V and consider the induced inner product and norm on V^⊗n. The Schur amplitudes of X_1,...,X_n are the norms ‖π_λ(T)‖.

3. Suppose that random vectors X_1,...,X_m valued in V are given, for m ≥ n. The n-factor Schur content is the set (distribution) of the Schur amplitudes of all n-element subsets of the X_i.

4. Suppose that random vectors X_1,...,X_m valued in V are given, for m ≥ n, as above. The sequential n-factor Schur content is the set (distribution) of the Schur amplitudes of all consecutive n-element subsets of the X_i.

Notice that the Schur amplitudes are independent of the order of the variables X_i, so that the n-factor Schur content is well-defined. Also, if all X_i are equal to some fixed X, the Schur transform has only one non-zero entry, the fully symmetric component for λ = (n), which is equal to T itself.

Each component of the Schur transform has its own interpretation as an encoding of some aspect of the geometric arrangement of the X_i. For example, the fully antisymmetric component, for λ = (1,...,1), quantifies the tendency of the values of the X_i, as displacements from the means μ_i, to “fill space” by circumscribing an n-dimensional volume. A straightforward geometric interpretation of the vast majority of the other components remains to be discovered.
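For n = 2 the two components are just the symmetric and antisymmetric parts of the covariance matrix, which makes the transform easy to verify by hand. A minimal numpy sketch (the data and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 500, 3

# Centered samples of two V-valued variables (V = R^3).
X1 = rng.standard_normal((N, k)); X1 -= X1.mean(0)
X2 = X1 @ rng.standard_normal((k, k)) + 0.5 * rng.standard_normal((N, k)); X2 -= X2.mean(0)

T = X1.T @ X2                       # sample covariance tensor, here a k x k matrix

# Schur-Weyl decomposition for n = 2: partitions (2) and (1, 1).
T_sym = (T + T.T) / 2               # component for lambda = (2)
T_alt = (T - T.T) / 2               # component for lambda = (1, 1)
assert np.allclose(T_sym + T_alt, T)

# Schur amplitudes: Frobenius norms of the components.
amplitudes = {(2,): np.linalg.norm(T_sym), (1, 1): np.linalg.norm(T_alt)}

# Swapping the two variables replaces T by its transpose but leaves both amplitudes unchanged.
assert np.isclose(np.linalg.norm((T.T + T) / 2), amplitudes[(2,)])
assert np.isclose(np.linalg.norm((T.T - T) / 2), amplitudes[(1, 1)])
```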

(Higher joint moments). The covariance tensor (without qualification) is also the covariance tensor of trivial type (1,...,1), where each l_i = 1. If n is small, the covariance tensor for non-trivial type (l_1,...,l_n) may be more interesting. In this case, the classical umbral calculus furnishes invariant functions of the tensor which may play a role similar to that of the Schur amplitudes. An explicit algorithm for such invariants is provided in [4]. Such invariants are a generalization of the j-invariants described above in the special case of elements of S^4 V or S^3 V.

## 3 Algorithm

In practice, information about the joint distribution of random variables X_1,...,X_n is given by samples v_{j,1},...,v_{j,n} ∈ V for j = 1,...,N, presumed to comprise an independent and identically distributed (i.i.d.) sample from (X_1,...,X_n) of size N.

###### Definition 3.1.

The sample covariance tensor is

$$T = \widehat{\mathrm{Cov}}(X_1,\dots,X_n) := \sum_{j=1}^{N}(v_{j,1}-\bar v_1)\otimes\cdots\otimes(v_{j,n}-\bar v_n)\;\in\;V^{\otimes n}$$

where v̄_i is the sample mean (1/N) ∑_j v_{j,i}.

The sample Schur transform or discrete Schur transform is the tuple of components (π_λ(T))_λ in the Schur-Weyl decomposition of T.

This section provides the details of the numerical computation of the discrete Schur transform. The author’s implementation is available at http://github.com/schur-transform .

Precomputation

1. Fix a maximum size n for the length of the input data series X_1,...,X_n, and a dimension k for the data points, so that each data point belongs to V = R^k.

2. Compute the character tables of S_m for each m ≤ n.

3. Compute, for each σ ∈ S_n, the matrix P(σ) of the permutation of tensor factors of V^⊗n with respect to the basis of tensor products e_{α_1} ⊗ ⋯ ⊗ e_{α_n} of standard basis vectors of R^k. Use lexicographical order of the index tuples (α_1,...,α_n) for the basis ordering. The row corresponding to an input basis element will have exactly 1 non-zero entry, the value 1 in the column corresponding to the output basis element whose index tuple is permuted accordingly.

4. For each conjugacy class c of S_n, compute the matrix S(c) which is the sum given by:

$$S(c) := \sum_{\sigma\in c} P(\sigma)$$

5. For each character χ_λ, i.e. each row of the character table of S_n, compute the projection matrix π(λ) which is the sum over conjugacy classes given by:

$$\pi(\lambda) := \sum_{c} \frac{\chi_\lambda(\mathrm{identity})\,\chi_\lambda(c)}{n!}\,S(c)$$
6. Verify that ∑_λ π(λ) is the identity matrix. Since the values of the characters χ_λ are integers, the entries of the π(λ) are exact rational numbers, and this equation should hold exactly.
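The precomputation steps can be sketched for n = 3, k = 2 as follows (numpy; the S_3 character table is hardcoded, with conjugacy classes ordered as identity, transpositions, 3-cycles). Working with the integer matrices n!·π(λ) makes the identity check in step 6 exact:

```python
import itertools
import numpy as np

n, k = 3, 2
dim = k ** n
basis = list(itertools.product(range(k), repeat=n))       # lex-ordered index tuples
index = {t: r for r, t in enumerate(basis)}

def P(sigma):
    """Permutation of tensor factors: the basis element with index tuple t
    is sent to the basis element with the correspondingly permuted tuple."""
    M = np.zeros((dim, dim), dtype=np.int64)
    for t in basis:
        out = tuple(t[sigma.index(j)] for j in range(n))  # out[j] = t[sigma^{-1}(j)]
        M[index[out], index[t]] = 1
    return M

# Conjugacy classes of S_3 by cycle type (sigma given as the tuple (sigma(0), sigma(1), sigma(2))).
classes = [
    [(0, 1, 2)],                                          # identity
    [(1, 0, 2), (0, 2, 1), (2, 1, 0)],                    # transpositions
    [(1, 2, 0), (2, 0, 1)],                               # 3-cycles
]
S = [sum(P(s) for s in c) for c in classes]               # class sums S(c)

# Character table of S_3: rows chi_lambda over the classes above.
chars = {(3,): [1, 1, 1], (2, 1): [2, 0, -1], (1, 1, 1): [1, -1, 1]}

# Integer numerators n! * pi(lambda); the resolution of the identity is then exact.
num = {lam: sum(chi[0] * chi[c] * S[c] for c in range(3)) for lam, chi in chars.items()}
assert np.array_equal(sum(num.values()), 6 * np.eye(dim, dtype=np.int64))
pi = {lam: m / 6 for lam, m in num.items()}               # the projectors pi(lambda)
```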

Main computation

1. A data series v_{j,i,α} is input, where α is the spatial dimension ranging from 1 to k, j is the sample dimension ranging from 1 to N, and i is the series dimension (e.g. time) ranging from 1 to n.

2. Compute the means v̄_i := (1/N) ∑_j v_{j,i}, and replace each v_{j,i} with v_{j,i} − v̄_i.

3. Compute the sample covariance tensor, a column vector of size k^n:

$$T := \sum_{j=1}^{N}\;\sum_{\alpha_1,\dots,\alpha_n=1}^{k}\left(\prod_{i=1}^{n} v_{j,i,\alpha_i}\right) e_{\alpha_1}\otimes\cdots\otimes e_{\alpha_n}$$
4. Compute the Schur transform, one tensor T(λ) for each character χ_λ, given by the matrix products:

$$T(\lambda) = \pi(\lambda)\cdot T$$
5. Verify that ∑_λ T(λ) = T.

6. Compute the Schur amplitude ‖T(λ)‖, for each λ, as the square root of the sum of the squares of the numerical entries of T(λ) (the Frobenius norm).

Note that in precomputation step 5, it is not necessary to use the inverse σ^{-1} as in the formula of Theorem 2.2. In S_n, the conjugacy class of an element σ is determined by its cycle type, which is the same for σ and its inverse. Thus χ_λ(σ) = χ_λ(σ^{-1}).
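The main computation steps 1-6 can be sketched end-to-end for n = 2, k = 3 (numpy; the data series is synthetic and the variable names are illustrative):

```python
import itertools
import numpy as np

n, k, N = 2, 3, 200
rng = np.random.default_rng(2)
v = rng.standard_normal((N, n, k))            # v[j, i, :] = j-th sample of X_i

# Step 2: center each series.
v = v - v.mean(axis=0)

# Step 3: sample covariance tensor as a column vector of size k^n.
T = np.einsum('ja,jb->ab', v[:, 0, :], v[:, 1, :]).reshape(k ** n)

# Precomputed projectors for S_2: swap matrix P on the lex-ordered basis of V (x) V.
basis = list(itertools.product(range(k), repeat=n))
index = {t: r for r, t in enumerate(basis)}
P = np.zeros((k ** n, k ** n))
for t in basis:
    P[index[t[::-1]], index[t]] = 1
I = np.eye(k ** n)
pi = {(2,): (I + P) / 2, (1, 1): (I - P) / 2}

# Steps 4-6: project, check the resolution of T, and take Frobenius norms.
T_lam = {lam: p @ T for lam, p in pi.items()}
assert np.allclose(sum(T_lam.values()), T)    # step 5 check
amplitudes = {lam: np.linalg.norm(t) for lam, t in T_lam.items()}
```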

(Feasibility). The algorithm presented above for the precomputation steps is not feasible for large n because it requires iteration over the n! elements of S_n. A more efficient algorithm could be obtained from a closed formula for the sum S(c) over a given conjugacy class which does not require iteration over the members of c. However, even modestly large values of n present a memory problem for the main computation steps, since the sample covariance tensor has k^n numerical components and each projector π(λ) has k^n × k^n components. For moderate k and n, a single projector can already use over 10 GB of data storage. On the other hand, many of the Schur components vanish entirely for certain dimensions of V and the corresponding projectors do not need to be calculated (e.g. the component for λ vanishes if the Young diagram of λ has more than dim V rows).
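The k^{2n} storage growth is easy to tabulate (Python; the sampled values of k and n are illustrative, assuming 8-byte floating-point entries):

```python
def projector_bytes(k, n, bytes_per_entry=8):
    """Dense storage for one k^n x k^n projector matrix."""
    return (k ** n) ** 2 * bytes_per_entry

# The covariance tensor itself needs only k^n entries, but each dense
# projector needs k^(2n); for a few illustrative values of (k, n):
for k, n in [(3, 6), (3, 8), (3, 10)]:
    gib = projector_bytes(k, n) / 2 ** 30
    print(f"k={k}, n={n}: {gib:.2f} GiB per projector")
```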

## 4 Applications

The Schur transform, or Schur content more generally, clearly has potential applications to morphometry, statistical shape analysis, fluid motion statistics, or body motion or gesture tracking.

(Exploratory classification and validation). The n-factor Schur content summarizes the geometry of variation across a group of m ≥ n spatial variables. With increasing n, the resolution or complexity of the summary increases. Certain Schur components may be found to discriminate well between groups under different conditions, or certain patterns of Schur amplitudes may be found to characterize commonly occurring types of within-group variation.

(Classification rule). Given a prior stratification of the matched objects of a given data set into classes, a possible classification rule for an additional (m+1)st object is as follows. For each class, evaluate both the n-factor Schur content and a modified n-factor Schur content in which the n-fold subsets are replaced by the unions of (n−1)-fold subsets with the additional sample. Both contents are tuples of distributions, one for each partition type λ of n. Select the class which minimizes, for example, the L¹ or L² difference between the means of the two contents.
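A toy version of such a rule for n = 2 might look as follows (numpy; the helper `schur_amplitudes`, the class data, and the scoring are illustrative stand-ins, not the paper's implementation): compute the mean amplitudes of all pairs within a class, then of all pairs augmented with the new object, and pick the class whose means move least in the L¹ sense.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
N, k = 200, 2

def schur_amplitudes(Xa, Xb):
    """n = 2 Schur amplitudes: Frobenius norms of the symmetric and
    antisymmetric parts of the sample covariance tensor of the pair."""
    T = (Xa - Xa.mean(0)).T @ (Xb - Xb.mean(0)) / len(Xa)
    return np.array([np.linalg.norm((T + T.T) / 2), np.linalg.norm((T - T.T) / 2)])

def content_means(variables, pairs):
    """Mean Schur amplitudes over the given 2-element subsets (one mean per partition)."""
    return np.mean([schur_amplitudes(variables[a], variables[b]) for a, b in pairs], axis=0)

# Toy classes: within each class the variables share a common factor.
sA, sB = rng.standard_normal((N, k)), rng.standard_normal((N, k))
class_A = [sA + 0.1 * rng.standard_normal((N, k)) for _ in range(4)]
class_B = [sB + 0.1 * rng.standard_normal((N, k)) for _ in range(4)]
new_obj = sA + 0.1 * rng.standard_normal((N, k))   # actually generated from class A

def score(cls, new):
    m = len(cls)
    C = content_means(cls, list(itertools.combinations(range(m), 2)))
    C_new = content_means(cls + [new], [(a, m) for a in range(m)])
    return np.abs(C - C_new).sum()                 # L1 difference of the means

label = min([('A', class_A), ('B', class_B)], key=lambda p: score(p[1], new_obj))[0]
```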

## References

• [1] Andrew Berget. Symmetries of Tensors. PhD thesis, University of Minnesota, 2009.
• [2] Richard Castillo, Edward Castillo, Rudy Guerra, Valen Johnson, Travis McPhail, Amit Garg, and Thomas Guerrero. A framework for evaluation of deformable image registration spatial accuracy using large landmark point sets. Physics in medicine and biology, 54:1849–70, 04 2009.
• [3] William Fulton and Joe Harris. Representation theory, volume 129 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1991. A first course, Readings in Mathematics.
• [4] Frank D. Grosshans, Gian-Carlo Rota, and Joel A. Stein. Invariant theory and superalgebras, volume 69 of CBMS Regional Conference Series in Mathematics. Published for the Conference Board of the Mathematical Sciences, Washington, DC; by the American Mathematical Society, Providence, RI, 1987.
• [5] Peter McCullagh. Tensor methods in statistics. Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1987.
• [6] I. Schur. Ueber eine Klasse von Matrizen, die sich einer gegebenen Matrix zuordnen lassen. Dieterich in Göttingen, 1901.
• [7] I. Schur. Über die rationalen Darstellungen der allgemeinen linearen Gruppe. Sitzungsberichte Akad, 1927.
• [8] Howard G. Tucker. A graduate course in probability. Probability and Mathematical Statistics, Vol. 2. Academic Press, Inc., New York-London, 1967.
• [9] Larry Wasserman. All of statistics. Springer Texts in Statistics. Springer-Verlag, New York, 2004. A concise course in statistical inference.
• [10] H. Weyl. The Classical Groups: Their Invariants and Representations. Princeton University Press, 1939.