High order cumulant tensors and algorithm for their calculation

In this paper, we introduce a novel algorithm for calculating arbitrary order cumulants of multidimensional data. Since the m^th order cumulant can be presented in the form of an m-dimensional tensor, the algorithm is presented using tensor operations. The algorithm provided in the paper takes advantage of super-symmetry of cumulant and moment tensors. We show that the proposed algorithm considerably reduces the computational complexity and the computational memory requirement of cumulant calculation as compared with existing algorithms. For the sizes of interest, the reduction is of the order of m! compared to the naive algorithm.


page 1

page 2

page 3

page 4


Best Rank-One Tensor Approximation and Parallel Update Algorithm for CPD

A novel algorithm is proposed for CANDECOMP/PARAFAC tensor decomposition...

Effective Tensor Sketching via Sparsification

In this paper, we investigate effective sketching schemes via sparsifica...

On updates of high order cumulant tensors

High order cumulants carry information about statistics of non--normally...

RotEqNet: Rotation-Equivariant Network for Fluid Systems with Symmetric High-Order Tensors

In the recent application of scientific modeling, machine learning model...

Automating Personnel Rostering by Learning Constraints Using Tensors

Many problems in operations research require that constraints be specifi...

High Performance Rearrangement and Multiplication Routines for Sparse Tensor Arithmetic

Researchers are increasingly incorporating numeric high-order data, i.e....

Efficient Learning of Mixed Membership Models

We present an efficient algorithm for learning mixed membership models w...

Code Repositories


Calculates multivariate cumulants of any order

view repo


In this paper, we introduce a novel algorithm for calculating arbitrary order cumulants of multidimensional data. Since the th order cumulant can be presented in the form of an -dimensional tensor, the algorithm is presented using tensor operations. The algorithm provided in the paper takes advantage of super-symmetry of cumulant and moment tensors. We show that the proposed algorithm considerably reduces the computational complexity and the computational memory requirement of cumulant calculation as compared with existing algorithms. For the sizes of interest, the reduction is of the order of compared to the naïve algorithm.

igh order cumulants, non-normally distributed data, numerical algorithms

65Y05, 15A69, 65C60

1 Introduction

1.1 Normally and non-normally distributed data

Let us consider the

-dimensional normally distributed random variable

where is a positive-definite covariance matrix and

is a mean value vector. In this case characteristic function

and cumulant generating function  [30, 35] are


It is easy to see that is quadratic in , and therefore its third and higher derivatives with respect to [30, 35] are zero. If data is characterised by a frequency distribution other than the multivariate normal distribution, the characteristic function may be expanded in more terms than quadratic, and cumulants of the order higher than two may have non-zero elements. This is why they are helpful in distinguishing between normally and non-normally distributed data or between data from different non-normal distributions.

1.2 Motivation

Cumulants of the order of have recently started to play an important role in the analysis of non-normally distributed multivariate data. Some potential applications of higher-order cumulants include signal filtering problems where the normality assumption is not required (see [25, 32] and references therein). Another application is finding the direction of received signals [45, 41, 10, 33] and signal auto-correlation analysis [37]. Higher-order cumulants are used in hyper-spectral image analysis [26], financial data analysis [2, 29] and neuroimage analysis [9, 5]. Outside the realm of signal analysis, higher order cumulants may be applied for quantum noise investigation purposes [24], as well as to other types of non-normally distributed data, such as weather data [15, 19, 43], various medical data [46], cosmological data [50]

or data generated for machine learning purposes


In the examples mentioned above only cumulants of the order of

were used due to growing computational complexity and large estimation errors of high order statistics. The computational complexity and the use of computational resources increases considerably with the cumulants’ order by a factor of

, where is the order of the cumulant and is the number of marginal variables.

Despite the foregoing, cumulants of the order of of multivariate data were successfully used in high-resolution direction-finding methods of multi-source signals (the q-MUSIC algorithm) [12, 11, 13, 34]

despite higher variance of the statistic’s estimation. In such an algorithm, the number of signal sources that can be detected is proportional to the cumulant’s order

[44]. Cumulants of the order of also play an important role in financial data analyses, as they enable measurement of the risk related to portfolios composed of many assets [48, 38]. This is particularly important during an economic crisis, since higher order cumulants make it possible to sample larger fluctuation of prices [42]. In [48], cumulants of the order of 2–6 of multi-asset portfolios were used as a measure of risk seeking vs. risk aversion. In [38], it was shown that, during an economic crisis, cumulants of the order of are important to analyse variations of assets and prices of portfolios. Further arguments for the utility of cumulants of the order of in financial data analysis can be find in [28, 1] and [18] where cumulant tensors of the order of 2–6 were used to analyse portfolios during an economic crisis. Finally, let us consider the QCD (Quantum Chromodynamics) phase structure research area. In [23], the authors have evidenced the relevance of cumulants of the order of and

of net baryon number fluctuations for the analysis of freeze-out and critical conditions in heavy ion collisions. Standard errors of those cumulant estimations were discussed in


In our study, we introduce an efficient method to calculate higher-order cumulants. This method takes advantage of the recursive relation between cumulants and moments as well as their super-symmetric structure. Those features allow us to reduce the computational complexity of our algorithm and make the problem tractable. In order to reduce complexity, we use the idea introduced in [49] to decrease the storage and computational requirements by a factor of .

This allows us to handle large data sets and overcome a major problem in numerical handling of high order moments and cumulants. Consider that the estimation error of the one-dimensional th central moment is limited from above by where is the th central moment and is number of data samples. This is discussed further in Appendix A. Consequently, the accurate estimation of statistics of the order of requires correspondingly large data sets. Our approach allows us to handle in practice cumulants up to the tenth order.

1.3 Basic definitions

Let us start with a random process generating discrete dimensional values. A sequence of samples of an dimensional random variable is represented in the form of matrix such that


This matrix can be represented as a sequence of vectors of realisations of marginal variables




In order to study moments and cumulants of , we need the notion of super-symmetric tensors. Let us first denote the set as , a permutation of tuple as .

Definition 1

Let be a tensor with elements indexed by multi-index . Tensor is super-symmetric iff it is invariant under any permutation of the multi-index, i.e.


Henceforth we will write for super-symmetric tensor . A list of all notations used in this paper is provided in Table 1.

Definition 2

Let be as in Eq. (2). We define the th moment as tensor . Its elements are indexed by multi-index and equal


where is the expectational value operator and a vector of realisations of the th marginal variable.

Definition 3

Let be as in Eq. (2). We define centred variable as


Symbol Description/explanation
element multi-index
size of multi-index (number of elements)
permutation of multi-index
set of integers
matrix of realisations of dimensional random variable
vector of realisations of the th marginal random variable
expectational value operator
super-symmetric mode tensor of size , with elements
mode tensor of sizes , with elements
matrix of realisations of dimensional centred random variable
the th cumulant tensor of with elements
, block of the th cumulant or moment tensor indexed by in the block structure.
the th central moment tensor of with elements
the th moment of one dimensional
Table 1: Symbols used in the paper.

The first two cumulants respectively correspond to the mean vector and the symmetric covariance matrix of . For the purpose of the paper, we define the first four cumulants as follows [30, 35]

Definition 4

We define the first cumulant as


Definition 5

We define the second cumulant as


Definition 6

We define the third cumulant as a three-mode tensor with elements


Definition 7

We define the fourth cumulant as a four-mode tensor with elements


Remark 1

Each cumulant tensor as well as each moment tensor is super-symmetric [4].

As the formula for a cumulant of an arbitrary order is very complex, its numerical handling is our core result. As such, it will be discussed in depth in Section 3.

2 Moment tensor calculation

To provide a simpler example, we start with algorithms for calculation of the moment tensor. Next, in Section 3, those algorithms will be utilised to recursively calculate the cumulants.

2.1 Storage of super-symmetric tensors in block structures

In this section, we are going to follow the idea introduced by Schatz et al. [49] concerning the use of blocks to store symmetric matrices and super-symmetric tensors in an efficient way. To make the demonstration more accessible, we will first focus on the matrix case. Let us suppose we have symmetric matrix . We can store the matrix in blocks and store only upper triangular blocks,


where NULL represents an empty block, and . Entries below the diagonal do not need to be stored and calculated as they are redundant. Each block is of size . Blocks are of size , and block is of size , where:


This representation significantly reduces the overall storage footprint while still providing opportunities to achieve high computational performance.

This representation can easily be extended for purposes of super-symmetric tensors. Let us assume that is a super-symmetric tensor. All data can be stored in blocks . If indices are not sorted in an increasing order, such blocks are redundant and consequently replaced by NULL. Similarly to the matrix case we have


In the subsequent sections we present algorithms for moment and cumulant tensor calculation and storage. For simplicity, we assume that and . The generalization is straightforward and, at this point, would only obscure the main idea.

Henceforth each block is a hypercube of size and there are such unique blocks [49]. Such storage scheme, proposed in [49], requires the storage of elements.

2.2 The algorithm

In this and following sections, we present the moment and cumulant calculation algorithms that use the block structure. To compute the th moment tensor we use Def. 2. Algorithm 1 computes a single block of the tensor, while Algorithm 2 computes the whole tensor in the block structure form.

1:Input: — data matrix; — multi-index of a block; — block’s size.
2:Output: — a single block of the block structure.
3:function momentblock()
4:     for  do
6:     end for
7:     return
8:end function
Algorithm 1 A single block of central moment, used to perform Algorithm 2
1:Input: — data matrix; — moment’s order; — block’s size.
2:Output: th moment tensor stored as the block structure.
3:function moment()
5:     for  do
7:     end for
8:     return
9:end function
Algorithm 2 The th moment tensor stored as a block structure

Based on [49] and the discussion in the previous subsection, we can conclude that reduction of redundant blocks reduces the storage and computational requirements of the th moment tensor by a factor of for compared to the naïve algorithm. The detailed analysis of the computational requirements will be presented in Section 5.

2.3 Parallel computation of moment tensor

For large , it is desirable to speedup the moment tensor calculation further. This can be achieved via a simple parallel scheme. Let us suppose for the sake of simplicity, that we have processes available, and . Starting with data we can split them into non overlapping subsets . In the first step, for each subset, we compute in parallel moment tensor using Algorithm 2. In the second step, we perform the following reduction


The elements of the tensor under the sum on the RHS are


The element of the moment tensor of is


These steps are summarised in Algorithm 3.

1:Input: — data matrix; — moment’s order; — block’s size; — number of processes.
2:Output: th moment tensor stored as the block structure.
3:function momentnc()
5:     for  do perform in parallel
7:     end for
8:      reduce scheme
9:     return
10:end function
Algorithm 3 Parallel computation of the th moment tensor

3 Calculation of cumulant tensors

At this point, we can define our main result, i.e. an algorithm for calculating cumulants of arbitrary order of multi-dimensional data.

3.1 Index partitions and permutations

In this section, we present a recursive formula that can be used to calculate the th cumulant of . We begin with some definitions, mainly concerning combinatorics, before discussing the general formula.

Definition 8

Let , and . Partition of tuple is the division of into non-crossing sub-tuples: ,


In what follows, we will denote the permutations of a tuple of tuples , as .

Definition 9

— the representative of the equivalence class of partitions. Let and be partitions of . Let us introduce the following equivalence relation:


This relation defines the equivalence class. Henceforth we will take only one representative of each equivalence class and denote it as . The representative will be such that all are sorted in an increasing order. We will denote a set of all such equivalence classes as .

Remark 2

The number of partitions of set of size into parts is given by the Stirling Number of the second kind, [27]


Definition 10

Consider tensors , indexed by and respectively. Their outer product is defined as


where denotes multi-index .

As an example consider the outer product of symmetric matrix by itself: , that is only partially symmetric, , but in general . To obtain a super-symmetric outcome of the outer product of super-symmetric tensors, we need to apply the following symmetrisation procedure.

Definition 11

The sum of outer products of super-symmetric tensors. Let be a tensor indexed by . Let be a sub-tuple of its modes according to Def. 9, and let . For the given , we define the sum of outer products of where and , using the elementwise notation, as


We will use the following abbreviation using tensor notation


Consider as in Eq. (23) where are super-symmetric and iff and is a multi-index of . The sum over all representatives of equivalence classes fully symmetrises the outer product, and therefore is super-symmetric. In other words, due to the super-symmetry, any permutation of multi–index of that leads only to a permutation of indices inside some refers to the same value of . Any permutation of that leads only to the switch between and inside an outer product in Eq. (23) also refers to the same value of . Any other permutation of that cannot be represented as above switches between equivalence classes as well, and so it switches between elements of sum Eq. (23) and refers to the same value of .

Example 1

Consider , and such that




such is super-symmetric, since there is no permutation of that changes its elements, i.e.  .

3.2 Cumulant calculation formula

The following recursive relation can be used to relate moments and cumulants of :


This can be written in an elementwise manner as in [4]


For the sake of completeness, we present an alternative proof of Eq. (27) in Appendix B.

In order to compute , let us consider the case where separately. By definition, , so:


The th cumulant tensor can be calculated given the th moment tensor and cumulant tensors of the order of


To simplify Eq. (29), let us observe that cumulants of the order of two or higher for a non-centred variable and a centred variable are equal. The first order cumulant for a centred variable is zero. Hereafter, we introduce partitions into sub-tuples of size larger than one.

Definition 12

Let , and . The at least two element partition of tuple is the division of into sub-tuples: , such that


The definition of the representative of equivalence class and the set of such representatives are analogical to Def. 9. Consequently, we can derive the final formula


Let us determine the limit. If is even, it can be divided into at most parts of size two; if

is odd, it can be divided into at most

parts: parts of size two and one part of size three. Hence we can conclude that .

As a simple example, consider the cumulants of the order of three and four. Since , then and , i.e. the second cumulant matrix and the third cumulant tensor are simply the second and the third central moments. Formulas for cumulant tensors of the order grater than three are more complicated. For example, consider the th cumulant tensor


Using the elementwise notation, where , we have


3.3 Algorithms to compute cumulant tensors

Let us suppose that is the th element of the th block of the super–symmetric tensor of the order of . Similarly, is the th element of the th block of the th cumulant tensor according to Def. 11—we skip now in for brevity. With reference to Def. 11 and Def. 9, is always sorted and from the properties of the block structure is also sorted, hence is sorted as well. To determine we use modified Knuth’s algorithm [31]. Now we have all the components to introduce Algorithm 4 which computes a super-symmetric sum of outer products of lower order cumulants. Algorithm 4 computes the inner sum of Eq. (31) and takes advantages of the super-symmetry of tensors by using the block structure.

Finally, Algorithm 5 computes the th cumulant tensor. It uses Eq. (31) to calculate the cumulants and importantly takes advantage of the super-symmetry of tensors, because it refers to Algorithm 2 (moment tensor calculation) and Algorithm 4 that both use the block structure.

1:Input — order of output; — number of subsets for partitions; — lower cumulants stored as a block structure.
2:Output – sum of outer products stored as the block structure.
3:function outerpodcum() , — block size
4:     for  do
5:         for  do
7:         end for
8:     end for
9:     return
10:end function
Algorithm 4 Sum of outer products of the cumulants
1:Input: — matrix of centred data; — order; — lower cumulants structured in blocks of size each.
2:Output: — the th cumulant stored as the block structure.
3:function cumulant()
5:     return
6:end function
Algorithm 5 The th cumulant, using Eq. (31).

4 Implementation

All algorithms presented in this paper are implemented in the Julia programming language [8, 7]. Julia is a high level language in which multi-dimensional tables are first class types [6]. For purposes of the algorithms, two modules were created. In the first one, SymmetricTensors.jl, [21] the block structure of super-symmetric tensors was implemented. In the second module, Cumulants.jl, [20] we used the block structure to compute and store moment and cumulant tensors. The implementation of cumulants calculation uses multiprocessing primitives built into the Julia programming language: remote references and remote calls. A remote reference is an object that allows any process to reference an object stored in a specific process. A remote call allows a process to request a function call on certain arguments on another process.

5 Performance analysis

This section is dedicated to the performance analysis of the core elements of our algorithms. These are Eq. (6) which calculates the moment tensor and Eq. (31) which calculates the cumulant tensor. Firstly, we discuss theoretical analysis and then focus on the performance of our implementation. In the final subsection, we show how our implementation compares to the current state of the art.

5.1 Theoretical analysis

We start by discussing the performance of the moment tensor. With reference to Section 2, let us recall that storage of the moment tensor requires storage of the floating-point numbers. We can approximate for  [49]. Since we usually calculate cumulants of the order of

and we deal with high dimensional data, we need approximately

of the computer storage space, compared with the naïve storage scheme.

As for the cumulants, one should primarily note that the number of elements of the inner sum in Eq. (31) in the second line equals the number of set partitions of into exactly parts, such that no part is of size one, and can be represented as:


We call it a modification of the Stirling number of the second kind , and compute it as follows


where we count the number of ways to divide elements into subsets of length such that , and so . Factor in the denominator counts the number of subset permutations. Some examples of are , , and .

The following sum


is the number of all partitions of a set of size into subsets, such that there is no subset of size one, and therefore


Here is a Bell number [14], the number of all partitions of a set of size including subsets of size one and is the number of partitions of a set of size into subsets such that at least one subset is of size one. Relation Eq. (37) is derived from the fact that there is a bijective relation between partitions of element set into subsets such that at least one subset is of size one and partitions of element set into subsets such that there is no subset of size one.

To compute each element of the inner sum in Eq. (31), we need multiplications, and consequently to compute each element of the outer sum in Eq. (31), we need multiplications. Finally, the number of multiplications required to compute the second term of the RHS of Eq. (31) is


for each tensor element. Let us note that , , . The plot of and the upper bound are shown in Fig. 1. From Fig. 1 the proposed upper bound produces a very good approximation of the number of multiplications.

Figure 1: Plots of exact number of multiplications , and its upper bound based on Stirling numbers. Note the logarithmic scale on the axis.

From Eq. (6), Eq. (31) and Eq. (38) we can conclude that to compute the th cumulant’s element we need multiplications for the central moment and multiplications for the sums in Eq. (31). However, in order to calculate the cumulant in an accurate manner, we need large data sets, i.e. for we use and for the data size must be even larger. Bearing in mind that computation of cumulants of the order of is inapplicable, the foregoing gives . Henceforth dominant computational power is required to calculate the moment tensor, so there appears the need for approximately multiplications to compute each th cumulant’s element. To compute the whole th cumulant tensor we need approximately multiplications, while the factor is caused the block structure application.

It is now possible to compare the complexity of our algorithm with that of the naïve algorithm for chosen cumulants. For the th cumulant, the naïve algorithm would use Eq. (11) directly and would not take advantage of the super-symmetry of tensors. Therefore, it requires multiplications to compute a single cumulant tensor element and multiplications to compute the whole cumulant tensor. Our algorithm, in this case, decreases the computational complexity by the factor of .

Analogically, the naïve formula for the th cumulant