
A Tutorial on Multivariate k-Statistics and their Computation

This document aims to provide an accessible tutorial on the unbiased estimation of multivariate cumulants, using k-statistics. We offer an explicit and general formula for multivariate k-statistics of arbitrary order. We also prove that the k-statistics are unbiased, using Möbius inversion and rudimentary combinatorics. Many detailed examples are considered throughout the paper. We conclude with a discussion of k-statistics computation, including the challenge of time complexity, and we examine a couple of possible avenues to improve the efficiency of this computation. The purpose of this document is threefold: to provide a clear introduction to k-statistics without relying on specialized tools like the umbral calculus; to construct an explicit formula for k-statistics that might facilitate future approximations and faster algorithms; and to serve as a companion paper to our Python library PyMoments, which implements this formula.

06/25/2018


1 Introduction

Cumulants are a class of statistical moments that succinctly describe univariate and multivariate distributions. Low-order cumulants are quite familiar: first-order cumulants are means, second-order cumulants are covariances, and third-order cumulants are third central moments. But cumulants of fourth order and higher are difficult to express in terms of central or raw moments. Still, higher-order cumulants have found a variety of applications, largely because they preserve the intuition of central moments while also featuring desirable multilinearity and additivity properties. Various applications have exploited these properties to solve problems in statistics, signal processing, control theory, and other fields.

This note concerns the unbiased estimation of cumulants from data. The canonical unbiased estimators of cumulants, known as Fisher's k-statistics, or more simply k-statistics, have been around since Fisher's seminal work on univariate k-statistics in 1930 [4] and Wishart and Kendall's later work extending them to the multivariate case (e.g., [6]). These papers provide formulas for low-order k-statistics, and they describe the process of symbol manipulation that can be used to construct higher-order formulas, but they stop short of an explicit, general expression for k-statistics. The objective of this note is to provide such an expression, as well as a self-contained derivation.

The technical content of this paper is not really novel. We highlight the deep connection that cumulants and k-statistics have with Möbius inversion on the partition lattice, but this connection has been known since at least 1983 [13] and has been noted in subsequent work [11, 7]. The same formulas for multivariate k-statistics derived here have also been constructed through the umbral calculus [3, 9], though interpreting these formulas requires a nontrivial investment of effort into learning the umbral calculus formalism. Software packages are also available for k-statistics. MathStatica, a proprietary Mathematica package, provides methods for symbolic k-statistic formulas [10]. An R package, kStatistics [8], is available to compute multivariate k-statistics of data samples. Our own library PyMoments implements multivariate k-statistics in Python [12]. Thus, rather than providing new insight into the problem of cumulant estimation, the aim of this paper is to serve as a quick, accessible, and explicit reference for multivariate k-statistics.

We also hope this paper will invite discussion regarding the efficient computation of k-statistics. The time complexity of computing these statistics scales poorly with order, so to make higher-order k-statistics useful in real-world applications, it is necessary to optimize their efficiency. We briefly discuss some possible avenues toward efficient computation or approximation of these statistics, but this topic is still under-explored.

The paper is organized as follows. In the remainder of this section, we introduce the preliminary mathematical concepts that are needed to derive k-statistics, including the definition of cumulants themselves (Section 1.2) and their connection to Möbius inversion on the partition lattice (Section 1.3). Section 2 contains the main results: a definition of and explicit formula for k-statistics in terms of raw sample moments (Definition 2.1), and a proof of their unbiasedness that reveals a derivation of these statistics (Section 2.1). The last part of the paper, Section 3, briefly discusses some points on the computational efficiency of evaluating k-statistics.

1.1 Preliminaries

Multisets and Multi-indices

A multiset is a set that allows for repeated elements. There are two ways to represent a multiset. The simplest representation is explicit enumeration of the elements, e.g., $[i_1, i_2, \ldots, i_k]$, where it is possible that $i_j = i_{j'}$ for $j \neq j'$. When the universe of possible elements in the multiset is clear from context, another representation is to use a multi-index, which assigns an integer multiplicity to every element in the universe. For example, when we are considering multisets with elements drawn from $\{1, 2, \ldots, n\}$, we can encode the multiset using a multi-index $\alpha = (\alpha(1), \alpha(2), \ldots, \alpha(n))$, where $\alpha(i)$ is the multiplicity of $i$ in the multiset. We will often use "multiset generator" notation to describe a multiset; for example, $[i_j \mid j \in B]$ denotes the multiset containing one copy of $i_j$ for each $j \in B$.

Partitions

A partition of a set $S$ is a collection of mutually disjoint subsets $B_1, B_2, \ldots, B_m \subseteq S$, called blocks, such that $B_1 \cup B_2 \cup \cdots \cup B_m = S$. We denote partitions as sets of blocks, e.g., $\pi = \{\{1,2\},\{3\}\}$. The size of a partition is the number of blocks: $|\pi|$. Given two partitions $\pi, \rho$ of the same set, we say that $\pi$ refines $\rho$, and write $\pi \leq \rho$, if every block in $\pi$ is the subset of a block in $\rho$. The partition lattice $\Pi_n$ is the poset of partitions of the set $\{1, 2, \ldots, n\}$, with refinement as a partial order. Within the partition lattice, note that the partition with one block is the unique maximum element; similarly, the partition with $n$ singleton blocks is the unique minimum element. Figure 1 provides a visual representation of the partition lattice.
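As a concrete illustration (our own sketch, not code from the paper or from PyMoments), the partition lattice is easy to explore computationally. The following Python snippet enumerates set partitions and tests refinement; the function names are our own:

```python
def partitions(elements):
    """Enumerate all set partitions of a list of distinct elements."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ...or give it its own singleton block.
        yield [[first]] + smaller

def refines(pi, rho):
    """True if every block of pi is contained in some block of rho."""
    rho_sets = [set(C) for C in rho]
    return all(any(set(B) <= C for C in rho_sets) for B in pi)

Pi3 = list(partitions([1, 2, 3]))
print(len(Pi3))  # 5: the number of partitions of {1,2,3}
print(refines([[1], [2], [3]], [[1, 2, 3]]))  # True: the minimum refines the maximum
```

Enumerating all of $\Pi_k$ this way is exactly the combinatorial bottleneck discussed later: the number of partitions grows as the Bell numbers.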

The number of partitions of $\{1, \ldots, n\}$ with size $k$ is given by Stirling's number of the second kind, and is given by

$$\left\{{n \atop k}\right\} = \frac{1}{k!} \sum_{i=0}^{k-1} (-1)^i \binom{k}{i} (k-i)^n$$

for positive $n$ and $k$. The total number of partitions in $\Pi_n$ is known as Bell's number:

$$B_n = |\Pi_n| = \sum_{k=1}^{n} \left\{{n \atop k}\right\}$$
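Both formulas can be checked directly in a few lines of Python; this is an illustrative sketch, not code from PyMoments:

```python
from math import comb, factorial

def stirling2(n, k):
    """Stirling number of the second kind, via the explicit alternating sum."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k)) // factorial(k)

def bell(n):
    """Bell number: total number of partitions of an n-element set."""
    return sum(stirling2(n, k) for k in range(1, n + 1))

print(stirling2(3, 2))  # 3: ways to split {1,2,3} into two blocks
print(bell(4))          # 15
```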

General Notation

Given two non-negative integers $N \geq k$, the falling factorial is the quantity $(N)_k = N(N-1)(N-2)\cdots(N-k+1)$.

1.2 Cumulants

We begin with a formal definition of cumulants and (implicitly) a review of our notational conventions. Given a random vector $X = (X_1, X_2, \ldots, X_n)^\top$, where $X_1, \ldots, X_n$ are scalar random variables, define the moment generating function and the cumulant generating function by

$$M_X(t) = \mathbb{E}\left[e^{t^\top X}\right], \qquad K_X(t) = \log M_X(t)$$

These functions admit the series expansions

$$M_X(t) = 1 + \sum_{i=1}^{n} m[i]\, t_i + \frac{1}{2} \sum_{i,j=1}^{n} m[i,j]\, t_i t_j + \frac{1}{3!} \sum_{i,j,k=1}^{n} m[i,j,k]\, t_i t_j t_k + \cdots$$

$$K_X(t) = \sum_{i=1}^{n} \kappa[i]\, t_i + \frac{1}{2} \sum_{i,j=1}^{n} \kappa[i,j]\, t_i t_j + \frac{1}{3!} \sum_{i,j,k=1}^{n} \kappa[i,j,k]\, t_i t_j t_k + \cdots$$

where the coefficients $m[\cdot]$ and $\kappa[\cdot]$ are defined for any multiset from the indices $\{1, \ldots, n\}$. The coefficients in the expansion of the moment generating function are familiar; for example,

$$m[i,j,k] = \left.\frac{\partial^3 M_X(t)}{\partial t_i\, \partial t_j\, \partial t_k}\right|_{t=0} = \mathbb{E}\left[\left.\frac{\partial^3}{\partial t_i\, \partial t_j\, \partial t_k}\, e^{t^\top X}\right|_{t=0}\right] = \mathbb{E}[X_i X_j X_k]$$

is a third-order raw moment. Of course, this relationship holds true in general: the coefficients in the expansion of the moment generating function are precisely the raw moments of $X$.

Cumulants are defined similarly, as coefficients in the Taylor expansion of $K_X(t)$. Formally, given any multiset $[i_1, i_2, \ldots, i_k]$ from $\{1, \ldots, n\}$, or equivalently, given any multi-index $\alpha$ on $\{1, \ldots, n\}$, we define the cumulant

$$\kappa_\alpha(X) = \left.\frac{\partial^{|\alpha|} K_X(t)}{\partial t_1^{\alpha(1)}\, \partial t_2^{\alpha(2)} \cdots \partial t_n^{\alpha(n)}}\right|_{t=0} \tag{1.1}$$

as the coefficient of the corresponding term in the series expansion of $K_X(t)$. The order of a cumulant is the size $|\alpha| = k$. Low-order cumulants have familiar interpretations, as the next few examples demonstrate:

Example 1.1 (First-Order Cumulants).

Consider a single random variable $X_i$. The first-order cumulant of this variable is

$$\kappa[i](X) = \left.\frac{\partial K_X(t)}{\partial t_i}\right|_{t=0} = \left.\frac{1}{M_X(t)}\frac{\partial M_X(t)}{\partial t_i}\right|_{t=0} = m[i] = \mathbb{E}[X_i]$$

Thus, first-order cumulants are identical to first-order raw moments, i.e., means.

Example 1.2 (Second-Order Cumulants).

Consider a pair of random variables $X_i, X_j$, possibly repeating. The second-order cumulant of this pair of variables is

$$\begin{aligned} \kappa[i,j](X) = \left.\frac{\partial^2 K_X(t)}{\partial t_i\, \partial t_j}\right|_{t=0} &= \left.\left(\frac{1}{M_X(t)}\frac{\partial^2 M_X(t)}{\partial t_i\, \partial t_j} - \frac{1}{M_X(t)^2}\frac{\partial M_X(t)}{\partial t_i}\frac{\partial M_X(t)}{\partial t_j}\right)\right|_{t=0} \\ &= m[i,j] - m[i]\, m[j] = \mathbb{E}[X_i X_j] - \mathbb{E}[X_i]\,\mathbb{E}[X_j] = \mathrm{cov}(X_i, X_j) \end{aligned}$$

Thus, second-order cumulants are identical to covariances.

Example 1.3 (Third-Order Cumulants).

Consider a triple of random variables $X_i, X_j, X_k$, possibly repeating. After evaluating and simplifying the appropriate third derivative, we find that

$$\kappa[i,j,k](X) = \left.\frac{\partial^3 K_X(t)}{\partial t_i\, \partial t_j\, \partial t_k}\right|_{t=0} = 2\, m[i]\, m[j]\, m[k] - m[i]\, m[j,k] - m[j]\, m[i,k] - m[k]\, m[i,j] + m[i,j,k]$$

In particular, if $i = j = k$, we obtain the third univariate cumulant of the random variable $X_i$:

$$\kappa[i,i,i](X) = 2\,\mathbb{E}[X_i]^3 - 3\,\mathbb{E}[X_i]\,\mathbb{E}[X_i^2] + \mathbb{E}[X_i^3] = \mathbb{E}\left[(X_i - \mathbb{E}[X_i])^3\right] = \mathrm{var}(X_i)^{3/2}\, \mathrm{skew}(X_i)$$

where $\mathrm{skew}(X_i)$ is the moment coefficient of skewness. In other words, third univariate cumulants are identical to third central moments, thereby quantifying the skewness of a distribution.
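This identity is easy to sanity-check numerically. The short Python sketch below (our own illustration, not from the paper) verifies it for a Bernoulli(p) variable, whose raw moments of every order all equal p:

```python
# Check kappa3 = 2 m1^3 - 3 m1 m2 + m3 against the third central moment
# E[(X - E X)^3] for X ~ Bernoulli(p), where m1 = m2 = m3 = p.
p = 0.25
m1 = m2 = m3 = p                       # E[X] = E[X^2] = E[X^3] = p for X in {0,1}
kappa3 = 2 * m1**3 - 3 * m1 * m2 + m3
central3 = p * (1 - p)**3 + (1 - p) * (0 - p)**3   # E[(X - p)^3]
print(kappa3, central3)  # both equal p(1-p)(1-2p) = 0.09375
```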

1.3 Cumulants, Raw Moments, and Möbius Inversion

As the previous three examples suggest, cumulants and raw moments are closely related; after all, cumulants and raw moments are the series coefficients of functions related by a log transform. High-order derivatives of the logarithm result in an abundance of terms, which rapidly get out of hand when trying to derive high-order cumulants manually. Fortunately, by invoking the multivariate version of Faà di Bruno's formula, we can use some powerful ideas from combinatorics to compactly represent these computations. This subsection will reveal a general formula for expressing cumulants in terms of raw moments, which will prove useful in our later derivation of k-statistics.

It is a bit simpler to start in the reverse direction, writing raw moments in terms of cumulants, and then using Möbius inversion to obtain our desired expressions. Writing raw moments in terms of cumulants is really a straightforward application of Faà di Bruno’s generalization of the chain rule:

Lemma 1.4 (Faà di Bruno’s Formula).

Let $n$ be a positive integer, and let $f$ and $g$ be a pair of functions that are differentiable to order $n$. Then

$$\frac{\partial^n f(g(x))}{\partial x_1\, \partial x_2 \cdots \partial x_n} = \sum_{\pi \in \Pi_n} f^{(|\pi|)}(g(x)) \prod_{B \in \pi} \frac{\partial^{|B|} g(x)}{\prod_{i \in B} \partial x_i} \tag{1.2}$$

where $f^{(k)}$ denotes the $k$th derivative, $\pi$ loops through every partition of the set $\{1, \ldots, n\}$, $B$ loops through each block in a given partition, and $i$ loops through each element of $\{1, \ldots, n\}$ contained within a given block.

See [5] for an overview and proof of this formula (as well as a different discussion of the application we are about to see). Equation (1.2) is an entrypoint that will allow us to apply the combinatorics of the partition lattice to the study of cumulants.

In order to write raw moments in terms of cumulants, we can express $M_X(t) = f(g(t))$, where $f(y) = e^y$ and $g(t) = K_X(t)$. Therefore, applying (1.2), we obtain

$$\begin{aligned} m[i_1, i_2, \ldots, i_k] &= \left.\frac{\partial^k e^{K_X(t)}}{\partial t_{i_1}\, \partial t_{i_2} \cdots \partial t_{i_k}}\right|_{t=0} = \sum_{\pi \in \Pi_k} \left.\frac{d e^y}{d y}\right|_{y = K_X(0)} \left(\prod_{B \in \pi} \left.\frac{\partial^{|B|} K_X(t)}{\prod_{j \in B} \partial t_{i_j}}\right|_{t=0}\right) \\ &= \sum_{\pi \in \Pi_k} \prod_{B \in \pi} \left.\frac{\partial^{|B|} K_X(t)}{\prod_{j \in B} \partial t_{i_j}}\right|_{t=0} \end{aligned}$$

where the exponential factor drops out because $K_X(0) = \log M_X(0) = 0$, so every derivative of $e^y$ evaluates to one. We recognize the inner derivative as a cumulant of the random vector $X$, in particular, the cumulant corresponding to the multiset of indices $[i_j \mid j \in B]$. Therefore, simplifying this equation, we can express the raw moment in terms of cumulants, as follows:

$$m[i_1, i_2, \ldots, i_k] = \sum_{\pi \in \Pi_k} \prod_{B \in \pi} \kappa[i_j \mid j \in B] \tag{1.3}$$

Let us consider some small examples of this formula:

Example 1.5 (Low-Order Raw Moments).

Let us consider how to construct first-order and second-order raw moments from cumulants. Given any particular random variable $X_{i_1}$ from the vector $X$, we know that $m[i_1] = \kappa[i_1]$, either by applying (1.3) or recalling Example 1.1. Next, given a pair of (possibly identical) random variables from the vector, we use (1.3) to compute

$$m[i_1, i_2] = \sum_{\pi \in \Pi_2} \prod_{B \in \pi} \kappa[i_j \mid j \in B] = \prod_{B \in \{\{1\},\{2\}\}} \kappa[i_j \mid j \in B] + \prod_{B \in \{\{1,2\}\}} \kappa[i_j \mid j \in B] = \kappa[i_1]\, \kappa[i_2] + \kappa[i_1, i_2]$$

Combining these two results, we see that

$$\kappa[i_1, i_2] = m[i_1, i_2] - \kappa[i_1]\, \kappa[i_2] = m[i_1, i_2] - m[i_1]\, m[i_2] = \mathrm{cov}(X_{i_1}, X_{i_2})$$

replicating our conclusion from Example 1.2.

This example hints at the possibility of inverting (1.3) to express cumulants as functions of raw moments. It turns out that stating a general formula for this inversion is straightforward, thanks to Möbius inversion. The general topic of Möbius inversion on posets is beyond the scope of this note, but many good lecture notes are available online for an easy introduction (for example, [2]), or the reader may refer to a text like Aigner [1] for a more detailed and rigorous discussion. Fortunately, Möbius inversion on the partition lattice is a standard example in this area of combinatorics, so we can cut to the chase and state the needed result:

Lemma 1.6 (Möbius Inversion on the Partition Lattice: Part I).

Consider two functions $f, g : \Pi_n \to \mathbb{R}$. The following are equivalent:

1.  $f(\pi) = \sum_{\rho \leq \pi} g(\rho), \quad \forall \pi \in \Pi_n$ (1.4)
2.  $g(\pi) = \sum_{\rho \leq \pi} (-1)^{|\rho|-1}\, (|\rho|-1)!\; f(\rho), \quad \forall \pi \in \Pi_n$ (1.5)

Lemma 1.6 provides a handy formula to invert sums over refinements of a given element of the partition lattice. While not immediately obvious, Lemma 1.6 can be used to invert (1.3), leading to the following result:

Theorem 1.7 (Cumulants from Raw Moments).

Consider the raw moments and cumulants of a random vector $X$. For any multiset $[i_1, i_2, \ldots, i_k]$ from the indices $\{1, \ldots, n\}$, the corresponding cumulant can be expressed in terms of the raw moments by

$$\kappa[i_1, i_2, \ldots, i_k] = \sum_{\pi \in \Pi_k} (-1)^{|\pi|-1}\, (|\pi|-1)! \prod_{B \in \pi} m[i_j \mid j \in B] \tag{1.6}$$
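Equation (1.6) translates directly into code. The following Python sketch (our own illustration, not the PyMoments implementation) evaluates the partition sum against a table of raw moments, and recovers the variance and third central moment of a Bernoulli variable:

```python
from math import factorial

def partitions(elements):
    """Enumerate all set partitions of a list of distinct elements."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller

def cumulant(indices, m):
    """Equation (1.6): kappa[indices] from a raw-moment table `m`,
    which maps sorted index tuples to raw moments."""
    total = 0.0
    for pi in partitions(list(range(len(indices)))):
        sign = (-1) ** (len(pi) - 1) * factorial(len(pi) - 1)
        prod = 1.0
        for B in pi:
            prod *= m[tuple(sorted(indices[j] for j in B))]
        total += sign * prod
    return total

# Raw moments of a Bernoulli(p) variable: every raw moment equals p.
p = 0.25
m = {(0,): p, (0, 0): p, (0, 0, 0): p}
print(cumulant((0, 0), m))     # variance p(1-p) = 0.1875
print(cumulant((0, 0, 0), m))  # third central moment p(1-p)(1-2p) = 0.09375
```

Note that the sum ranges over all of $\Pi_k$, so the cost grows with the Bell number $B_k$, foreshadowing the complexity discussion in Section 3.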
Proof.

Define two maps $f_k, g_k : \Pi_k \to \mathbb{R}$ by

$$g_k(\pi) = \prod_{B \in \pi} \kappa[i_j \mid j \in B]$$

and

$$f_k(\pi) = \prod_{B \in \pi} m[i_j \mid j \in B]$$

Let $\hat{1}_k$ be the maximum partition in $\Pi_k$. Then (1.3) can be re-written $f_k(\hat{1}_k) = \sum_{\rho \leq \hat{1}_k} g_k(\rho)$, since the set $\Pi_k$ is precisely the set of refinements of $\hat{1}_k$. In fact, this is enough to conclude that $f_k(\pi) = \sum_{\rho \leq \pi} g_k(\rho)$ for all $\pi \in \Pi_k$. This is because we can write

$$f_k(\pi) = \prod_{B \in \pi} f_B(\hat{1}_B)$$

where $\hat{1}_B$ is the maximum element of the lattice $\Pi_B$ of partitions of $B$, and $f_B$ is defined similar to $f_k$, but on the elements of $B$ instead of $\{1, \ldots, k\}$. Invoking (1.3), we have that $f_B(\hat{1}_B) = \sum_{\rho_B \leq \hat{1}_B} g_B(\rho_B)$ (where $g_B$ is defined similar to $g_k$), so that

$$f_k(\pi) = \prod_{B \in \pi} \sum_{\rho_B \leq \hat{1}_B} g_B(\rho_B) = \sum_{\rho_{B_1} \leq \hat{1}_{B_1}} \sum_{\rho_{B_2} \leq \hat{1}_{B_2}} \cdots \sum_{\rho_{B_{|\pi|}} \leq \hat{1}_{B_{|\pi|}}} \prod_{B \in \pi} g_B(\rho_B) = \sum_{\rho \leq \pi} g_k(\rho)$$

since the set of all tuples of refinements $(\rho_{B_1}, \ldots, \rho_{B_{|\pi|}})$ is isomorphic to the product of lattices $\Pi_{B_1} \times \cdots \times \Pi_{B_{|\pi|}}$, which is itself isomorphic to the set of all refinements of $\pi$. Thus $f_k$ and $g_k$ satisfy (1.4), so we invoke the Möbius inversion in Lemma 1.6 to obtain (1.5). In particular, evaluating (1.5) on $\hat{1}_k$, we obtain

$$g_k(\hat{1}_k) = \sum_{\rho \leq \hat{1}_k} (-1)^{|\rho|-1}\, (|\rho|-1)!\; f_k(\rho) = \sum_{\rho \in \Pi_k} (-1)^{|\rho|-1}\, (|\rho|-1)! \prod_{B \in \rho} m[i_j \mid j \in B]$$

But $g_k(\hat{1}_k) = \kappa[i_1, i_2, \ldots, i_k]$, so we obtain (1.6). ∎

2 Multivariate k-Statistics

Now that we have examined multivariate cumulants, our next challenge is to estimate them from a sample. Once again, suppose that we have a random vector $X = (X_1, \ldots, X_n)^\top$, distributed according to some joint distribution $P$. Further suppose that, instead of knowing $P$, all we have is an i.i.d. sample $x_1, x_2, \ldots, x_N$ from this distribution, where each $x_t \in \mathbb{R}^n$. We would like to estimate the cumulants of $X$ using some statistic, i.e., some function of the data.

It turns out that we can obtain an unbiased estimate of each cumulant using raw sample moments. For each multiset $[i_1, i_2, \ldots, i_k]$ of the indices $\{1, \ldots, n\}$, the corresponding raw sample moment is the statistic given by

$$\hat{m}[i_1, i_2, \ldots, i_k] = \frac{1}{N} \sum_{t=1}^{N} x_{t,i_1}\, x_{t,i_2} \cdots x_{t,i_k} \tag{2.1}$$

Because the observations in the sample are identically distributed, $\hat{m}[i_1, \ldots, i_k]$ is an unbiased estimator for the raw moment $m[i_1, \ldots, i_k]$. Furthermore, we can use raw sample moments to obtain an unbiased estimate of each cumulant:
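Equation (2.1) is straightforward to implement; the Python sketch below, with a small made-up sample, is purely illustrative:

```python
def raw_sample_moment(data, indices):
    """Equation (2.1): hat m[i1,...,ik] = (1/N) sum_t prod_j x[t, i_j]."""
    N = len(data)
    total = 0.0
    for row in data:
        prod = 1.0
        for i in indices:
            prod *= row[i]
        total += prod
    return total / N

# Four observations of a 2-dimensional random vector (made-up numbers).
data = [(1.0, 2.0), (2.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(raw_sample_moment(data, [0]))     # sample mean of the first coordinate: 1.0
print(raw_sample_moment(data, [0, 1]))  # sample mean of x1*x2: (2+0+0+1)/4 = 0.75
```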

Definition 2.1 (k-Statistic).

Consider the random vector $X = (X_1, \ldots, X_n)^\top$ and some multiset $[i_1, i_2, \ldots, i_k]$ from the indices $\{1, \ldots, n\}$. Given a sample of $X$ with at least $k$ observations, the corresponding k-statistic is given by

$$k[i_1, i_2, \ldots, i_k] = \sum_{\pi \in \Pi_k} (-1)^{|\pi|-1}\, c_\pi \prod_{B \in \pi} \hat{m}[i_j \mid j \in B] \tag{2.2}$$

where we define a positive coefficient $c_\pi$ for each partition in $\Pi_k$ by

$$c_\pi = N^{|\pi|} \sum_{b_1=1}^{|B_1|} \sum_{b_2=1}^{|B_2|} \cdots \sum_{b_{|\pi|}=1}^{|B_{|\pi|}|} \frac{\left(\sum_{j=1}^{|\pi|} b_j - 1\right)!}{(N)_{\sum_{j=1}^{|\pi|} b_j}} \left(\prod_{j=1}^{|\pi|} \left\{{|B_j| \atop b_j}\right\} (b_j - 1)!\right), \quad \forall \pi \in \Pi_k \tag{2.3}$$

and $B_1, B_2, \ldots, B_{|\pi|}$ are the blocks of the partition $\pi$.
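The coefficients $c_\pi$ in (2.3) depend only on the block sizes, so they can be computed mechanically. The following Python sketch (our own helper, not part of PyMoments) evaluates (2.3) and reproduces, for example, the second-order coefficient $N/(N-1)$:

```python
from math import comb, factorial
from itertools import product

def stirling2(n, k):
    """Stirling number of the second kind."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k)) // factorial(k)

def falling(N, k):
    """Falling factorial (N)_k = N (N-1) ... (N-k+1)."""
    out = 1
    for j in range(k):
        out *= N - j
    return out

def c_pi(blocks, N):
    """Equation (2.3): coefficient c_pi for a partition given as a list of blocks."""
    sizes = [len(B) for B in blocks]
    total = 0.0
    # One summation index b_j per block, ranging from 1 to |B_j|.
    for bs in product(*[range(1, s + 1) for s in sizes]):
        term = factorial(sum(bs) - 1) / falling(N, sum(bs))
        for s, b in zip(sizes, bs):
            term *= stirling2(s, b) * factorial(b - 1)
        total += term
    return N ** len(blocks) * total

N = 10
print(c_pi([[1, 2]], N))    # c_{12} = N/(N-1) ~ 1.111
print(c_pi([[1], [2]], N))  # c_{1|2} = N/(N-1) as well
```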

Equation (2.2) is a linear combination of products of raw sample moments (which are computed from the data). The linear combination itself involves a sum over the partition lattice with coefficients of alternating sign, which hints at Möbius inversion. Indeed, we can derive the k-statistic by using Möbius inversion to correct the bias of products of raw sample moments. We provide a much more detailed derivation in the next section, when we prove that k-statistics are unbiased:

Theorem 2.2 (k-Statistics are Unbiased Estimators of Cumulants).

Consider the random vector $X$, and let $[i_1, i_2, \ldots, i_k]$ be any multiset of the indices $\{1, \ldots, n\}$. The k-statistic computed from any i.i.d. sample of $X$ with at least $k$ observations is an unbiased estimator of the corresponding cumulant, i.e.,

$$\mathbb{E}\big[k[i_1, i_2, \ldots, i_k]\big] = \kappa[i_1, i_2, \ldots, i_k]$$

First, we will consider some lower-order examples of multivariate k-statistics.

Example 2.3 (First-Order k-Statistics).

The partition lattice $\Pi_1$ consists of only one element, $\{\{1\}\}$, so first-order k-statistics are easily computed as

$$k[i] = c_{\{\{1\}\}}\, \hat{m}[i] = \hat{m}[i]$$

Thus, as expected from Example 1.1, first-order k-statistics are merely sample means.

Example 2.4 (Second-Order k-Statistics).

The partition lattice $\Pi_2$ consists of two elements: $12 = \{\{1,2\}\}$ and $1|2 = \{\{1\},\{2\}\}$. The corresponding coefficients are

$$c_{12} = N \left(\frac{(0)!}{(N)_1} \left\{{2 \atop 1}\right\} (0)! + \frac{(1)!}{(N)_2} \left\{{2 \atop 2}\right\} (1)!\right) = 1 + \frac{1}{N-1} = \frac{N}{N-1}$$

and

$$c_{1|2} = N^2\, \frac{(1)!}{(N)_2} \left\{{1 \atop 1}\right\} \left\{{1 \atop 1}\right\} (0)!\, (0)! = \frac{N}{N-1}$$

Therefore

$$k[i_1, i_2] = \frac{N}{N-1} \left(\hat{m}[i_1, i_2] - \hat{m}[i_1]\, \hat{m}[i_2]\right)$$

which we recognize as the classical unbiased estimator for covariance. Of course, this is exactly what we should expect after Example 1.2.
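The equivalence with the classical covariance estimator is easy to confirm numerically. In the Python sketch below (our own illustration, with made-up data), both forms are computed on the same small sample:

```python
def kstat2(xs, ys):
    """Second-order k-statistic: N/(N-1) * (hat m[x,y] - hat m[x] hat m[y])."""
    N = len(xs)
    mx = sum(xs) / N
    my = sum(ys) / N
    mxy = sum(x * y for x, y in zip(xs, ys)) / N
    return N / (N - 1) * (mxy - mx * my)

def sample_cov(xs, ys):
    """Classical unbiased covariance: sum (x - xbar)(y - ybar) / (N - 1)."""
    N = len(xs)
    mx = sum(xs) / N
    my = sum(ys) / N
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (N - 1)

xs = [1.0, 2.0, 4.0, 0.0]
ys = [1.0, 0.0, 2.0, 3.0]
print(kstat2(xs, ys), sample_cov(xs, ys))  # two (nearly) identical values, about -0.5
```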

Example 2.5 (Third-Order k-Statistics).

The partition lattice $\Pi_3$ has five elements: $123 = \{\{1,2,3\}\}$, $1|23 = \{\{1\},\{2,3\}\}$, $2|13 = \{\{2\},\{1,3\}\}$, $3|12 = \{\{3\},\{1,2\}\}$, and $1|2|3 = \{\{1\},\{2\},\{3\}\}$. We first compute the respective coefficients $c_{123}$, $c_{1|23}$, $c_{2|13}$, $c_{3|12}$, and $c_{1|2|3}$:

$$c_{123} = N \left(\frac{(0)!}{(N)_1} \left\{{3 \atop 1}\right\} (0)! + \frac{(1)!}{(N)_2} \left\{{3 \atop 2}\right\} (1)! + \frac{(2)!}{(N)_3} \left\{{3 \atop 3}\right\} (2)!\right) = 1 + \frac{3}{N-1} + \frac{4}{(N-1)(N-2)}$$

$$c_{1|23} = N^2 \left(\frac{(1)!}{(N)_2} \left\{{1 \atop 1}\right\} \left\{{2 \atop 1}\right\} (0)!\, (0)! + \frac{(2)!}{(N)_3} \left\{{1 \atop 1}\right\} \left\{{2 \atop 2}\right\} (0)!\, (1)!\right) = \frac{N}{N-1} + \frac{2N}{(N-1)(N-2)}$$

$$c_{1|2|3} = N^3\, \frac{(2)!}{(N)_3} \left\{{1 \atop 1}\right\}^3 = \frac{2N^2}{(N-1)(N-2)}$$

Note that $c_\pi$ depends only on the number and size of each block, and not the blocks themselves, so $c_{1|23} = c_{2|13} = c_{3|12}$. Substituting these coefficients into (2.2) and simplifying, we obtain

$$\begin{aligned} k[i_1, i_2, i_3] &= c_{123}\, \hat{m}[i_1, i_2, i_3] - c_{1|23} \left(\hat{m}[i_1]\, \hat{m}[i_2, i_3] + \hat{m}[i_2]\, \hat{m}[i_1, i_3] + \hat{m}[i_3]\, \hat{m}[i_1, i_2]\right) + c_{1|2|3}\, \hat{m}[i_1]\, \hat{m}[i_2]\, \hat{m}[i_3] \\ &= \frac{N^2}{(N-1)(N-2)} \left(\hat{m}[i_1, i_2, i_3] - \hat{m}[i_1]\, \hat{m}[i_2, i_3] - \hat{m}[i_2]\, \hat{m}[i_1, i_3] - \hat{m}[i_3]\, \hat{m}[i_1, i_2] + 2\, \hat{m}[i_1]\, \hat{m}[i_2]\, \hat{m}[i_3]\right) \end{aligned}$$

In particular, if $i_1 = i_2 = i_3 = i$, we obtain the third univariate k-statistic

$$k[i,i,i] = \frac{N^2}{(N-1)(N-2)} \left(\hat{m}[i,i,i] - 3\, \hat{m}[i]\, \hat{m}[i,i] + 2\, \hat{m}[i]^3\right)$$
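Since $\hat{m}[i,i,i] - 3\,\hat{m}[i]\,\hat{m}[i,i] + 2\,\hat{m}[i]^3 = \frac{1}{N}\sum_t (x_t - \bar{x})^3$, the third univariate k-statistic can equivalently be written in terms of centered data. The Python sketch below (our own illustration) checks that the two forms agree on a small sample:

```python
def kstat3(xs):
    """Third univariate k-statistic from raw sample moments."""
    N = len(xs)
    m1 = sum(xs) / N
    m2 = sum(x * x for x in xs) / N
    m3 = sum(x ** 3 for x in xs) / N
    return N**2 / ((N - 1) * (N - 2)) * (m3 - 3 * m1 * m2 + 2 * m1**3)

def kstat3_central(xs):
    """Equivalent central-moment form: N / ((N-1)(N-2)) * sum (x - xbar)^3."""
    N = len(xs)
    xbar = sum(xs) / N
    return N / ((N - 1) * (N - 2)) * sum((x - xbar) ** 3 for x in xs)

xs = [0.0, 1.0, 1.0, 3.0, 5.0]
print(abs(kstat3(xs) - kstat3_central(xs)) < 1e-9)  # True: the two forms agree
```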

2.1 Proof of Theorem 2.2

The general outline of the proof is as follows. We will first note that, while raw sample moments are unbiased estimators of raw moments, it is still the case that products of raw sample moments provide biased estimates for products of raw moments. The first step will be to quantify this bias. Second, we will once again use Möbius inversion over the partition lattice to obtain an unbiased estimator of products of raw moments, in terms of products of raw sample moments. Finally, we will substitute this estimator into (1.6) and simplify.

We begin by examining the expected value of raw sample moments:

Lemma 2.6 (Bias of Products of Raw Sample Moments).

Consider the random vector $X$, and consider some multiset $[i_1, \ldots, i_k]$ from the indices $\{1, \ldots, n\}$. For every $\pi \in \Pi_k$, we have

$$\mathbb{E}\left[\prod_{B \in \pi} \hat{m}[i_j \mid j \in B]\right] = \frac{1}{N^{|\pi|}} \sum_{\rho \geq \pi} (N)_{|\rho|} \prod_{C \in \rho} m[i_j \mid j \in C] \tag{2.4}$$
Proof.

Our first step is to switch the order of sums and products, as follows:

$$\mathbb{E}\left[\prod_{B \in \pi} \hat{m}[i_j \mid j \in B]\right] = \frac{1}{N^{|\pi|}}\, \mathbb{E}\left[\prod_{B \in \pi} \sum_{t=1}^{N} \prod_{j \in B} x_{t,i_j}\right] = \frac{1}{N^{|\pi|}} \sum_{t_1=1}^{N} \sum_{t_2=1}^{N} \cdots \sum_{t_{|\pi|}=1}^{N} \mathbb{E}\left[\prod_{j=1}^{|\pi|} \prod_{\ell \in B_j} x_{t_j, i_\ell}\right]$$

Note that the inner expectation depends on which of the observations are identical, since observations at distinct times are independent, allowing us to factor the expected value. With this in mind, we will partition the hypercube of $t$-indices that we are summing over into equivalence classes, based on partitions of the set $\{1, \ldots, |\pi|\}$. Given such a partition $\sigma \in \Pi_{|\pi|}$, we define the equivalence class $T_\sigma$ as the set of index tuples $(t_1, \ldots, t_{|\pi|})$ with the following property: for all $j, j'$, we have that $t_j = t_{j'}$ if and only if $j, j' \in C$ for some block $C \in \sigma$. In other words, the blocks of $\sigma$ represent elements of the $t$-index that are identical. Clearly each of the index tuples in the sum belongs to some equivalence class $T_\sigma$. Furthermore, each equivalence class $T_\sigma$ contains $(N)_{|\sigma|}$ elements: $N$ possible values for indices in the first block, $N-1$ possible values in the second block, and so on.

For all $(t_1, \ldots, t_{|\pi|}) \in T_\sigma$, we have the following property:

$$\mathbb{E}\left[\prod_{j=1}^{|\pi|} \prod_{\ell \in B_j} x_{t_j, i_\ell}\right] = \prod_{C \in \sigma} \mathbb{E}\left[\prod_{j \in C} \prod_{\ell \in B_j} X_{i_\ell}\right] = \prod_{C \in \sigma} \mathbb{E}\left[\prod_{\ell \in \bigcup_{j \in C} B_j} X_{i_\ell}\right] = \prod_{C \in \sigma} m\Big[i_\ell \,\Big|\, \ell \in \textstyle\bigcup_{j \in C} B_j\Big]$$

This follows because the expected value factors along the blocks of $\sigma$, since the blocks have pairwise-distinct times, and thus the observations in each block are pairwise independent. Then we can write

$$\mathbb{E}\left[\prod_{B \in \pi} \hat{m}[i_j \mid j \in B]\right] = \frac{1}{N^{|\pi|}} \sum_{\sigma \in \Pi_{|\pi|}} (N)_{|\sigma|} \prod_{C \in \sigma} m\Big[i_\ell \,\Big|\, \ell \in \textstyle\bigcup_{j \in C} B_j\Big]$$

The final step is to note that there is a bijection between partitions of $\{1, \ldots, |\pi|\}$ and partitions of $\{1, \ldots, k\}$ that are coarser than $\pi$. This bijection is easy to see: for each block $C \in \sigma$, replace the blocks $B_j$ with $j \in C$ by the single coarser block $\bigcup_{j \in C} B_j$, resulting in a coarser partition $\rho \geq \pi$. Due to this bijection, we can re-write the sum over $\sigma \in \Pi_{|\pi|}$ as a sum over coarser partitions $\rho \geq \pi$, obtaining (2.4). ∎

Equation (2.4) has a somewhat familiar form: a sum over partitions that are coarser than $\pi$. Recall from the proof of Theorem 1.7 that we used Möbius inversion to invert a sum over partitions that refine $\pi$. While the direction of the sum makes a difference (we cannot use Lemma 1.6 in this case), the partition lattice still admits a Möbius inversion formula to invert (2.4). Once again, we will state the needed formula here, and direct the interested reader to a combinatorics text like [1]:

Lemma 2.7 (Möbius Inversion on the Partition Lattice: Part II).

Consider two functions $f, g : \Pi_n \to \mathbb{R}$. For two partitions $\pi \leq \rho$, let $\Sigma(\pi, \rho)$ denote the set of partitions that, when applied to the blocks of $\rho$, yield the refinement $\pi$. Then the following are equivalent:

1.  $f(\pi) = \sum_{\rho \geq \pi} g(\rho), \quad \forall \pi \in \Pi_n$ (2.5)
2.  $g(\pi) = \sum_{\rho \geq \pi} \left((-1)^{|\pi|-|\rho|} \prod_{\sigma \in \Sigma(\pi,\rho)} (|\sigma|-1)!\right) f(\rho), \quad \forall \pi \in \Pi_n$ (2.6)

The set $\Sigma(\pi, \rho)$ may cause some confusion, so it is worth considering an example before we proceed. Consider two partitions of $\{1,2,3,4,5\}$: $\pi = \{\{1,2\},\{3\},\{4\},\{5\}\}$ and $\rho = \{\{1,2,3\},\{4,5\}\}$. Clearly $\pi$ is a refinement of $\rho$. Furthermore, we can obtain $\pi$ from $\rho$ by partitioning each block of $\rho$. Let $\sigma_1$ represent the partition of $\{1,2,3\}$ into $\{\{1,2\},\{3\}\}$, and let $\sigma_2$ be the partition of $\{4,5\}$ into $\{\{4\},\{5\}\}$. The set $\Sigma(\pi, \rho)$ is the collection of these two partitions.

Next, we apply this new Möbius inversion formula to invert (2.4), obtaining an unbiased estimator for products of raw moments:

Lemma 2.8 (Unbiased Estimation of Products of Sample Moments).

Consider the random vector $X$, and consider some multiset $[i_1, \ldots, i_k]$ from the indices $\{1, \ldots, n\}$. For every $\pi \in \Pi_k$, define a statistic

$$\hat{m}_\pi = \frac{1}{(N)_{|\pi|}} \sum_{\rho \geq \pi} \left((-1)^{|\pi|-|\rho|}\, N^{|\rho|} \prod_{\sigma \in \Sigma(\pi,\rho)} (|\sigma|-1)!\right) \prod_{C \in \rho} \hat{m}[i_j \mid j \in C] \tag{2.7}$$

where $B_1, \ldots, B_{|\pi|}$ are the blocks of $\pi$. Then $\hat{m}_\pi$ is an unbiased estimator of the product of raw moments, i.e.,

$$\mathbb{E}[\hat{m}_\pi] = \prod_{B \in \pi} m[i_j \mid j \in B] \tag{2.8}$$
Proof.

Let us define two functions $f, g : \Pi_k \to \mathbb{R}$ by

$$f(\pi) = N^{|\pi|}\, \mathbb{E}\left[\prod_{B \in \pi} \hat{m}[i_j \mid j \in B]\right], \qquad g(\pi) = (N)_{|\pi|} \prod_{B \in \pi} m[i_j \mid j \in B]$$

In terms of these functions, Lemma 2.6 states that