1 Introduction
Cumulants are a class of statistical moments that succinctly describe univariate and multivariate distributions. Low-order cumulants are quite familiar: first-order cumulants are means, second-order cumulants are covariances, and third-order cumulants are third central moments. But cumulants of fourth order and higher are difficult to express in terms of central or raw moments. Still, higher-order cumulants have found a variety of applications, largely because they preserve the intuition of central moments while also featuring desirable multilinearity and additivity properties. Various applications have exploited these properties to solve problems in statistics, signal processing, control theory, and other fields.
This note concerns the unbiased estimation of cumulants from data. The canonical unbiased estimators of cumulants, known as Fisher's k-statistics, or simply k-statistics, have been around since Fisher's seminal work on univariate k-statistics in 1930 [4] and Wishart and Kendall's later work extending them to the multivariate case (e.g., [6]). These papers provide formulas for low-order k-statistics, and they describe the process of symbol manipulation that can be used to construct higher-order formulas, but they stop short of an explicit, general expression for k-statistics. The objective of this note is to provide such an expression, as well as a self-contained derivation.
The technical content of this paper is not really novel. We highlight the deep connection that cumulants and k-statistics have with Möbius inversion on the partition lattice, but this connection has been known since at least 1983 [13] and has been noted in subsequent work [11, 7]. The same formulas for multivariate k-statistics derived here have also been constructed through the umbral calculus [3, 9], though interpreting these formulas requires a nontrivial investment of effort into learning the umbral calculus formalism. Software packages are also available for k-statistics. MathStatica, a proprietary Mathematica package, provides methods for symbolic k-statistic formulas [10]. An R package, kStatistics [8], is available to compute multivariate k-statistics of data samples. Our own library PyMoments implements multivariate k-statistics in Python [12]. Thus, rather than providing new insight into the problem of cumulant estimation, the aim of this paper is to serve as a quick, accessible, and explicit reference for multivariate k-statistics.
We also hope this paper will invite discussion regarding the efficient computation of k-statistics. The time complexity of computing these statistics scales poorly with order, so in order to make higher-order k-statistics useful in real-world applications, it is necessary to optimize their efficiency. We briefly discuss some possible avenues toward efficient computation or approximation of these statistics, but this topic is still underexplored.
The paper is organized as follows. In the remainder of this section, we introduce the preliminary mathematical concepts that are needed to derive k-statistics, including the definition of cumulants themselves (Section 1.2) and their connection to Möbius inversion on the partition lattice (Section 1.3). Section 2 contains the main results—a definition of and explicit formula for k-statistics in terms of raw sample moments (Definition 2.1), and a proof of their unbiased estimation that reveals a derivation of these statistics (Section 2.1). The last part of the paper, Section 3, briefly discusses some points on the computational efficiency of evaluating k-statistics.
1.1 Preliminaries
Multisets and Multiindices
A multiset is a set that allows for repeated elements. There are two ways to represent a multiset. The simplest representation is explicit enumeration of the elements, e.g., $I = \{i_1, i_2, \ldots, i_r\}$, where it is possible that $i_a = i_b$ for $a \neq b$. When the universe of possible elements in the multiset is clear from context, another representation is to use a multiindex, which assigns an integer multiplicity to every element in the universe. For example, when we are considering multisets with elements drawn from $\{1, 2, \ldots, d\}$, we can encode the multiset using a multiindex $\alpha = (\alpha_1, \ldots, \alpha_d)$, where $\alpha_i$ is the multiplicity of $i$ in the multiset. We will often use "multiset generator" notation to describe a multiset; for example, $\{2i : i \in \{1, 2, 2\}\} = \{2, 4, 4\}$, where repeated values of the generator are retained.
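The two representations are easy to convert between in code. The following minimal sketch (the function names are our own, for illustration) uses Python's standard `collections.Counter`:

```python
from collections import Counter

def multiset_to_multiindex(ms, d):
    """Encode a multiset of indices from {1, ..., d} as a tuple of multiplicities."""
    counts = Counter(ms)
    return tuple(counts.get(i, 0) for i in range(1, d + 1))

def multiindex_to_multiset(alpha):
    """Expand a multiindex back into an explicitly enumerated (sorted) multiset."""
    return [i for i, mult in enumerate(alpha, start=1) for _ in range(mult)]
```

For instance, the multiset $\{1, 1, 3\}$ over the universe $\{1, 2, 3\}$ corresponds to the multiindex $(2, 0, 1)$.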
Partitions
A partition of a set $S$ is a collection of mutually disjoint subsets $B_1, \ldots, B_q \subseteq S$, called blocks, such that $B_1 \cup \cdots \cup B_q = S$. We denote partitions as sets of blocks, e.g., $\pi = \{B_1, \ldots, B_q\}$. The size of a partition is the number of blocks: $|\pi| = q$. Given two partitions $\pi, \sigma$ of the same set, we say that $\pi$ refines $\sigma$, and write $\pi \leq \sigma$, if every block in $\pi$ is the subset of a block in $\sigma$. The partition lattice $P(S)$ is the poset of partitions of the set $S$, with refinement as a partial order. Within the partition lattice, note that the unique partition $\{S\}$ with one block is the unique maximum element; similarly, the unique partition $\{\{s\} : s \in S\}$ with $|S|$ blocks is the unique minimum element. Figure 1 provides a visual representation of the partition lattice.
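For intuition (and for the computations later in this note), both the elements of a small partition lattice and the refinement relation can be enumerated by brute force. The helper names in this Python sketch are our own:

```python
def partitions(elements):
    """Recursively enumerate all partitions of a list of distinct labels."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        # Insert `first` into each existing block in turn...
        for k in range(len(smaller)):
            yield smaller[:k] + [smaller[k] + [first]] + smaller[k + 1:]
        # ...or let it start a new singleton block.
        yield smaller + [[first]]

def refines(pi, sigma):
    """True if every block of `pi` is contained in some block of `sigma`."""
    return all(any(set(b) <= set(c) for c in sigma) for b in pi)
```

For example, the set $\{1, 2, 3\}$ has five partitions, and the minimum partition $\{\{1\},\{2\},\{3\}\}$ refines every one of them.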
The number of partitions of an $m$-element set with size $q$ is given by Stirling's number of the second kind,
$$S(m, q) = \frac{1}{q!} \sum_{j=0}^{q} (-1)^j \binom{q}{j} (q - j)^m,$$
for positive $m$ and $q$. The total number of partitions of an $m$-element set is known as Bell's number:
$$B_m = \sum_{q=1}^{m} S(m, q).$$
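Both counts are easy to compute with the standard recurrence $S(m, q) = q\,S(m-1, q) + S(m-1, q-1)$; the short Python sketch below (helper names ours) memoizes it:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(m, q):
    """Number of partitions of an m-element set into exactly q blocks."""
    if q == 0:
        return 1 if m == 0 else 0
    if m == 0:
        return 0
    # A new element either joins one of q existing blocks or starts its own block.
    return q * stirling2(m - 1, q) + stirling2(m - 1, q - 1)

def bell(m):
    """Total number of partitions of an m-element set."""
    return sum(stirling2(m, q) for q in range(m + 1))
```

The small Bell numbers $B_3 = 5$ and $B_4 = 15$ match the partition counts used in the examples later in this note.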
General Notation
Given two nonnegative integers $n \geq q$, the falling factorial is the quantity $(n)_q = n(n-1)(n-2)\cdots(n-q+1)$.
1.2 Cumulants
We begin with a formal definition of cumulants and (implicitly) a review of our notational conventions. Given a random vector $\mathbf{X} = (X_1, \ldots, X_d)$, where $X_1, \ldots, X_d$ are scalar random variables, define the moment generating function $M$ and the cumulant generating function $K$ by
$$M(\mathbf{t}) = \mathbb{E}\left[e^{\mathbf{t} \cdot \mathbf{X}}\right], \qquad K(\mathbf{t}) = \log M(\mathbf{t}).$$
Assuming that $M$ and $K$ admit Taylor expansions about $\mathbf{t} = \mathbf{0}$, we can write
$$M(\mathbf{t}) = \sum_{\alpha} \mu_\alpha \frac{\mathbf{t}^\alpha}{\alpha!}, \qquad K(\mathbf{t}) = \sum_{\alpha} \kappa_\alpha \frac{\mathbf{t}^\alpha}{\alpha!},$$
where the coefficients $\mu_I$ and $\kappa_I$ are defined for any multiset $I$ (equivalently, any multiindex $\alpha$) from the indices $\{1, \ldots, d\}$. The coefficients in the expansion of the moment generating function are familiar; for example,
$$\mu_{\{1,1,2\}} = \mathbb{E}\left[X_1^2 X_2\right]$$
is a third-order raw moment. Of course, this relationship holds true in general: the coefficients in the expansion of the moment generating function are precisely the raw moments of $\mathbf{X}$.
Cumulants are defined similarly, as coefficients in the Taylor expansion of $K$. Formally, given any multiset $I$ from $\{1, \ldots, d\}$, or equivalently, given any multiindex $\alpha$ on $\{1, \ldots, d\}$, we define the cumulant
$$\kappa_I = \frac{\partial^{|I|}}{\prod_{i \in I} \partial t_i} K(\mathbf{t}) \bigg|_{\mathbf{t} = \mathbf{0}} \tag{1.1}$$
as the coefficient of the $\mathbf{t}^\alpha / \alpha!$ term in the series expansion of $K$. The order of a cumulant is the size $|I|$. Low-order cumulants have familiar interpretations, as the next few examples demonstrate:
Example 1.1 (First-Order Cumulants).
Consider a single random variable $X_i$ from the vector $\mathbf{X}$. The first-order cumulant of this variable is
$$\kappa_{\{i\}} = \frac{\partial}{\partial t_i} \log M(\mathbf{t}) \bigg|_{\mathbf{t} = \mathbf{0}} = \frac{1}{M(\mathbf{0})} \frac{\partial M}{\partial t_i}(\mathbf{0}) = \mathbb{E}[X_i].$$
Thus, first-order cumulants are identical to first-order raw moments, i.e., means.
Example 1.2 (Second-Order Cumulants).
Consider a pair of random variables $X_i, X_j$ from $\mathbf{X}$, possibly repeating. The second-order cumulant of this pair of variables is
$$\kappa_{\{i,j\}} = \frac{\partial^2}{\partial t_i \, \partial t_j} \log M(\mathbf{t}) \bigg|_{\mathbf{t} = \mathbf{0}} = \mathbb{E}[X_i X_j] - \mathbb{E}[X_i]\,\mathbb{E}[X_j] = \mathrm{Cov}(X_i, X_j).$$
Thus, second-order cumulants are identical to covariances.
Example 1.3 (Third-Order Cumulants).
Consider a triple of random variables $X_i, X_j, X_k$ from $\mathbf{X}$, possibly repeating. After evaluating and simplifying the appropriate third derivative, we find that
$$\kappa_{\{i,j,k\}} = \mathbb{E}\big[(X_i - \mathbb{E}[X_i])(X_j - \mathbb{E}[X_j])(X_k - \mathbb{E}[X_k])\big].$$
In particular, if $i = j = k$, we obtain the third univariate cumulant of the random variable $X_i$:
$$\kappa_{\{i,i,i\}} = \mathbb{E}\big[(X_i - \mathbb{E}[X_i])^3\big] = \gamma_i \sigma_i^3,$$
where $\gamma_i$ is the moment coefficient of skewness and $\sigma_i$ is the standard deviation of $X_i$. In other words, third univariate cumulants are identical to third central moments, thereby quantifying the skewness of a distribution.
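As a quick numerical check of Example 1.3, the third cumulant of a Bernoulli variable (whose raw moments satisfy $\mathbb{E}[X^r] = p$ for every $r \geq 1$) can be computed two ways: directly as a third central moment, and from raw moments via the expansion $\kappa_3 = \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3$, anticipating the moment-cumulant formula derived in the next subsection. The helper names below are our own:

```python
def bernoulli_third_cumulant_central(p):
    """E[(X - E[X])^3] for X ~ Bernoulli(p), by enumerating the two outcomes."""
    return (1 - p) * (0 - p) ** 3 + p * (1 - p) ** 3

def bernoulli_third_cumulant_raw(p):
    """mu_3 - 3*mu_2*mu_1 + 2*mu_1^3, using E[X^r] = p for every r >= 1."""
    return p - 3 * p * p + 2 * p ** 3
```

Both expressions simplify to $p(1-p)(1-2p)$, which vanishes at $p = 1/2$, as it should for a symmetric distribution.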
1.3 Cumulants, Raw Moments, and Möbius Inversion
As the previous three examples suggest, cumulants and raw moments are closely related; after all, cumulants and raw moments are the series coefficients of functions related by a log transform. High-order derivatives of the logarithm result in an abundance of terms, which rapidly get out of hand when trying to derive high-order cumulants manually. Fortunately, by invoking the multivariate version of Faà di Bruno's formula, we can use some powerful ideas from combinatorics to compactly represent these computations. This subsection will reveal a general formula for expressing cumulants in terms of raw moments, which will prove useful in our later derivation of k-statistics.
It is a bit simpler to start in the reverse direction, writing raw moments in terms of cumulants, and then using Möbius inversion to obtain our desired expressions. Writing raw moments in terms of cumulants is really a straightforward application of Faà di Bruno’s generalization of the chain rule:
Lemma 1.4 (Faà di Bruno’s Formula).
Let $n$ be a positive integer, and let $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$ be a pair of functions that are differentiable to order $n$. Then
$$\frac{\partial^n}{\partial t_1 \cdots \partial t_n} f(g(\mathbf{t})) = \sum_{\pi \in P(\{1,\ldots,n\})} f^{(|\pi|)}(g(\mathbf{t})) \prod_{B \in \pi} \frac{\partial^{|B|}}{\prod_{i \in B} \partial t_i} g(\mathbf{t}), \tag{1.2}$$
where $f^{(q)}$ denotes the $q$th derivative of $f$, $\pi$ loops through every partition of the set $\{1, \ldots, n\}$, $B$ loops through each block in a given partition, and $i$ loops through each element of $\{1, \ldots, n\}$ contained within a given block.
See [5] for an overview and proof of this formula (as well as a different discussion of the application we are about to see). Equation (1.2) is an entry point that will allow us to apply the combinatorics of the partition lattice to the study of cumulants.
In order to write raw moments in terms of cumulants, we can express $M(\mathbf{t}) = f(g(\mathbf{t}))$, where $f = \exp$ and $g = K$. Therefore, applying (1.2) to a multiset $I$ of indices, we obtain
$$\mu_I = \frac{\partial^{|I|}}{\prod_{i \in I} \partial t_i} e^{K(\mathbf{t})} \bigg|_{\mathbf{t} = \mathbf{0}} = \sum_{\pi \in P(I)} e^{K(\mathbf{0})} \prod_{B \in \pi} \frac{\partial^{|B|}}{\prod_{i \in B} \partial t_i} K(\mathbf{t}) \bigg|_{\mathbf{t} = \mathbf{0}}.$$
We recognize the inner derivative as a cumulant of the random vector $\mathbf{X}$, in particular, the cumulant corresponding to the multiset of indices $B$. Therefore, simplifying this equation (and noting that $K(\mathbf{0}) = 0$), we can express the raw moment in terms of cumulants, as follows:
$$\mu_I = \sum_{\pi \in P(I)} \prod_{B \in \pi} \kappa_B. \tag{1.3}$$
Let us consider some small examples of this formula:
Example 1.5 (LowOrder Raw Moments).
Let us consider how to construct first-order and second-order raw moments from cumulants. Given any particular random variable $X_i$ from the vector $\mathbf{X}$, we know that $\mu_{\{i\}} = \kappa_{\{i\}}$, either by applying (1.3) or recalling Example 1.1. Next, given a pair of (possibly identical) random variables $X_i, X_j$ from the vector, we use (1.3) to compute
$$\mu_{\{i,j\}} = \kappa_{\{i,j\}} + \kappa_{\{i\}} \kappa_{\{j\}}.$$
Combining these two results, we see that
$$\kappa_{\{i,j\}} = \mu_{\{i,j\}} - \mu_{\{i\}} \mu_{\{j\}} = \mathbb{E}[X_i X_j] - \mathbb{E}[X_i]\,\mathbb{E}[X_j],$$
replicating our conclusion from Example 1.2.
This example hints at the possibility of inverting (1.3) to express cumulants as functions of raw moments. It turns out that stating a general formula for this inversion is straightforward, thanks to Möbius inversion. The general topic of Möbius inversion on posets is beyond the scope of this note, but many good lecture notes are available online for an easy introduction (for example, [2]), or the reader may refer to a text like Aigner [1] for a more detailed and rigorous discussion. Fortunately, Möbius inversion on the partition lattice is a standard example in this area of combinatorics, so we can cut to the chase and state the needed result:
Lemma 1.6 (Möbius Inversion on the Partition Lattice: Part I).
Consider two functions $f, g : P(S) \to \mathbb{R}$. The following are equivalent:
$$f(\pi) = \sum_{\sigma \leq \pi} g(\sigma) \quad \text{for all } \pi \in P(S), \tag{1.4}$$
$$g(\pi) = \sum_{\sigma \leq \pi} \mu(\sigma, \pi)\, f(\sigma) \quad \text{for all } \pi \in P(S), \tag{1.5}$$
where the Möbius function of the partition lattice is given by $\mu(\sigma, \pi) = \prod_{B \in \pi} (-1)^{r_B - 1} (r_B - 1)!$, and $r_B$ denotes the number of blocks of $\sigma$ contained in the block $B$.
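Lemma 1.6 is easy to sanity-check numerically. The sketch below (our own code) assigns random values to $g$ on the partition lattice of a small set, computes $f$ via (1.4), and then recovers $g$ at the maximum element via (1.5); at the maximum element the Möbius coefficient reduces to $(-1)^{|\sigma|-1}(|\sigma|-1)!$:

```python
import random
from math import factorial

def partitions(elements):
    """Enumerate all partitions of a list of distinct labels."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for k in range(len(smaller)):
            yield smaller[:k] + [smaller[k] + [first]] + smaller[k + 1:]
        yield smaller + [[first]]

def refines(pi, sigma):
    """True if every block of pi is contained in some block of sigma."""
    return all(any(set(b) <= set(c) for c in sigma) for b in pi)

def inversion_error_at_top(m, seed=0):
    """|g(max) - sum over sigma of mu(sigma, max) * f(sigma)| for random g."""
    rng = random.Random(seed)
    parts = list(partitions(list(range(m))))
    g = [rng.uniform(-1.0, 1.0) for _ in parts]
    # f(pi) = sum of g(sigma) over refinements sigma <= pi, as in (1.4).
    f = [sum(gv for q, gv in zip(parts, g) if refines(q, p)) for p in parts]
    top_idx = next(i for i, p in enumerate(parts) if len(p) == 1)
    # Recover g at the top via (1.5): mu(sigma, max) = (-1)^(s-1) (s-1)!.
    recovered = sum((-1) ** (len(p) - 1) * factorial(len(p) - 1) * fv
                    for p, fv in zip(parts, f))
    return abs(recovered - g[top_idx])
```

The recovered value agrees with the original $g$ at the top element to floating-point precision.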
Lemma 1.6 provides a handy formula to invert sums over refinements of a given element of the partition lattice. While not immediately obvious, Lemma 1.6 can be used to invert (1.3), leading to the following result:
Theorem 1.7 (Cumulants from Raw Moments).
Consider the raw moments and cumulants of a random vector $\mathbf{X} = (X_1, \ldots, X_d)$. For any multiset $I$ from the indices $\{1, \ldots, d\}$, the corresponding cumulant can be expressed in terms of the raw moments by
$$\kappa_I = \sum_{\pi \in P(I)} (-1)^{|\pi| - 1} (|\pi| - 1)! \prod_{B \in \pi} \mu_B. \tag{1.6}$$
Proof.
Define two maps $f, g : P(I) \to \mathbb{R}$ by
$$f(\pi) = \prod_{B \in \pi} \mu_B$$
and
$$g(\pi) = \prod_{B \in \pi} \kappa_B.$$
Let $\hat{1} = \{I\}$ be the maximum partition in $P(I)$. Then (1.3) can be rewritten $f(\hat{1}) = \sum_{\sigma \leq \hat{1}} g(\sigma)$, since the set $P(I)$ is precisely the set of refinements of $\hat{1}$. In fact, this is enough to conclude that $f(\pi) = \sum_{\sigma \leq \pi} g(\sigma)$ for all $\pi \in P(I)$. This is because we can write
$$f(\pi) = \prod_{B \in \pi} f_B(\hat{1}_B),$$
where $\hat{1}_B = \{B\}$ is the maximum element of the lattice of partitions of $B$, and $f_B$ is defined similar to $f$, but on the elements of $P(B)$ instead of $P(I)$. Invoking (1.3), we have that $f_B(\hat{1}_B) = \sum_{\sigma_B \leq \hat{1}_B} g_B(\sigma_B)$ (where $g_B$ is defined similar to $g$), so that
$$f(\pi) = \prod_{B \in \pi} \sum_{\sigma_B \in P(B)} g_B(\sigma_B) = \sum_{\sigma \leq \pi} g(\sigma),$$
since the set of all tuples of refinements $(\sigma_B)_{B \in \pi}$ is isomorphic to the product of lattices $\prod_{B \in \pi} P(B)$, which is itself isomorphic to the set of all refinements of $\pi$. Thus $f$ and $g$ satisfy (1.4), so we invoke the Möbius inversion in Lemma 1.6 to obtain (1.5). In particular, evaluating (1.5) on $\hat{1}$, we obtain
$$g(\hat{1}) = \sum_{\sigma \in P(I)} \mu(\sigma, \hat{1})\, f(\sigma) = \sum_{\sigma \in P(I)} (-1)^{|\sigma| - 1} (|\sigma| - 1)! \prod_{B \in \sigma} \mu_B.$$
But $g(\hat{1}) = \kappa_I$, so we obtain (1.6). ∎
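Theorem 1.7 translates directly into code. The sketch below (our own helpers) evaluates (1.6) for univariate cumulants, and can be checked against the exponential distribution with rate 1, whose raw moments are $\mu_r = r!$ and whose cumulants are $\kappa_r = (r-1)!$:

```python
from math import factorial

def partitions(elements):
    """Enumerate all partitions of a list of distinct labels."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in partitions(rest):
        for k in range(len(smaller)):
            yield smaller[:k] + [smaller[k] + [first]] + smaller[k + 1:]
        yield smaller + [[first]]

def cumulant_from_raw_moments(order, mu):
    """Univariate cumulant via (1.6); mu maps a moment order r to E[X^r]."""
    total = 0.0
    for pi in partitions(list(range(order))):
        sign = (-1) ** (len(pi) - 1) * factorial(len(pi) - 1)
        prod = 1.0
        for block in pi:
            prod *= mu[len(block)]
        total += sign * prod
    return total
```

For the rate-1 exponential, the formula recovers $\kappa_2 = 1$, $\kappa_3 = 2$, and $\kappa_4 = 6$.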
2 Multivariate k-Statistics
Now that we have examined multivariate cumulants, our next challenge is to estimate them from a sample. Once again, suppose that we have a random vector $\mathbf{X} = (X_1, \ldots, X_d)$, distributed according to some joint distribution $P$. Further suppose that, instead of knowing $P$, all we have is an i.i.d. sample $\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(N)}$ from this distribution, where each $\mathbf{X}^{(j)}$ is distributed according to $P$. We would like to estimate the cumulants of $P$ using some statistic, i.e., some function of the data $\mathbf{X}^{(1)}, \ldots, \mathbf{X}^{(N)}$.

It turns out that we can obtain an unbiased estimate of each cumulant using raw sample moments. For each multiset $I$ of the indices $\{1, \ldots, d\}$, the corresponding raw sample moment is the statistic $m_I$ given by
$$m_I = \frac{1}{N} \sum_{j=1}^{N} \prod_{i \in I} X_i^{(j)}. \tag{2.1}$$
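In code, (2.1) is a one-liner over an $(N, d)$ data matrix. The sketch below (our own function name, with 0-based coordinate indices) uses NumPy:

```python
import numpy as np

def raw_sample_moment(X, index_multiset):
    """m_I from (2.1): the sample average of the product of the indexed
    coordinates, where index_multiset may contain repeated indices."""
    X = np.asarray(X, dtype=float)
    return float(np.mean(np.prod(X[:, list(index_multiset)], axis=1)))
```

For example, with observations $(1, 2)$ and $(3, 4)$, the moment $m_{\{1,1\}}$ is $(1^2 + 3^2)/2 = 5$.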
Because the observations in the sample are independent and identically distributed, $m_I$ is an unbiased estimator for the raw moment $\mu_I$. Furthermore, we can use raw sample moments to obtain unbiased estimates of cumulants:
Definition 2.1 (k-Statistic).
Consider the random vector $\mathbf{X} = (X_1, \ldots, X_d)$ and some multiset $I$ from the indices $\{1, \ldots, d\}$. Given a sample of $\mathbf{X}$ with at least $|I|$ observations, the corresponding k-statistic is given by
$$k_I = \sum_{\pi \in P(I)} (-1)^{|\pi| - 1}\, c_\pi \prod_{j=1}^{|\pi|} m_{B_j}, \tag{2.2}$$
where we define a positive coefficient for each partition in $P(I)$ by
$$c_\pi = N^{|\pi|} \sum_{\sigma \leq \pi} \frac{(|\sigma| - 1)!}{(N)_{|\sigma|}} \prod_{j=1}^{|\pi|} (r_j - 1)!, \tag{2.3}$$
$r_j$ denotes the number of blocks of $\sigma$ contained in $B_j$, and $B_1, \ldots, B_{|\pi|}$ are the blocks of the partition $\pi$.
Equation (2.2) is a linear combination of products of raw sample moments (which are computed from the data). The linear combination itself involves a sum over the partition lattice with coefficients of alternating sign, which hints at Möbius inversion. Indeed, we can derive the k-statistic by using Möbius inversion to correct the bias of products of raw sample moments. We provide a much more detailed derivation in the next section, when we prove that k-statistics are unbiased:
Theorem 2.2 (k-Statistics are Unbiased Estimators of Cumulants).
Consider the random vector $\mathbf{X} = (X_1, \ldots, X_d)$, and let $I$ be any multiset of the indices $\{1, \ldots, d\}$. The k-statistic computed from any i.i.d. sample of $\mathbf{X}$ with at least $|I|$ observations is an unbiased estimator of the corresponding cumulant, i.e.,
$$\mathbb{E}[k_I] = \kappa_I.$$
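Theorem 2.2 can be checked exactly in small cases by brute force: for a finite-outcome distribution, enumerate every possible sample of size $N$, weight each sample by its probability, and confirm that the average k-statistic equals the true cumulant. The sketch below (our own code, using exact rational arithmetic) does this for the second-order k-statistic of a Bernoulli variable, whose second cumulant is the variance $p(1-p)$:

```python
from fractions import Fraction
from itertools import product

def k2_univariate(sample):
    """Second-order k-statistic N/(N-1) (m_2 - m_1^2), i.e., the unbiased
    variance estimator (the case i = j of the second-order formula)."""
    n = len(sample)
    m1 = sum(sample, Fraction(0)) / n
    m2 = sum((x * x for x in sample), Fraction(0)) / n
    return Fraction(n, n - 1) * (m2 - m1 * m1)

def exact_mean_k2(outcomes, probs, n):
    """Exact E[k_2] over all i.i.d. samples of size n from a finite distribution."""
    total = Fraction(0)
    for tup in product(range(len(outcomes)), repeat=n):
        weight = Fraction(1)
        for idx in tup:
            weight *= probs[idx]
        total += weight * k2_univariate([outcomes[idx] for idx in tup])
    return total
```

For Bernoulli($1/3$) and any sample size $N \geq 2$, the exact expectation comes out to precisely $2/9$.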
First, we will consider some lower-order examples of multivariate k-statistics.
Example 2.3 (First-Order k-Statistics).
The partition lattice $P(\{i\})$ consists of only one element $\{\{i\}\}$, with coefficient $c_{\{\{i\}\}} = 1$, so first-order k-statistics are easily computed as
$$k_{\{i\}} = m_{\{i\}} = \frac{1}{N} \sum_{j=1}^{N} X_i^{(j)}.$$
Thus, as expected from Example 1.1, first-order k-statistics are merely sample means.
Example 2.4 (Second-Order k-Statistics).
The partition lattice $P(\{i,j\})$ consists of two elements: $\hat{1} = \{\{i,j\}\}$, and $\hat{0} = \{\{i\},\{j\}\}$. The corresponding coefficients are
$$c_{\hat{1}} = \frac{N}{N-1}$$
and
$$c_{\hat{0}} = \frac{N}{N-1}.$$
Therefore
$$k_{\{i,j\}} = \frac{N}{N-1} \left( m_{\{i,j\}} - m_{\{i\}}\, m_{\{j\}} \right) = \frac{1}{N-1} \sum_{l=1}^{N} \left( X_i^{(l)} - m_{\{i\}} \right) \left( X_j^{(l)} - m_{\{j\}} \right),$$
which we recognize as the classical unbiased estimator for covariance. Of course, this is exactly what we should expect after Example 1.2.
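The identity in Example 2.4 is easy to confirm numerically. The sketch below (our own helper) compares the raw-moment form against NumPy's unbiased sample covariance:

```python
import numpy as np

def k2(x, y):
    """Second-order k-statistic N/(N-1) (m_{ij} - m_i m_j) from Example 2.4."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    return n / (n - 1) * (np.mean(x * y) - np.mean(x) * np.mean(y))
```

By default, `np.cov` normalizes by $N - 1$, so the two quantities agree to floating-point precision.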
Example 2.5 (Third-Order k-Statistics).
The partition lattice $P(\{i,j,k\})$ has five elements: $\{\{i,j,k\}\}$, $\{\{i,j\},\{k\}\}$, $\{\{i,k\},\{j\}\}$, $\{\{j,k\},\{i\}\}$, and $\{\{i\},\{j\},\{k\}\}$. We first compute the respective coefficients:
$$c_{\{\{i,j,k\}\}} = c_{\{\{i,j\},\{k\}\}} = c_{\{\{i,k\},\{j\}\}} = c_{\{\{j,k\},\{i\}\}} = \frac{N^2}{(N-1)(N-2)}, \qquad c_{\{\{i\},\{j\},\{k\}\}} = \frac{2N^2}{(N-1)(N-2)}.$$
Note that $c_\pi$ depends only on the number and size of each block, and not the blocks themselves, so $c_{\{\{i,j\},\{k\}\}} = c_{\{\{i,k\},\{j\}\}} = c_{\{\{j,k\},\{i\}\}}$. Substituting these coefficients into (2.2) and simplifying, we obtain
$$k_{\{i,j,k\}} = \frac{N^2}{(N-1)(N-2)} \left( m_{\{i,j,k\}} - m_{\{i,j\}}\, m_{\{k\}} - m_{\{i,k\}}\, m_{\{j\}} - m_{\{j,k\}}\, m_{\{i\}} + 2\, m_{\{i\}}\, m_{\{j\}}\, m_{\{k\}} \right).$$
In particular, if $i = j = k$, we obtain the third univariate k-statistic
$$k_{\{i,i,i\}} = \frac{N^2}{(N-1)(N-2)} \left( m_{\{i,i,i\}} - 3\, m_{\{i,i\}}\, m_{\{i\}} + 2\, m_{\{i\}}^3 \right) = \frac{N}{(N-1)(N-2)} \sum_{l=1}^{N} \left( X_i^{(l)} - m_{\{i\}} \right)^3.$$
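The two expressions for the third univariate k-statistic, the raw-moment form and the central-sum form, can likewise be cross-checked. The function names below are our own:

```python
import numpy as np

def k3_raw(x):
    """Third univariate k-statistic from raw sample moments (Example 2.5)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m1, m2, m3 = np.mean(x), np.mean(x ** 2), np.mean(x ** 3)
    return n ** 2 / ((n - 1) * (n - 2)) * (m3 - 3 * m2 * m1 + 2 * m1 ** 3)

def k3_central(x):
    """Equivalent central form N/((N-1)(N-2)) * sum((x - xbar)^3)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return n / ((n - 1) * (n - 2)) * np.sum((x - np.mean(x)) ** 3)
```

For instance, the sample $(0, 0, 1)$ of size 3 gives $k_{\{i,i,i\}} = 1/3$ under both forms.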
2.1 Proof of Theorem 2.2
The general outline of the proof is as follows. We will first note that, while raw sample moments are unbiased estimators of raw moments, it is still the case that products of raw sample moments provide biased estimates for products of raw moments. The first step will be to quantify this bias. Second, we will once again use Möbius inversion over the partition lattice to obtain an unbiased estimator of products of raw moments, in terms of products of raw sample moments. Finally, we will substitute this estimator into (1.6) and simplify.
We begin by examining the expected value of raw sample moments:
Lemma 2.6 (Bias of Products of Raw Sample Moments).
Consider the random vector $\mathbf{X} = (X_1, \ldots, X_d)$, and consider some multiset $I$ from the indices $\{1, \ldots, d\}$. For every $\pi \in P(I)$, we have
$$\mathbb{E}\left[ \prod_{B \in \pi} m_B \right] = \frac{1}{N^{|\pi|}} \sum_{\sigma \geq \pi} (N)_{|\sigma|} \prod_{C \in \sigma} \mu_C. \tag{2.4}$$
Proof.
Our first step is to switch the order of sums and products, as follows:
$$\mathbb{E}\left[ \prod_{B \in \pi} m_B \right] = \frac{1}{N^{|\pi|}} \sum_{j_1=1}^{N} \cdots \sum_{j_{|\pi|}=1}^{N} \mathbb{E}\left[ \prod_{b=1}^{|\pi|} \prod_{i \in B_b} X_i^{(j_b)} \right].$$
Note that the inner expectation depends on which of the observations $\mathbf{X}^{(j_1)}, \ldots, \mathbf{X}^{(j_{|\pi|})}$ are identical, since observations at distinct times are independent, allowing us to factor the expected value. With this in mind, we will partition the hypercube of indices $(j_1, \ldots, j_{|\pi|})$ that we are summing over into equivalence classes, based on partitions of the set $\{1, \ldots, |\pi|\}$. Given such a partition $\tau$, we define the equivalence class $E_\tau$ as the set of index tuples with the following property: for all $a, b \in \{1, \ldots, |\pi|\}$, we have that $j_a = j_b$ if and only if $a, b \in C$ for some block $C \in \tau$. In other words, the blocks of $\tau$ represent elements of the index tuple that are identical. Clearly each of the index tuples in the sum belongs to some equivalence class $E_\tau$. Furthermore, each equivalence class contains $(N)_{|\tau|}$ elements: $N$ possible values for indices in the first block, $N - 1$ possible values in the second block, and so on.
For all $(j_1, \ldots, j_{|\pi|}) \in E_\tau$, we have the following property:
$$\mathbb{E}\left[ \prod_{b=1}^{|\pi|} \prod_{i \in B_b} X_i^{(j_b)} \right] = \prod_{C \in \tau} \mu_{I_C}, \qquad \text{where } I_C = \bigcup_{b \in C} B_b \text{ (a multiset union)}.$$
This follows because the expected value factors along the blocks of $\tau$, since the blocks have pairwise-distinct times, and thus the observations in each block are pairwise independent. Then we can write
$$\mathbb{E}\left[ \prod_{B \in \pi} m_B \right] = \frac{1}{N^{|\pi|}} \sum_{\tau} (N)_{|\tau|} \prod_{C \in \tau} \mu_{I_C}.$$
The final step is to note that there is a bijection between partitions $\tau$ of $\{1, \ldots, |\pi|\}$ and partitions of $I$ that are coarser than $\pi$. This bijection is easy to see: for each block $C \in \tau$, replace all of the blocks $B_b$ with $b \in C$ with the single coarser block $I_C$, resulting in a coarser partition $\sigma \geq \pi$. Due to this bijection, we can rewrite the sum over $\tau$ as a sum over coarser partitions $\sigma \geq \pi$, obtaining (2.4). ∎
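The counting argument in this proof (that the tuples in $E_\tau$ number $(N)_{|\tau|}$, and that the classes exhaust the hypercube of $N^{|\pi|}$ index tuples) can be verified exhaustively for small $N$. The helper names below are our own:

```python
from itertools import product

def falling_factorial(n, q):
    """(n)_q = n (n-1) ... (n-q+1)."""
    out = 1
    for j in range(q):
        out *= n - j
    return out

def class_of(tup):
    """The partition of positions {0, ..., m-1} induced by equal entries of tup."""
    blocks = {}
    for pos, val in enumerate(tup):
        blocks.setdefault(val, []).append(pos)
    return tuple(sorted(tuple(b) for b in blocks.values()))

def class_sizes(n, m):
    """Count the tuples in {0, ..., n-1}^m belonging to each class E_tau."""
    counts = {}
    for tup in product(range(n), repeat=m):
        key = class_of(tup)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With $N = 5$ and $m = 3$, each class has exactly $(5)_{|\tau|}$ tuples, the classes are indexed by the $B_3 = 5$ partitions of the positions, and the class sizes sum to $5^3$.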
Equation (2.4) has a somewhat familiar form: a sum over partitions that are coarser than $\pi$. Recall from the proof of Theorem 1.7 that we used Möbius inversion to invert a sum over partitions that refine $\pi$. While the direction of the sum makes a difference (we cannot use Lemma 1.6 in this case), the partition lattice still admits a Möbius inversion formula to invert (2.4). Once again, we will state the needed formula here, and direct the interested reader to a combinatorics text like [1]:
Lemma 2.7 (Möbius Inversion on the Partition Lattice: Part II).
Consider two functions $f, g : P(I) \to \mathbb{R}$. For two partitions $\pi \leq \sigma$, let $Q(\pi, \sigma)$ denote the set of partitions that, when applied to the blocks of $\sigma$, yield the refinement $\pi$. Then the following are equivalent:
$$f(\pi) = \sum_{\sigma \geq \pi} g(\sigma) \quad \text{for all } \pi \in P(I), \tag{2.5}$$
$$g(\pi) = \sum_{\sigma \geq \pi} \left[ \prod_{\tau \in Q(\pi, \sigma)} (-1)^{|\tau| - 1} (|\tau| - 1)! \right] f(\sigma) \quad \text{for all } \pi \in P(I). \tag{2.6}$$
The set $Q(\pi, \sigma)$ may cause some confusion, so it is worth considering an example before we proceed. Consider two partitions of $\{1, 2, 3, 4, 5\}$: $\pi = \{\{1\}, \{2, 3\}, \{4\}, \{5\}\}$, and $\sigma = \{\{1, 2, 3\}, \{4, 5\}\}$. Clearly $\pi$ is a refinement of $\sigma$. Furthermore, we can obtain $\pi$ from $\sigma$ by partitioning each block of $\sigma$. Let $\tau_1 = \{\{1\}, \{2, 3\}\}$ represent the partition of $\{1, 2, 3\}$ into $\{1\}$ and $\{2, 3\}$, and let $\tau_2 = \{\{4\}, \{5\}\}$ be the partition of $\{4, 5\}$ into $\{4\}$ and $\{5\}$. The set $Q(\pi, \sigma) = \{\tau_1, \tau_2\}$ is the collection of these two partitions.
Next, we apply this new Möbius inversion formula to invert (2.4), obtaining an unbiased estimator for products of raw moments:
Lemma 2.8 (Unbiased Estimation of Products of Raw Moments).
Consider the random vector $\mathbf{X} = (X_1, \ldots, X_d)$, and consider some multiset $I$ from the indices $\{1, \ldots, d\}$. For every $\pi \in P(I)$, define a statistic
$$M_\pi = \frac{1}{(N)_{|\pi|}} \sum_{\sigma \geq \pi} \left[ \prod_{\tau \in Q(\pi, \sigma)} (-1)^{|\tau| - 1} (|\tau| - 1)! \right] N^{|\sigma|} \prod_{C \in \sigma} m_C, \tag{2.7}$$
where the blocks of $\pi$ are denoted $B_1, \ldots, B_{|\pi|}$. Then $M_\pi$ is an unbiased estimator of the product of raw moments $\mu_{B_1} \cdots \mu_{B_{|\pi|}}$, i.e.,
$$\mathbb{E}[M_\pi] = \prod_{B \in \pi} \mu_B. \tag{2.8}$$
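As with Theorem 2.2, Lemma 2.8 can be checked exactly in small cases. Working out the two-block case $\pi = \{\{i\}, \{j\}\}$ of the inversion gives the estimator $(N^2 m_{\{i\}} m_{\{j\}} - N m_{\{i,j\}}) / (N(N-1))$ of $\mu_{\{i\}} \mu_{\{j\}}$; the sketch below (our own code, exact rational arithmetic, specialized to a single variable $i = j$) confirms that its expectation is $(\mathbb{E}[X])^2$:

```python
from fractions import Fraction
from itertools import product

def m_pair(sample):
    """(N^2 m_1^2 - N m_2) / (N (N - 1)): the two-block case of Lemma 2.8,
    specialized to a single variable (i = j)."""
    n = len(sample)
    m1 = sum(sample, Fraction(0)) / n
    m2 = sum((x * x for x in sample), Fraction(0)) / n
    return (n * n * m1 * m1 - n * m2) / (n * (n - 1))

def exact_mean(stat, outcomes, probs, n):
    """Exact expectation of stat over all i.i.d. samples of size n."""
    total = Fraction(0)
    for tup in product(range(len(outcomes)), repeat=n):
        weight = Fraction(1)
        for idx in tup:
            weight *= probs[idx]
        total += weight * stat([outcomes[idx] for idx in tup])
    return total
```

For Bernoulli($1/3$), the exact expectation equals $(1/3)^2 = 1/9$ for every sample size $N \geq 2$, even though the naive plug-in estimator $m_1^2$ is biased.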