Bounded Statistics

If two probability density functions (PDFs) have values for their first n moments which are quite close to each other (upper bounds of their differences are known), can it be expected that the PDFs themselves are very similar? Shown below is an algorithm to quantitatively estimate this "similarity" between the given PDFs, depending on how many moments one has information about. This method involves the concept of functions behaving "similarly" at certain "length scales", which is also precisely defined. This technique could find use in data analysis, to compare a data set with a PDF or another data set, without having to fit a functional form to the data.



1 Introduction

Modern scientific efforts often involve collecting, filtering, analysing and interpreting extremely large amounts of raw data. Famous examples from fundamental physics include the experiments at the Large Hadron Collider (LHC) and the Laser Interferometer Gravitational-Wave Observatory (LIGO). In the more applied sciences, one can think of complicated weather models that are tested for accuracy by comparison with meteorological data. Regression techniques are routinely used by engineers to fit very sophisticated equations of state to pressure-volume-temperature data of complex cryogenic mixtures. In all these cases, it would be of great use to develop a preliminary idea of the nature of a given data set or probability density function (PDF). This could help optimise computational costs.

To this end, one may ask: how much information regarding the underlying PDF can be extracted from the moments of a distribution? This is famously known as the moment problem [1] and is a highly non-trivial question. It is a very important inverse problem with wide applications in data analysis. The question that is addressed in this article is a slight variant of the moment problem: is it possible to compare different distributions based on their moments alone, without having to explicitly construct their PDFs?

More precisely, the moment problem can be stated as follows: given a sequence of real numbers $(\mu_n)_{n \ge 0}$, is it possible to find a PDF $f$ such that the $\mu_n$ are its moments, i.e.,

$$\mu_n = \int x^n f(x)\, \mathrm{d}x\,?$$

Depending on the domain of integration, the moment problem is classified into three categories: the Hausdorff problem, which has a finite domain (usually taken to be $[0,1]$ with no loss in generality); the Stieltjes problem, with the half-infinite domain $[0,\infty)$; and the Hamburger problem, which spans the whole real line $(-\infty,\infty)$. These cases have been well-studied and conditions for the existence and/or uniqueness of solutions have been formulated [1].
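As a concrete illustration, the moments of a given density can be approximated numerically. The sketch below (function names, grid resolution, and the uniform-density example are illustrative choices, not from the original) computes the first few moments of a density on a finite interval with the trapezoidal rule:

```python
import numpy as np

def trapezoid(y, x):
    # Trapezoidal rule: integral of the samples y over the grid x
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def moments(pdf, n, a=-1.0, b=1.0, num=20001):
    # mu_k = integral of x^k * f(x) over [a, b], for k = 0, ..., n-1
    x = np.linspace(a, b, num)
    fx = pdf(x)
    return [trapezoid(x**k * fx, x) for k in range(n)]

# Example: the uniform density on [-1, 1]; odd moments vanish and mu_2 = 1/3
mu = moments(lambda x: np.full_like(x, 0.5), 4)
```

For well-behaved densities the quadrature error is negligible compared with the moment bounds used later, so this is usually accurate enough in practice.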

However, in many cases, an explicit construction of the PDF [2] may not be the goal, but it is done anyway. For instance, consider the ubiquitous task of comparing two data sets. One way to do this is to fit PDFs to the two data sets using some regression technique, and then compare the two PDFs. For the Hausdorff moment problem, this procedure of constructing the PDF from a finite set of moments is ill-posed in the sense of Hadamard, i.e., it is highly sensitive to the values of the moments due to the (exponentially) large condition number of the corresponding Hankel matrix [2]. It is also computationally expensive, especially for large data sets. The computational cost can be decreased if an approximate form for the PDF can be deduced a priori, and this is the focus of this article. Given the first $n$ moments of each distribution/data set, or even just upper bounds on how far apart they are from each other, it is shown that the closeness (or similarity) of the underlying distributions can be quantified.

The moments of a distribution are directly related to the derivatives of the characteristic function of the PDF. Hence, the closeness of the moments of two distributions directly translates to the similarity of the derivatives (at $k = 0$) of their characteristic functions. This fact is used, along with Taylor series expansions of the characteristic functions, to precisely characterize the “similarity” of the two distributions. In the case of a Hausdorff moment problem, uniqueness is guaranteed, i.e., if there exists a function that solves the problem, it is unique [1, 2]. In other words, for the case of a finite domain, the moments characterize the PDF completely. Thus, the two PDFs can be expected to be the same if and only if they have exactly the same moments, which is a trivial case. In the case that they are different, the PDFs may be expected to behave similarly only on certain “large” cut-off length scales, and to differ significantly on the finer scales. This concept of “scale-wise similarity” is also precisely defined in the article. The result of this algorithm is the estimation of a cut-off length scale over which the two PDFs are similar (as defined by a user-specified tolerance), given how far apart the first few moments are.

The outline of this article is as follows: Section 2 defines some of the concepts used in the algorithm and Section 3 establishes the estimates used in this work. Section 4 proves the existence and uniqueness of the cut-off scale, while Section 5 consists of a brief discussion on the cases of bounded and unbounded support. Then, we illustrate the algorithm with some numerical examples in Section 6, following which the method is extended to higher dimensions in Section 7. Conclusions and scope for future work are presented in Sections 8 and 9, respectively.

2 Definitions

2.1 Distance metric

Consider two absolutely integrable functions $f_1$ and $f_2$ over a compact support, taken here to be $[-1,1]$. Define a distance metric between the two functions as:

$$d(f_1, f_2) = \int_{-1}^{1} \left| f_1(x) - f_2(x) \right| \mathrm{d}x.$$

Notice that this definition conforms to all the requirements of a distance metric (non-negativity, identity of indiscernibles, symmetry, and the triangle inequality).
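A numerical version of this metric can be sketched as follows; the $L^1$ form and the quadrature resolution are assumptions of this sketch, and the metric axioms can be spot-checked on simple test functions:

```python
import numpy as np

def l1_distance(f1, f2, a=-1.0, b=1.0, num=20001):
    # d(f1, f2) = integral over [a, b] of |f1(x) - f2(x)| (trapezoidal rule)
    x = np.linspace(a, b, num)
    y = np.abs(f1(x) - f2(x))
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Spot-check on two illustrative densities supported on [-1, 1]
g1 = lambda x: 0.5 * np.ones_like(x)        # uniform
g2 = lambda x: 0.75 * (1.0 - x**2)          # parabolic
d12 = l1_distance(g1, g2)
d21 = l1_distance(g2, g1)                   # symmetry: should equal d12
```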

2.2 Characteristic function

The characteristic function (denoted by a capital letter) is the Fourier transform of a PDF. Here, and in what follows, the Fourier transform and its inverse are given by:

$$F(k) = \int f(x)\, e^{ikx}\, \mathrm{d}x, \qquad f(x) = \frac{1}{2\pi} \int F(k)\, e^{-ikx}\, \mathrm{d}k. \qquad (2)$$

Assuming that the derivatives of the characteristic function exist, and that they all vanish at infinity, it can be easily shown that for a normalized PDF $f$ of a random variable $X$, the $n$-th moment can be expressed as:

$$\mu_n = \mathbb{E}[X^n] = (-i)^n \left. \frac{\mathrm{d}^n F}{\mathrm{d}k^n} \right|_{k=0},$$

where $i = \sqrt{-1}$.
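This relation can be verified numerically. The sketch below (an illustration, not from the original) uses the uniform density on $[-1,1]$, whose characteristic function is $\sin k / k$, and recovers the second moment $\mu_2 = 1/3$ from a central second difference of $F$ at $k = 0$:

```python
import numpy as np

x = np.linspace(-1, 1, 20001)
f = np.full_like(x, 0.5)               # uniform density on [-1, 1]

def charfun(k):
    # F(k) = integral of f(x) e^{ikx} dx, trapezoidal rule
    y = f * np.exp(1j * k * x)
    return complex(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Second moment from a finite-difference second derivative at k = 0;
# with this transform convention, mu_2 = -F''(0)
h = 1e-2
mu2 = -((charfun(h) - 2.0 * charfun(0.0) + charfun(-h)) / h**2).real
```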

Let the moments of two normalized PDFs $f_1$ and $f_2$ be denoted by $\mu_n^{(1)}$ and $\mu_n^{(2)}$, respectively. Define the bound on the difference of the moments by:

$$\left| \mu_n^{(1)} - \mu_n^{(2)} \right| \le \varepsilon_n.$$
2.3 Low-pass filtering

In what follows, the notation in [3] has been used to describe the large scale behaviour of functions. Filtering operators, one of which is extensively used in this article, are defined below.

Definition 1.

The action of the low-pass filtering operator $\mathcal{G}_K$, associated with the inverse length scale $K$, on an integrable function $f$ is given by:

$$\mathcal{G}_K f(x) = \frac{1}{2\pi} \int_{-K}^{K} F(k)\, e^{-ikx}\, \mathrm{d}k, \qquad (5)$$

where $F$ is the Fourier transform of $f$.

The low-pass filtering operator has a “smoothing effect” in the sense that it removes any fluctuations of length scale less than $1/K$. It is trivial to show that $\mathcal{G}_K$ is a projection operator. The complement of the low-pass filter is called the high-pass filter, which removes the large-scale behaviour and returns the small-scale features of the function.

Definition 2.

The action of the high-pass filtering operator $\mathcal{G}_K^{>}$, associated with the inverse length scale $K$, on an integrable function $f$ is given by:

$$\mathcal{G}_K^{>} f(x) = f(x) - \mathcal{G}_K f(x) = \frac{1}{2\pi} \int_{|k| > K} F(k)\, e^{-ikx}\, \mathrm{d}k.$$
As an illustration, the action of these filters on a test function is shown in Figure 1.
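On a periodic grid, the low-pass filter can be implemented directly with the FFT. A minimal sketch (the grid size and test function are illustrative choices):

```python
import numpy as np

def low_pass(fx, x, K):
    # Zero every Fourier mode with angular wavenumber |k| > K,
    # keeping only the large-scale part of the signal
    dx = x[1] - x[0]
    k = 2 * np.pi * np.fft.fftfreq(len(x), d=dx)
    Fk = np.fft.fft(fx)
    Fk[np.abs(k) > K] = 0.0
    return np.real(np.fft.ifft(Fk))

# Example: a slow mode plus a fast ripple; filtering at K = 10
# should leave (essentially) only the slow mode
x = np.linspace(-1, 1, 1024, endpoint=False)
f = np.sin(np.pi * x) + 0.3 * np.sin(40 * np.pi * x)
smooth = low_pass(f, x, K=10.0)
```

Since both modes fall exactly on the periodic grid, the filtered result matches the slow mode to machine precision; for non-periodic data one would window or pad first.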

3 Estimates of the distance

As stated in the introduction, we will use the relation between the moments and the derivatives of the characteristic functions to construct the required algorithm. Consider two PDFs $f_1$ and $f_2$ over the compact support $[-1,1]$. Let their respective characteristic functions be $F_1$ and $F_2$. Assuming we have knowledge of the first $n$ moments (starting from the zeroth moment), we expand the characteristic functions in a Taylor series about $k = 0$:

$$F_j(k) = \sum_{m=0}^{n-1} \frac{F_j^{(m)}(0)}{m!}\, k^m + \frac{F_j^{(n)}(\xi_j)}{n!}\, k^n, \qquad j = 1, 2,$$

where $\xi_j$ lies between $0$ and $k$. The last term in the expansion is the Lagrange form of the remainder in Taylor’s theorem, which is applicable when the function being expanded belongs to $C^{n-1}$ over the entire domain and is $n$-times differentiable. (For a rather straightforward proof that the characteristic function satisfies these conditions, see Appendix B.)

Using these Taylor expansions, and the bounds on the differences of the moments (Section 2.2),

$$\left| F_1(k) - F_2(k) \right| \le \sum_{m=0}^{n-1} \frac{\varepsilon_m}{m!}\, |k|^m + \frac{\left| F_1^{(n)}(\xi_1) \right| + \left| F_2^{(n)}(\xi_2) \right|}{n!}\, |k|^n.$$

From (5), we have the following inequality:

$$\left| \mathcal{G}_K f_1(x) - \mathcal{G}_K f_2(x) \right| \le \frac{1}{2\pi} \int_{-K}^{K} \left| F_1(k) - F_2(k) \right| \mathrm{d}k.$$

By the definition of the distance metric and using this inequality,

$$d\left( \mathcal{G}_K f_1, \mathcal{G}_K f_2 \right) \le \frac{1}{\pi} \int_{-K}^{K} \left| F_1(k) - F_2(k) \right| \mathrm{d}k.$$

Recalling (2), we have the following bound:

$$\left| F_1^{(n)}(\xi_1) \right| = \left| \int_{-1}^{1} (ix)^n f_1(x)\, e^{i \xi_1 x}\, \mathrm{d}x \right| \le \int_{-1}^{1} |x|^n f_1(x)\, \mathrm{d}x.$$

Similarly for $\left| F_2^{(n)}(\xi_2) \right|$. Combining these results,

$$d\left( \mathcal{G}_K f_1, \mathcal{G}_K f_2 \right) \le \frac{2}{\pi} \left[ \sum_{m=0}^{n-1} \frac{\varepsilon_m\, K^{m+1}}{(m+1)\, m!} + \frac{\left| F_1^{(n)}(\xi_1) \right| + \left| F_2^{(n)}(\xi_2) \right|}{(n+1)\, n!}\, K^{n+1} \right]. \qquad (7)$$
At this stage, all that remains is for us to estimate the unknown $\left| F_1^{(n)}(\xi_1) \right|$ (and $\left| F_2^{(n)}(\xi_2) \right|$) in terms of the lower moments, which are known. Depending on how much a priori information we have about the PDF, we may be able to estimate these unknown higher moments in various ways (see Appendix A). Here, we consider the simplest case, where we know nothing about $f$ except that it is a PDF, i.e., it is integrable. Since $f \in L^1[-1,1]$, using Hölder’s inequality, it is easy to establish that

$$\int_{-1}^{1} |x|^n f(x)\, \mathrm{d}x \le \left\| x^n \right\|_{\infty} \left\| f \right\|_{1} = 1.$$

Thus, the best (smallest) upper bound obtainable this way is $1$. From (7) and this estimate,

$$d\left( \mathcal{G}_K f_1, \mathcal{G}_K f_2 \right) \le \frac{2}{\pi} \left[ \sum_{m=0}^{n-1} \frac{\varepsilon_m\, K^{m+1}}{(m+1)\, m!} + \frac{2\, K^{n+1}}{(n+1)\, n!} \right] =: P(K),$$

where $P(K)$ denotes the resulting polynomial bound.

This means that the two PDFs have “similar” behaviour over length scales greater than $1/K$, if the distance between the smoothed functions is less than some specified tolerance. The functions may, however, differ from each other significantly over small scales, which is expected since their moments are not exactly equal, but are only similar in the sense of their differences being bounded above.
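The bound is a polynomial in the cut-off $K$ and can be evaluated directly. In this sketch, the coefficients follow the reconstructed form of the derivation above (treat the exact constants as an assumption), with `eps[m]` standing for the moment-difference bounds:

```python
import math

def distance_bound(K, eps, n):
    # Assumed bound on d(G_K f1, G_K f2):
    # (2/pi) * [ sum_{m<n} eps_m K^{m+1} / ((m+1) m!) + 2 K^{n+1} / ((n+1) n!) ]
    s = sum(eps[m] * K**(m + 1) / ((m + 1) * math.factorial(m)) for m in range(n))
    s += 2.0 * K**(n + 1) / ((n + 1) * math.factorial(n))
    return 2.0 * s / math.pi

# The bound vanishes at K = 0 and grows monotonically with K, so more
# aggressive smoothing (smaller K) always tightens the guarantee
eps = [0.0, 0.1, 0.05]     # illustrative values
b1 = distance_bound(1.0, eps, n=3)
b2 = distance_bound(2.0, eps, n=3)
```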

4 Existence and uniqueness of solutions

Given a PDF, its closeness (at a certain length scale) to any other PDF can be estimated by specifying a tolerance for considering two functions to “behave similarly” and checking if there exists a length scale over which this is achieved. The question to be posed is the following:

For any tolerance $\delta > 0$, does there exist an inverse length scale $K$ such that $d\left( \mathcal{G}_K f_1, \mathcal{G}_K f_2 \right) \le \delta$, where $\mathcal{G}_K f_1$ and $\mathcal{G}_K f_2$ are the low-pass filtered functions as defined in (5)?

This is answered by looking at the polynomial equation (in the variable $K$):

$$\sum_{m=0}^{n-1} \frac{\varepsilon_m\, K^{m+1}}{(m+1)\, m!} + \frac{2\, K^{n+1}}{(n+1)\, n!} - \frac{\pi \delta}{2} = 0, \qquad (9)$$

and calculating the inverse length scale $K^*$ which is a root of this equation. (9) is a polynomial equation of the $(n+1)$-th degree, leading to the existence of $n+1$ solutions. Of these, the sought-after one is a real and positive value of $K$, preferably a unique root (to prevent the further complication of having to choose between multiple solutions).

Proposition 1.

There exists a unique positive solution $K^*$ to (9), which characterizes the desired cut-off scale.

Proof: (9) is a polynomial equation of degree $n+1$ with all positive coefficients, except the (negative) constant term. Using Descartes’ rule of signs, it can be concluded that there is exactly one positive root, since exactly one sign change occurs in the sequence of coefficients. ∎
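Because the bound vanishes at the origin and increases monotonically for $K > 0$, the unique positive root can be found by simple bisection. A sketch (the polynomial's constants follow the reconstructed bound above and are an assumption):

```python
import math

def cutoff_scale(eps, n, delta, K_max=1e6):
    # Solve P(K) = delta; Descartes' rule of signs guarantees exactly
    # one positive root, and P is increasing on K > 0
    def P(K):
        s = sum(eps[m] * K**(m + 1) / ((m + 1) * math.factorial(m)) for m in range(n))
        s += 2.0 * K**(n + 1) / ((n + 1) * math.factorial(n))
        return 2.0 * s / math.pi

    lo, hi = 0.0, 1.0
    while P(hi) < delta and hi < K_max:   # bracket the root
        hi *= 2.0
    for _ in range(200):                  # bisection
        mid = 0.5 * (lo + hi)
        if P(mid) < delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Tightening the tolerance should shrink the admissible cut-off scale
K_loose = cutoff_scale(eps=[0.0, 0.1, 0.05], n=3, delta=0.1)
K_tight = cutoff_scale(eps=[0.0, 0.1, 0.05], n=3, delta=0.01)
```

Bisection is used rather than a general polynomial root-finder because monotonicity makes it both robust and sufficient here.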

5 Infinite/non-existent moments

5.1 Bounded support

For distributions defined on a bounded support, all moments exist regardless of the PDF. This can be shown as follows. (Since any bounded domain can be mapped to the compact set $[-1,1]$, only the latter shall be considered, without any loss of generality.)

$$\left| \mu_n \right| = \left| \int_{-1}^{1} x^n f(x)\, \mathrm{d}x \right| \le \int_{-1}^{1} |x|^n f(x)\, \mathrm{d}x \le \int_{-1}^{1} f(x)\, \mathrm{d}x = 1,$$

where the last equality follows from normalization, which is valid for all PDFs since they are Lebesgue-integrable (by definition). Thus, any PDF on a bounded support has all moments and they are all finite. Hence, the method described in this article is certainly applicable to PDFs with bounded support.

5.2 Unbounded support

It is not necessary that all the moments, or even any moments, exist for a PDF with an unbounded support. For instance, while the normal distribution possesses all moments, the Cauchy distribution has none. If the $n$-th moment of a distribution does not exist, the $n$-th derivative of the characteristic function does not exist as a limit at $k = 0$. Similarly, the $n$-th derivative of the characteristic function could blow up at $k = 0$, signifying the blow-up of the $n$-th moment. In the latter case, the limit exists and is equal to $\infty$.

Note: However, it can be easily shown that for a PDF with an unbounded support, the existence of a finite $n$-th moment implies the existence and finiteness of all moments of order less than $n$. For instance, for $m < n$,

$$\int |x|^m f(x)\, \mathrm{d}x = \int_{|x| \le 1} |x|^m f(x)\, \mathrm{d}x + \int_{|x| > 1} |x|^m f(x)\, \mathrm{d}x \le \int_{|x| \le 1} f(x)\, \mathrm{d}x + \int_{|x| > 1} |x|^n f(x)\, \mathrm{d}x.$$

The first term in the last step is clearly finite, and the second is finite by assumption.

If the higher derivatives of the characteristic functions become arbitrarily large, this corresponds to very large fine-scale fluctuations (this interpretation follows from the relation between the moments and the derivatives of the characteristic function). These fluctuations are not physical in the sense that they will prevent any kind of macroscopic equilibration from occurring, and hence characterize far-from-equilibrium processes in their transient states. (Note that while turbulent flow is also, strictly speaking, far from equilibrium, there is a difference between the stationary and transient states of turbulent flow. In the former, there is statistical equilibrium, and the Kolmogorov spectrum is evidence of fluctuations decreasing with reducing length scales.) Such states of these processes may not even have a well-defined PDF to begin with, and the method of this article is irrelevant to such extreme cases.

6 Numerical examples

In this section, the above method is applied to some PDFs with compact support $[-1,1]$. Two cases are considered: a “structured” PDF (with a closed-form expression) and an “unstructured” PDF (generated with random numbers). In both cases, various realisations of the PDFs are compared to deduce their similarity, given the first few moments. The calculations were performed in MATLAB for different illustrative values of the parameters. In Figures 2-6 and 8-12 below, the two original PDFs are denoted by solid (red and blue) lines, while the smoothed functions are shown as dashed (red and blue) lines.

6.1 Case 1: Normal random variables

For the first illustration, we consider a normally-distributed random variable over the compact support $[-1,1]$. Its PDF is of the form:

$$f(x) = \frac{1}{Z} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \qquad x \in [-1, 1],$$

where $Z = \sigma \sqrt{\pi/2} \left[ \mathrm{erf}\left( \frac{1-\mu}{\sqrt{2}\,\sigma} \right) + \mathrm{erf}\left( \frac{1+\mu}{\sqrt{2}\,\sigma} \right) \right]$ is the normalization factor, and $\mathrm{erf}$ is the error function.

Using (2), we find the characteristic function corresponding to this PDF:

$$F(k) = \frac{\sigma}{Z} \sqrt{\frac{\pi}{2}}\; e^{ik\mu - \sigma^2 k^2 / 2} \left[ \mathrm{erf}\left( \frac{1 - \mu - i\sigma^2 k}{\sqrt{2}\,\sigma} \right) + \mathrm{erf}\left( \frac{1 + \mu + i\sigma^2 k}{\sqrt{2}\,\sigma} \right) \right]. \qquad (11)$$

From (5) and (11), and a few variable substitutions, we arrive at an integral expression for the smoothed PDF:

$$\mathcal{G}_K f(x) = \frac{1}{2\pi} \int_{-K}^{K} F(k)\, e^{-ikx}\, \mathrm{d}k. \qquad (12)$$
(12) was numerically integrated in MATLAB, the algorithm was applied to pairs of such PDFs, and the cut-off scales (and other details) were evaluated. The results are shown in Table 1 and Figures 2 - 6. The following observations can be made:

  1. The cut-off scale is higher (smaller length scales are probed) in the case where only the widths differ ($\sigma_1 \neq \sigma_2$), when compared to the case of separated means ($\mu_1 \neq \mu_2$). This is expected since it is harder to “match” two PDFs whose peaks are separated, as opposed to when one is simply broader than the other.

  2. Increasing the number of moments increases the cut-off (wavenumber) scale. This is because smaller length scales can be probed with more information (moments).

  3. Increasing the number of moments also increases the distance between the smoothed functions. This could perhaps be due to the moments acting as constraints that are to be adhered to while comparing the functions. (It is to be noted that this trend is not observed in the case of the “less-structured” PDFs considered in the next section.)

  4. Reducing the tolerance reduces the cut-off scale. Once again, this is expected since a tighter tolerance may be achieved only by sufficient smoothing of the functions.
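The pipeline of this example can be sketched end-to-end in a few lines. This is an illustrative Python analogue, not the paper's MATLAB code: the grid, the parameter values, and the use of an FFT low-pass filter in place of the integral (12) are all choices of this sketch.

```python
import numpy as np

x = np.linspace(-1, 1, 4096, endpoint=False)
dx = x[1] - x[0]

def trunc_gauss(mu, sigma):
    # Normal density restricted to [-1, 1] and renormalized on the grid
    f = np.exp(-(x - mu)**2 / (2 * sigma**2))
    return f / (np.sum(f) * dx)

def low_pass(f, K):
    # FFT stand-in for the smoothing integral: keep modes with |k| <= K
    k = 2 * np.pi * np.fft.fftfreq(len(x), d=dx)
    Fk = np.fft.fft(f)
    Fk[np.abs(k) > K] = 0.0
    return np.real(np.fft.ifft(Fk))

def l1(f, g):
    return float(np.sum(np.abs(f - g)) * dx)

f1 = trunc_gauss(0.0, 0.10)
f2 = trunc_gauss(0.0, 0.15)

# Moment-difference bounds eps_m from the first n moments
n = 3
eps = [abs(float(np.sum(x**m * (f1 - f2)) * dx)) for m in range(n)]

# Smoothing both PDFs at a modest cut-off brings them much closer together
d_orig = l1(f1, f2)
d_smooth = l1(low_pass(f1, 10.0), low_pass(f2, 10.0))
```

Since both densities are normalized, the zeroth-moment bound is essentially zero, and the smoothed distance is strictly smaller than the original one, mirroring the trend in Table 1.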

6.2 Case 2: Scale-separated PDFs

Scale-separation is commonly encountered in nature, when the dynamics of a physical system can be separated into two (usually narrow) intervals of length/time scales (see Figure 1). For instance, in certain models of combustion, the assumption of “fast chemistry” is invoked to simplify the analysis. This means that some of the reactions at equilibrium are much faster than others, so that they may be considered instantaneous, and this is a manifestation of temporal scale-separation. As an example of spatial scale-separation, one could consider fluid flow that is not fully turbulent. In such a flow, it is possible to clearly discern the large-scale flow features from the small-scale fluctuations. In summary, scale-separation refers to a case where a range of intermediate (length/time) scales are absent.

Due to their ubiquity, it is useful to see an illustration of the method applied to a scale-separated PDF. For this purpose, a spectrum was created by generating normally-distributed random numbers (with zero mean) in MATLAB and arranging them in decreasing order of magnitude. The rearrangement of the numbers was done to imitate the spectrum of a system in a statistically steady state, where fluctuations decrease with decreasing scale size. Further, the spectrum was set to zero in some intermediate wavenumbers to mimic scale-separation (Figure 7). Two such PDFs were constructed as different realizations of the random spectrum and the algorithm described in Section 4 was used to analyse their similarity and determine the cut-off scale. Using the same random spectrum, the effect of varying parameters is discussed. (The random spectra used to construct Figure 12 are different from the ones used for Figures 8 - 11.) Integrations were performed numerically and the results are shown in Table 2.

The following observations, similar to those in Section 6.1, can be made:

  1. Comparing the results for Figures 8 and 12, it is seen that the cut-off scale is higher (smaller length scales are probed) in the former. Observing the graphs of the two cases reveals that in Figure 12, the original PDFs are very much “out of phase”, which means they are less similar to begin with than in Figure 8.

  2. Increasing the number of moments increases the cut-off (wavenumber) scale.

  3. Reducing the tolerance reduces the cut-off scale.

7 Extension to higher dimensions

All of the above analysis was done for PDFs that are functions of just one variable. Can it be extended to a multivariate PDF (i.e., to higher dimensions, say $d$)? It is possible, as will be shown below. The steps are similar to the ones in the 1-dimensional case, but there is a subtle difference at the end.

Let $\mathbf{x} = (x_1, \ldots, x_d)$ and let $f(\mathbf{x})$ be a PDF supported over the unit ball in $\mathbb{R}^d$. The characteristic function in $d$ dimensions is given by:

$$F(\mathbf{k}) = \int f(\mathbf{x})\, e^{i \mathbf{k} \cdot \mathbf{x}}\, \mathrm{d}^d x,$$

where $\mathbf{k} = (k_1, \ldots, k_d)$ is the wavevector. Let $\mathbf{m} = (m_1, \ldots, m_d)$ and $|\mathbf{m}| = m_1 + \cdots + m_d$, where each $m_j \ge 0$. Then the moments of the PDF are defined as follows:

$$\mu_{\mathbf{m}} = \int x_1^{m_1} \cdots x_d^{m_d}\, f(\mathbf{x})\, \mathrm{d}^d x,$$

where the integrals are taken over the unit ball.

The upper bounds for the moments are defined as:

$$\left| \mu_{\mathbf{m}}^{(1)} - \mu_{\mathbf{m}}^{(2)} \right| \le \varepsilon_{\mathbf{m}}.$$

The low-pass filtering operator is defined for a $d$-dimensional cut-off scale $\mathbf{K} = (K_1, \ldots, K_d)$:

$$\mathcal{G}_{\mathbf{K}} f(\mathbf{x}) = \frac{1}{(2\pi)^d} \int_{|k_1| \le K_1} \cdots \int_{|k_d| \le K_d} F(\mathbf{k})\, e^{-i \mathbf{k} \cdot \mathbf{x}}\, \mathrm{d}^d k.$$
Expanding the characteristic function in a Taylor series about the origin,

$$F_j(\mathbf{k}) = \sum_{|\mathbf{m}| = 0}^{n-1} \frac{\partial^{\mathbf{m}} F_j(\mathbf{0})}{\mathbf{m}!}\, \mathbf{k}^{\mathbf{m}} + R_n^{(j)}(\mathbf{k}), \qquad j = 1, 2,$$

where $\mathbf{m}! = m_1! \cdots m_d!$ and $\mathbf{k}^{\mathbf{m}} = k_1^{m_1} \cdots k_d^{m_d}$.

As in the 1-dimensional case, the remainder term in the Taylor series can be bounded as:

$$\left| R_n^{(j)}(\mathbf{k}) \right| \le \sum_{|\mathbf{m}| = n} \frac{\left| \partial^{\mathbf{m}} F_j(\boldsymbol{\xi}_j) \right|}{\mathbf{m}!} \left| \mathbf{k}^{\mathbf{m}} \right|,$$

and each derivative is in turn bounded via a multi-index $\mathbf{m}'$ obtained by reducing any one non-zero component of $\mathbf{m}$ by $1$. Finally, the distance metric in $d$ dimensions is:

$$d(f_1, f_2) = \int \left| f_1(\mathbf{x}) - f_2(\mathbf{x}) \right| \mathrm{d}^d x,$$

with the integral taken over the unit ball.

Following the same sequence of inequalities as before, the distance inequality for the smoothed functions becomes:

$$d\left( \mathcal{G}_{\mathbf{K}} f_1, \mathcal{G}_{\mathbf{K}} f_2 \right) \le C_d \left[ \sum_{|\mathbf{m}| = 0}^{n-1} \frac{\varepsilon_{\mathbf{m}}}{\mathbf{m}!} \prod_{j=1}^{d} \frac{K_j^{m_j + 1}}{m_j + 1} + R \right],$$

where $R$ is the remainder term and $C_d$ is a constant arising from the volume of the support and the normalization of the transform. Here, $\boldsymbol{\xi}_1$ and $\boldsymbol{\xi}_2$ are chosen so that the remainder term is the least possible (among all the moments of that order). The cut-off scales are then obtained by equating this bound to the user-specified tolerance $\delta$.

The difference in the higher-dimensional case is that the solution is not unique. However, a unique (and conservative) value for the cut-off scale, common to all the dimensions, can be determined. Setting all the $K_j = K$, the resulting equation is of the form:

$$\sum_{p=d}^{n+d} c_p\, K^p - \frac{\delta}{C_d} = 0,$$

where the $c_p$ are sums of the various moment bounds (divided by the appropriate factorials) together with the remainder term. Since all the $c_p$ are positive and the constant term is negative, the existence and uniqueness of the positive root can be proved just as in the 1-dimensional case.
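In higher dimensions, the separable cut-off can again be applied with the FFT. A 2-D sketch (the grid and the test field are illustrative choices, not from the original):

```python
import numpy as np

def low_pass_2d(f, dx, K):
    # Keep Fourier modes with |k_j| <= K in each dimension (separable cut-off)
    n0, n1 = f.shape
    k0 = 2 * np.pi * np.fft.fftfreq(n0, d=dx)
    k1 = 2 * np.pi * np.fft.fftfreq(n1, d=dx)
    mask = (np.abs(k0)[:, None] <= K) & (np.abs(k1)[None, :] <= K)
    return np.real(np.fft.ifft2(np.fft.fft2(f) * mask))

# Example: one slow 2-D mode plus a fast ripple in the first coordinate
x = np.linspace(-1, 1, 128, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
dx = x[1] - x[0]
field = np.sin(np.pi * X) * np.sin(np.pi * Y) + 0.2 * np.sin(30 * np.pi * X)
smooth = low_pass_2d(field, dx, K=10.0)
```

The rectangular (per-dimension) cut-off mirrors the separable filter defined above; an isotropic ball $|\mathbf{k}| \le K$ would be an equally valid design choice.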

8 Conclusions

A method has been proposed to estimate a length scale to which given PDFs must be smoothed for their (normalized) distance to be less than a user-specified tolerance, given the first few moments of the PDFs. It has been shown that such a cut-off scale indeed exists and is unique. Two numerical examples were used as proof of concept, as well as to illustrate the working of the algorithm. An extension to higher dimensions has been outlined. This scheme is hoped to find use in data analysis for comparing large data sets, as calculating moments is less computationally-intensive than having to reverse-engineer the PDFs of the data sets in order to compare them.

9 Scope for future work

The calculations in Section 3 involve a series of inequalities, which lead to a very conservative estimate of the distance between the smoothed functions and of the cut-off scale, as seen in Tables 1 and 2. A useful direction for future research is to sharpen these estimates with tighter inequalities.

Moreover, the method described in this article only deals with PDFs having bounded support. As discussed in Section 5.2, unbounded domains pose major problems in the existence or the finiteness of moments. Extending this scheme, or formulating an entirely new one, for unbounded domains could be another meaningful and interesting research problem.

Acknowledgements

PCJ wishes to thank Andy Sebastian and Andrew Corson for helpful discussions in the preliminary stages of this research and acknowledge Joydeep Singha’s suggestion to extend this algorithm to multiple dimensions. The authors are also grateful to Dr. Chris Jarzynski and Dr. Venkatarathnam Gadhiraju for their valuable suggestions to improve the manuscript.

Appendix A Bounds on higher moments

Consider a PDF $f$ with compact support $[-1,1]$. Then, its $n$-th moment can be bounded from above by a lower moment and/or other terms, depending on the smoothness of the PDF. This is useful in bounding the remainder term in the algorithm discussed above.

A.1 PDF is bounded

For all $m$, $n$ with $m < n$, we use the Cauchy–Schwarz inequality to obtain:


Three special cases of (19) may be of interest.

  1. is even and

  2. is odd and

A.2 PDF is absolutely continuous

In this case, the PDF is bounded, i.e., for some $M > 0$,

$$f(x) \le M \quad \text{for all } x \in [-1, 1].$$

This bound may be used in (19).

A.3 $f(\pm 1)$ are finite and known


This can, of course, be extended to higher derivatives by integrating by parts repeatedly, depending on how much information one already has about the PDF.

Appendix B Smoothness of the characteristic function

Theorem 1.

If a random variable has moments up to the order $n$, then the corresponding characteristic function belongs to $C^n(\mathbb{R})$.

Proof: (Adapted from [4]) From (2), we know that for a random variable $X$ with a PDF given by $f$,

$$F(k) = \int f(x)\, e^{ikx}\, \mathrm{d}x.$$

In the case that a PDF does not exist, this can be rewritten in terms of the cumulative distribution function $P$ as:

$$F(k) = \int e^{ikx}\, \mathrm{d}P(x).$$

Thus, if the PDF exists, then $|F(k)| \le 1$. Now, consider

$$\left| F(k + h) - F(k) \right| \le \int \left| e^{ihx} - 1 \right| f(x)\, \mathrm{d}x.$$

The RHS is independent of $k$, and less than $2$, which is twice the zeroth moment. Also, the RHS can be made arbitrarily small by taking the limit $h \to 0$. From all these observations, we conclude that $F$ is uniformly continuous on the entire real line. Thus, $F \in C(\mathbb{R})$. Applying the same argument to the derivatives $F^{(m)}(k) = \int (ix)^m f(x)\, e^{ikx}\, \mathrm{d}x$ for $m \le n$, which exist since the corresponding moments do, shows that $F \in C^n(\mathbb{R})$. ∎

From the discussion in Section 5.1, we know that a random variable over a compact support has all moments. Combining this with the above theorem gives us the following corollary.

Corollary 2.

For a random variable over a compact support, the characteristic function belongs to $C^{\infty}(\mathbb{R})$.



  • [1] James Alexander Shohat and Jacob David Tamarkin. The problem of moments. American Mathematical Society, revised edition, 1970.
  • [2] G. Talenti. Recovering a function from a finite number of moments. Inverse Problems, 3(3):501–517, 1987.
  • [3] Uriel Frisch. Turbulence: The Legacy of A.N. Kolmogorov. Cambridge University Press, 1995.
  • [4] Eugene Lukacs. Characteristic Functions. Charles Griffin and Company Limited, 2nd edition, 1970.

Tables and table captions

| Figure number | Number of moments | Tolerance | Cut-off scale | Original distance | Smoothed distance | Parameters of PDFs |
| --- | --- | --- | --- | --- | --- | --- |
| Figure 2 | 3 | 0.1 | 1.396927 | 0.561495 | 0.010545 | |
| Figure 3 | 3 | 0.1 | 1.854235 | 0.289393 | 0.005971 | |
| Figure 4 | 6 | 0.1 | 1.537326 | 0.561495 | 0.013916 | |
| Figure 5 | 6 | 0.1 | 2.863850 | 0.289393 | 0.019028 | |
| Figure 6 | 3 | 0.01 | 0.564254 | 0.561495 | 0.000723 | |

Table 1: Effect of various parameters for a normally-distributed PDF over a compact support.

| Figure number | Number of moments | Tolerance | Cut-off scale | Original distance | Smoothed distance |
| --- | --- | --- | --- | --- | --- |
| Figure 8 | 20 | 1 | 6.949457 | 0.048280 | 0.024368 |
| Figure 9 | 10 | 1 | 6.042744 | 0.048280 | 0.057852 |
| Figure 10 | 5 | 1 | 4.178820 | 0.048280 | 0.029120 |
| Figure 11 | 5 | 0.1 | 2.816885 | 0.048280 | 0.009402 |
| Figure 12 | 20 | 1 | 6.364578 | 0.135750 | 0.086696 |

Table 2: Effect of various parameters for a scale-separated PDF over a compact support.
Figure 1: (a) Test function and action of (b) low-pass and (c) high-pass filters. (Reproduced with permission from Chapter 2 of [3])

Figure 2: Normally-distributed PDFs supported over $[-1,1]$. (a) Comparison of original and smoothed functions. (b) Smoothed functions shown on a different scale for better resolution.

Figure 3: Normally-distributed PDFs supported over $[-1,1]$. (a) Comparison of original and smoothed functions. (b) Smoothed functions shown on a different scale for better resolution.

Figure 4: Normally-distributed PDFs supported over $[-1,1]$. (a) Comparison of original and smoothed functions. (b) Smoothed functions shown on a different scale for better resolution.

Figure 5: Normally-distributed PDFs supported over $[-1,1]$. (a) Comparison of original and smoothed functions. (b) Smoothed functions shown on a different scale for better resolution.

Figure 6: Normally-distributed PDFs supported over $[-1,1]$. (a) Comparison of original and smoothed functions. (b) Smoothed functions shown on a different scale for better resolution.
Figure 7: Amplitudes of the sine-component of one of the functions in Figures 8 - 11
Figure 8: Scale-separated PDFs constructed from random spectra.
Figure 9: Scale-separated PDFs constructed from random spectra.
Figure 10: Scale-separated PDFs constructed from random spectra.
Figure 11: Scale-separated PDFs constructed from random spectra.
Figure 12: Scale-separated PDFs constructed from random spectra.