The distribution of Yule's 'nonsense correlation'

09/05/2019 ∙ by Philip A. Ernst, et al. ∙ 0

In 2017, the authors of ernst2017yule explicitly computed the second moment of Yule's "nonsense correlation," offering the first mathematical explanation of Yule's 1926 empirical finding of nonsense correlation. yule1926. The present work closes the final longstanding open question on the distribution of Yule's nonsense correlation ρ:= ∫_0^1 W_1(t)W_2(t) dt - ∫_0^1 W_1(t) dt ∫_0^1 W_2(t) dt/√(∫_0^1 W^2_1(t) dt - (∫_0^1W_1(t) dt)^2)√(∫_0^1 W^2_2(t) dt - (∫_0^1W_2(t) dt)^2) by explicitly calculating all moments of ρ (up to order 16) for two independent Wiener processes, W_1, W_2. These lead to an approximation to the density of Yule's nonsense correlation, apparently for the first time. We proceed to explicitly compute higher moments of Yule's nonsense correlation when the two independent Wiener processes are replaced by two correlated Wiener processes, two independent Ornstein-Uhlenbeck processes, and two independent Brownian bridges.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Given a sequence of pairs of random variables

(), how can we measure the strength of the dependence of and ? The classical Pearson correlation coefficient offers a solution that is standard yet often problematic, particularly when it is calculated between two time series. Indeed, one may observe “volatile” correlation in independent time series, where the correlation is volatile in the sense that its distribution is heavily dispersed and is frequently large in absolute value. Yule observed this empirically in his 1926 seminal paper (Yule, 1926), calling it “nonsense” correlation, writing that “we sometimes obtain between quantities varying with time (time-variables) quite high correlations to which we cannot attach any physical significance whatever, although under the ordinary test the correlation would be held to be certainly significant.”
Yule’s finding would remain “isolated” from the literature until 1986 (Aldrich, 1995), when the authors of Hendry (1986) and Phillips (1986) confirmed many of the empirical claims of “spurious regression” made by the authors of Granger and Newbold (1974). In particular, Phillips (1986) provided a mathematical solution to the problem of spurious regression among integrated time series by demonstrating that statistical t-ratio and F-ratio tests diverge with the sample size, thereby explaining the observed ‘statistical significance’ in such regressions. In later work Phillips (1998), the same author provided an explanation of such spurious regressions in terms of orthonormal representations of the Karhunen Loève type.
Let be some process with values in , defined over a fixed time interval . Define random variables


with values in and respectively, where is the space of real matrices. In 2017, Ernst et al. (2017) explicitly calculated the second moment of Yule’s nonsense correlation, defined as


in the case , when is a two-dimensional Wiener process444Scaling properties of Brownian motion show that the law of does not depend on the choice of .. This calculation provided the first mathematical explanation of Yule’s 1926 empirical finding. The present work closes the final long-standing open question on the distribution of Yule’s nonsense correlation by determining the law of and explicitly calculating all moments (up to order 16). With these moments in hand, we provide the first density approximation to Yule’s nonsense correlation. We then proceed to explicitly compute higher moments of Yule’s nonsense correlation when the two independent Wiener processes are replaced by two correlated Wiener processes, two independent Ornstein-Uhlenbeck processes, and two independent Brownian bridges. This closes all previously open problems raised in Section 3.3 of Ernst et al. (2017).

Our method for explicitly calculating all moments (up to order 16) of Yule’s nonsense correlation (Section 3

), relies on the characterization of the moment generating function of the random vector

. This approach inherits from an older and well-developed literature, on the laws of quadratic functionals of Brownian motion. There is a fine survey (Donati-Martin and Yor, 1997) which presents the state of the subject as it was in 1997, and as it has substantially remained since then. A range of techniques is available to characterize the laws of quadratic functionals of Brownian motion, including:

  1. eigenfunction expansions — see, for example, Lévy (1951), Fixman (1962), Mac aonghusa and Pule (1989), Chan (1991), Chan et al. (1994), Ernst et al. (2017);

  2. identifying the covariance of the Gaussian process as the Green function of a symmetrizable Markov process — see, for example, Chan et al. (1994), Dynkin (1980);

  3. stochastic Fubini relations — see, for example, Donati-Martin and Yor (1997);

  4. Itô’s formula — see Rogers and Shi (1992).

The first of these techniques is historically the first; using it to deliver a simple closed-form solution depends on spotting a simpler form for an infinite expansion. The second works well if we can see a Markov process whose Green function is the covariance of the Gaussian process of interest. The third again requires an insight to transform the problem of interest into a simpler equivalent. The fourth, not so often exploited, deals conclusively with settings where the Gaussian process arises as the solution of a linear stochastic differential equation (SDE); this is the approach we use in this paper. It has the advantage that no clever insight is required — it is a mechanical calculation, as we shall see when we extend the Brownian motion result to correlated Brownian motions, Ornstein-Uhlenbeck processes and the Brownian bridge; we make the obvious changes to the ODEs to be solved and that is all there is.

2 Quadratic functionals of Gaussian diffusions.

We shall use the notation for the space of strictly positive-definite symmetric matrices, with the canonical ordering meaning that is non-negative definite. The main result is the following.

Theorem 1.

Suppose that is a bounded measurable function, and that solves555For notational simplicity, we will often omit the independent variable .


where is -dimensional Brownian motion. We write .

Suppose that and are bounded measurable functions such that is also bounded. Define


Then is given explicitly as



are obtained as the unique solutions to the system of ordinary differential equations (ODEs),

666We use an “overdot” to denote the derivative with respect to .


subject to the boundary conditions .


(i) Notice that is bounded below by , which by hypothesis is bounded below by some constant, therefore defined by (5) is bounded.

(ii) The ODE (7) has a unique solution up to possible explosion, as the coefficients are locally Lipschitz. We claim that this solution remains positive-definite for . Since , it has to be that there exists some such that for all . If does not remain positive definite, then there exists some non-zero and a greatest such that . But we see from (7) that , contradicting the definition of . Hence remains positive-definite all the way back to possible explosion. However, we have that

So by hypothesis is bounded above and no explosion happens. Since is continuous on and positive-definite everywhere, it follows that is uniformly positive-definite on , that is, remains bounded.

It now follows easily that and defined by (8) and (9) are unique, continuous and bounded.

(iii) Now define the process


and develop

Now consider the process


Notice that is bounded, because is bounded below, and so is since we have proved that , and are all bounded on . Developing using Itô’s formula, with the symbol denoting that the two sides of the equation differ by a local martingale and omitting explicit appearance of the time parameter, we obtain

because of (7), (8) and (9). Thus is a local martingale, which is also bounded on so is a bounded martingale, and the result follows.

Theorem 1 extends easily to the situation where is the solution of a linear SDE.

Theorem 2.

Suppose that and are bounded measurable functions, and that solves


Suppose that and are bounded measurable functions such that is also bounded, and suppose that and are defined as before at (4), (5).

Then is given explicitly as


where are obtained as the unique solutions to the system of ordinary differential equations (ODEs),777We use an “overdot” to denote the derivative with respect to .


subject to the boundary conditions .


The coefficients of the SDE (12) are globally Lipschitz, so it is a standard result (see, for example, Rogers and Williams (2000) Theorem V.11.2) that the SDE has a unique strong solution. If we now set


where and solve


then a few simple calculations show that

and Theorem 1 applies. The equations (14), (15) and (16) are easily checked to be the analogs of (7), (8) and (9) respectively.

Remark 1.

We will want to apply Theorem 1 to situations where . This is a simple limiting case of the problem where we take and let . In a little more detail, we let denote the solution to (14)-(16) with boundary condition , and we write


for the quadratic form . Evidently is decreasing in for each and , and from this it follows easily that limits of exist for each and determine for the limit case when .

Remark 2.

Theorem 2 is a special case of the Feynman-Kac formula; the fact that the process defined in (11) is a martingale is equivalent to the Feynman-Kac formula, and is valid for any additive functional of the diffusion . However, without the special linear form of the SDE for and the quadratic form of the additive functional it is rare that any explicit solution can be found for .

Remark 3.

If is constant, we may assume that

, the identity matrix. To see this, let

, and note that the diffusion process solves the linear SDE,

Letting and we obtain

and thus we can work with the process instead of . However, it seems simpler to provide the full form of the solution for the SDE (12) rather than a reduced form which then requires a translation back to the original problem.

Remark 4.

Although Theorem 2 deals with the general case where are measurable functions, in the remainder of this paper we only need invoke Theorem 2 for the special case in which and are constants. For this reason, we will sometimes use the alternative expanded notation


when we want to make explicit the dependence of on the coefficients and appearing in .

3 Computing the moments of

Henceforth, we deal exclusively with cases where

Recall the definition (10) of the random matrix . Let

be the moment generating function of the joint distribution of

, which can be expressed using quadratic functionals of as


Here, is a positive-definite symmetric matrix with entries denoted by As we shall show in the following proposition, the function is all we shall need to evaluate the moments of .

Proposition 1.

Let be as given in (2) and be as given in (22). For , we have


It is well known that the moments of a random variable can be obtained by differentiating the moment generating function, given it exists (Billingsley, 2008). Now note that for any fixed nonnegative , there exists such that is positive semi-definite for any and thus . Hence, the partial derivative with respect to exists at . Applying Fubini’s Theorem we obtain

Next, recall that by the definition of Gamma function, for any ,

Since , we can apply the above formula to obtain (23) (by Tonelli’s Theorem, the order of integration can always be exchanged). ∎

So we see that the distribution of is determined by (22), from which moments can in principle be derived using Proposition 1; but we need to get hold of the expression (22). This is where Theorem 2 comes in. If is a solution of a linear SDE (12), starting at to fix the discussion, and we set

then Theorem 2 tells us how to compute


where we have written and to emphasize dependence on . If we now integrate over with a distribution the right-hand side of (24) becomes


The strategy now should be clear. In any particular application, we use Theorem 2 to obtain as explicitly as possible, and then we integrate (25) over to find .

4 Examples.

In this Section we will carry out the program just outlined in four examples, and obtain remarkably explicit expressions for everything we need.

In the first three examples, the two-dimensional diffusion process has two special properties:

  • The law of is the same as the law of for any fixed rotation matrix ;

  • The two components of are independent.

Consequently, if we abbreviate , , and define


it follows that the function defined at (22) simplifies to the product



are the eigenvalues of

. This observation simplifies the solution of the differential equations (14)-(16) considerably, reducing everything to a one-dimensional problem.

The final example, that of correlated Brownian motion, reduces to the Brownian example by linear transformation.

4.1 Brownian motion.

For a standard one-dimensional Brownian motion , consider the function where and . By Theorem 2, the solution has the following form (the subscript “” is Brownian motion)

which leads to the following system of ordinary differential equations

Using the boundary condition , we obtain

where . Using the condition , one can show that the solution for is

Solving the third ODE, we obtain

and thus

As at (26), we now mix this expression over to discover that in this example the function (defined at (27)) takes the simple explicit form


From (28) therefore, the moment generating function is given by


where are the eigenvalues of . These eigenvalues are given in terms of the entries of as


where is the -th entry of .

Consider for . Note that for any , the expectation always exists since

. Further, all the odd moments, i.e.

, are zero by symmetry. To compute an even moment of , we apply formula (23). For example, consider the second moment. Straightforward but tedious calculations yield

where we have applied a change of variables, . Note that this is exactly the same as the formula provided in Ernst et al. (2017, Proposition 3.4).

For higher-order moments, the calculation of is extremely laborious. We use Mathematica to perform symbolic high-order differentiation and then the two-dimensional numerical integration. The numerical results are summarized in Table 1. The choice of is irrelevant since the distribution of does not depend on .

2 4 6 8
0.240523 0.109177 0.060862 0.037788
10 12 14 16
0.025114 0.017504 0.012641 0.009385
Table 1: Numerical values of the moments of Yule’s nonsense correlation for two independent Wiener processes ().
Figure 1:

The 12th-order polynomial approximation to the probability density function of

for two independent Wiener processes.

We proceed to use the numerical values of to approximate the probability density function of . For example, we may consider a polynomial approximation

The coefficients can be computed by matching the first moments of (including the zero moment which is always equal to ). This is also known as the Legendre polynomial approximation to the density function (Provost, 2005). The 4th-order, 6th-order and 8th-order polynomial approximations are provided below.

The 12th-order approximation is drawn in Figure 1 above, which looks almost the same as the 8th-order one (not shown here). The above expressions constitute the first density approximations to Yule’s nonsense correlation. It can be seen that the distribution of is dispersed: the density remains approximately constant for .

We have only provided the numerical values of up to order 16. This has been done for two reasons. Firstly, for practical purposes such density approximation, moments of even higher orders are of much less interest. Secondly, the calculations of the derivative and the double integral become extremely slow and require massive memory.

4.2 Ornstein-Uhlenbeck process.

Consider a one-dimensional Ornstein-Uhlenbeck (OU) process which starts from and evolves according to the following stochastic differential equation:


By Theorem 2, the solution has the form

which can be obtained by solving the following system of ODEs

Using , we solve the first equation to obtain

where The second differential equation is first-order linear, so can be solved explicitly; after some routine calculations we obtain

Finally, solving the last differential equation yields

Mixing over with a Gaussian law as before, and using , we obtain

If we have two independent Ornstein-Uhlenbeck processes which both start at zero and have common mean reversion parameter , one can check that an orthogonal transformation of leaves the joint distribution invariant. Indeed, the new two-dimensional process follows exactly the same SDE. Hence, the moment generating function in this case can be computed by

where are the eigenvalues of .

0.1 0.2 0.3 0.4 0.5 1
0.23209 0.22438 0.21734 0.21091 0.20504 0.18231
2 5 10 20 50 100
0.15583 0.11454 0.07627 0.04404 0.01907 0.00971
Table 2: Numerical values of the second moment of Yule’s nonsense correlation for two independent Ornstein-Uhlenbeck processes with mean reversion parameter ().

In Table 2 above we give the numerical values of for independent Ornstein-Uhlenbeck processes with mean reversion parameter (). Note that as , the processes converge to constant zero and thus

(the variance of

) goes to zero. Our numerical results show that decreases slowly.

4.3 Brownian bridge.

For a more complicated example, consider a standard Brownian bridge (denoted by “”) which satisfies . In this case, we must fix and let . The dynamics of can be described by (see for example Rogers and Williams (2000) Theorem IV.40.3)

Though this SDE has the linear form, the drift coefficient explodes at . Hence, it does not satisfy the conditions required in Theorem 2. However, the singularity can easily be isolated, by freezing everything at and applying Theorem 2 to that; we can then let and we find the instances of the ODEs (14)-(16) to be

Solving the first differential equation with yields

One can check that . Similarly, the solution to the second ODE is given by

Though at first sight this might appear to have a singularity at it is in fact analytic. The solution to the third differential equation is given by

One can also check that . Using this, we have


Mixing over gives the one-dimensional generating function

As in the case case of Ornstein-Uhlenbeck processes, the moment generating function is

2 4 6 8
0.149001 0.047864 0.0201829 0.009876
Table 3: Numerical values of the moments of Yule’s nonsense correlation for two independent Brownian bridges.

In Table 3 we provide the moments of for independent Brownian bridges. Comparing with Table 1, we can see that has smaller variance for two Brownian bridges. Intuitively, this is because Brownian bridges are forced to fluctuate around zero more frequently than Brownian motions: a Brownian bridge has to return to zero at but a Brownian motion is likely to make long excursions away from zero.

4.4 Correlated Brownian motion.

Let be two Brownian motions with constant correlation , represented by the following SDE

To compute the moment generating function , we take the approach outlined in Remark 3. Define a matrix as

Then the process is a two-dimensional Brownian motion with independent coordinates. The inverse of is

We now transform the problem to the uncorrelated case by

where we use “” to indicate that is a correlated two-dimensional Brownian motion. The solution may be expressed as


where are the eigenvalues of the matrix . Routine calculation yields

In Table 4 we give the first and second moments of for two-dimensional correlated Brownian motion with correlation coefficient . Observe that is always slightly smaller than if . The variance of , computed as , is decreasing (as

increases) but very slowly. Indeed, the standard deviation of

is for , for and for .

0 0.1 0.2 0.3 0.4
0 0.08873 0.17792 0.26804 0.35963
0.24052 0.24550 0.26061 0.28636 0.32368
0.2405 0.2376 0.2290 0.2145 0.1943
0.5 0.6 0.7 0.8 0.9
0.45338 0.55004 0.65071 0.75698 0.87151
0.37407 0.43986 0.52477 0.63509 0.78298
0.1685 0.1373 0.1013 0.0621 0.0235
Table 4: Numerical values of the moments of Yule’s nonsense correlation for two correlated Brownian motions with correlation coefficient ().


  • Aldrich (1995) Aldrich, J. (1995). Correlations genuine and spurious in Pearson and Yule. Statistical Science, 10(4): 364–376.
  • Billingsley (2008) Billingsley, P. (2008). Probability and Measure. John Wiley & Sons.
  • Chan (1991) Chan, T. (1991). Indefinite quadratic functionals of gaussian processes and least-action paths. Annales de l’IHP Probabilités et Statistiques, 27(2):239–271.
  • Chan et al. (1994) Chan, T., Dean, D. S., Jansons, K. M., and Rogers, L. C. G. (1994). On polymer conformations in elongational flows. Communications in Mathematical Physics, 160(2):239–257.
  • Donati-Martin and Yor (1997) Donati-Martin, C. and Yor, M. (1997). Some Brownian functionals and their laws. The Annals of Probability, 25(3):1011–1058.
  • Dynkin (1980) Dynkin, E. B. (1980). Markov processes and random fields. Bulletin of the American Mathematical Society, 3(3):975–999.
  • Ernst et al. (2017) Ernst, P. A., Shepp, L. A., and Wyner, A. J. (2017). Yule’s “nonsense correlation“ solved! The Annals of Statistics, 45(4):1789–1809.
  • Fixman (1962) Fixman, M. (1962). Radius of gyration of polymer chains. The Journal of Chemical Physics, 36(2):306–310.
  • Granger and Newbold (1974) Granger, C. and Newbold, D. (1974). Spurious regression in econometrics. Journal of Econometrics, 2: 111–120.
  • Hendry (1986) Hendry, D. (1986). Economic modelling with cointegrated variables: an overview. Oxford Bulletin of Economics and Statistics, 48(3): 201–212.
  • Lévy (1951) Lévy, P. (1951). Wiener’s random function, and other laplacian random functions. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. The Regents of the University of California.
  • Mac aonghusa and Pule (1989) Mac aonghusa, P. and Pule, J. V. (1989). An extension of Lévy’s stochastic area formula. Stochastics: An International Journal of Probability and Stochastic Processes, 26(4):247–255.
  • Phillips (1986) Phillips, P. C. B. (1986). Understanding spurious regressions in econometrics. Journal of Econometrics, 33(3): 311–340.
  • Phillips (1998) Phillips, P. C. B. (1998). New tools for understanding spurious regressions. Econometrica, 66(6):1299–1325.
  • Provost (2005) Provost, S. B. (2005). Moment-based density approximants. Mathematica Journal, 9(4):727–756.
  • Rogers and Shi (1992) Rogers, L. C. G. and Shi, Z. (1992). Quadratic functionals of Brownian motion, optimal control, and the “Colditz” example. Stochastics: An International Journal of Probability and Stochastic Processes, 41(4):201–218.
  • Rogers and Williams (2000) Rogers, L. C. G. and Williams, D. (2000). Diffusions, Markov Processes and Martingales: Volume 2, Itô Calculus. Cambridge University Press.
  • Yule (1926) Yule, G. U. (1926). Why do we sometimes get nonsense-correlations between time-series?–a study in sampling and the nature of time-series. Journal of the Royal Statistical Society, 89(1):1–63.