Given a sequence of pairs of random variables $(X_i, Y_i)$, how can we measure the strength of the dependence of $X$ and $Y$? The classical Pearson correlation coefficient offers a solution that is standard yet often problematic, particularly when it is calculated between two time series. Indeed, one may observe “volatile” correlation between independent time series, volatile in the sense that its distribution is heavily dispersed and is frequently large in absolute value. Yule observed this empirically in his seminal 1926 paper (Yule, 1926), calling it “nonsense” correlation, writing that “we sometimes obtain between quantities varying with time (time-variables) quite high correlations to which we cannot attach any physical significance whatever, although under the ordinary test the correlation would be held to be certainly significant.”
Yule’s finding would remain “isolated” from the literature until 1986 (Aldrich, 1995), when Hendry (1986) and Phillips (1986) confirmed many of the empirical claims of “spurious regression” made by Granger and Newbold (1974). In particular, Phillips (1986) provided a mathematical solution to the problem of spurious regression among integrated time series by demonstrating that the t-ratio and F-ratio test statistics diverge with the sample size, thereby explaining the observed ‘statistical significance’ in such regressions. In later work, Phillips (1998) provided an explanation of such spurious regressions in terms of orthonormal representations of the Karhunen–Loève type.
Let $(X_t, Y_t)_{0 \le t \le T}$ be a process with values in $\mathbb{R}^2$, defined over a fixed time interval $[0, T]$. Define random variables
with values in $\mathbb{R}^2$ and $\mathbb{R}^{2\times 2}$ respectively, where $\mathbb{R}^{2\times 2}$ is the space of real $2\times 2$ matrices. In 2017, Ernst et al. (2017) explicitly calculated the second moment of Yule's nonsense correlation, defined (writing $\langle f \rangle = T^{-1}\int_0^T f_t\,dt$ for the time average of a path $f$) as
$$\theta \;=\; \frac{\langle XY\rangle - \langle X\rangle\langle Y\rangle}{\sqrt{\langle X^2\rangle - \langle X\rangle^2}\;\sqrt{\langle Y^2\rangle - \langle Y\rangle^2}}$$
in the case $T = 1$, when the pair is a two-dimensional Wiener process. (Scaling properties of Brownian motion show that the law of $\theta$ does not depend on the choice of $T$.) This calculation provided the first mathematical explanation of Yule's 1926 empirical finding. The present work closes the final long-standing open question on the distribution of Yule's nonsense correlation by determining the law of $\theta$ and explicitly calculating all moments (up to order 16). With these moments in hand, we provide the first density approximation to Yule's nonsense correlation. We then proceed to explicitly compute higher moments of Yule's nonsense correlation when the two independent Wiener processes are replaced by two correlated Wiener processes, two independent Ornstein-Uhlenbeck processes, and two independent Brownian bridges. This closes all previously open problems raised in Section 3.3 of Ernst et al. (2017).
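Yule's empirical finding is easy to reproduce by simulation. The sketch below (an editorial illustration, not part of the exact calculation developed in this paper; the step count `n`, path count `N` and seed are arbitrary choices) estimates the first two moments of $\theta$ for two independent Wiener processes on $[0, 1]$:

```python
import numpy as np

def nonsense_corr(x, y, dt):
    """Yule's nonsense correlation of two discretized paths on [0, 1]:
    theta = (<xy> - <x><y>) / sqrt((<x^2> - <x>^2)(<y^2> - <y>^2)),
    with the time averages <.> approximated by Riemann sums."""
    ix, iy = x.sum() * dt, y.sum() * dt
    num = (x * y).sum() * dt - ix * iy
    vx = (x * x).sum() * dt - ix ** 2
    vy = (y * y).sum() * dt - iy ** 2
    return num / np.sqrt(vx * vy)

rng = np.random.default_rng(0)
n, N = 500, 4000
dt = 1.0 / n
thetas = np.empty(N)
for i in range(N):
    # two independent Wiener paths on [0, 1] via cumulative sums of increments
    x = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))
    y = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))
    thetas[i] = nonsense_corr(x, y, dt)

print(thetas.mean(), (thetas ** 2).mean())
```

With these settings the empirical mean is near zero while the empirical second moment lands near 0.24, consistent with the exact second moment computed in Ernst et al. (2017); note how large a second moment this is for a correlation statistic between independent processes.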
Our method for explicitly calculating all moments (up to order 16) of Yule’s nonsense correlation is presented in Section 3. This approach inherits from an older and well-developed literature on the laws of quadratic functionals of Brownian motion. There is a fine survey (Donati-Martin and Yor, 1997) which presents the state of the subject as it was in 1997, and as it has substantially remained since then. A range of techniques is available to characterize the laws of quadratic functionals of Brownian motion, including:
diagonalization of the covariance, reducing the law to an infinite product expansion;
identification of the covariance as the Green function of a Markov process;
stochastic Fubini relations — see, for example, Donati-Martin and Yor (1997);
Itô’s formula — see Rogers and Shi (1992).
The first of these techniques is historically the oldest; using it to deliver a simple closed-form solution depends on spotting a simpler form for an infinite expansion. The second works well if we can see a Markov process whose Green function is the covariance of the Gaussian process of interest. The third again requires an insight to transform the problem of interest into a simpler equivalent. The fourth, not so often exploited, deals conclusively with settings where the Gaussian process arises as the solution of a linear stochastic differential equation (SDE); this is the approach we use in this paper. It has the advantage that no clever insight is required — it is a mechanical calculation, as we shall see when we extend the Brownian motion result to correlated Brownian motions, Ornstein-Uhlenbeck processes and the Brownian bridge; we make the obvious changes to the ODEs to be solved, and that is all there is to it.
2 Quadratic functionals of Gaussian diffusions.
We shall use the notation for the space of strictly positive-definite symmetric matrices, with the canonical ordering under which one matrix dominates another when their difference is non-negative definite. The main result is the following.
Suppose that is a bounded measurable function, and that solves (for notational simplicity, we will often omit the independent variable)
where is -dimensional Brownian motion. We write .
Suppose that and are bounded measurable functions such that is also bounded. Define
Then is given explicitly as
where the coefficient functions are obtained as the unique solutions to the system of ordinary differential equations (ODEs) below, in which we use an “overdot” to denote the derivative with respect to time,
subject to the boundary conditions .
(i) Notice that is bounded below by , which by hypothesis is bounded below by some constant; therefore the quantity defined by (5) is bounded.
(ii) The ODE (7) has a unique solution up to possible explosion, as the coefficients are locally Lipschitz. We claim that this solution remains positive-definite for . Since , there must exist some such that for all . If does not remain positive-definite, then there exist some non-zero and a greatest such that . But we see from (7) that , contradicting the definition of . Hence remains positive-definite all the way back to any possible explosion. However, we have that
So by hypothesis is bounded above and no explosion happens. Since is continuous on and positive-definite everywhere, it follows that is uniformly positive-definite on , that is, remains bounded.
(iii) Now define the process
Next, consider the process
Notice that is bounded, because is bounded below, and so is since we have proved that , and are all bounded on . Developing using Itô’s formula, with the symbol denoting that the two sides of the equation differ by a local martingale and omitting explicit appearance of the time parameter, we obtain
Theorem 1 extends easily to the situation where is the solution of a linear SDE.
Suppose that and are bounded measurable functions, and that solves
Then is given explicitly as
where the coefficient functions are obtained as the unique solutions to the system of ordinary differential equations (ODEs) below, in which we use an “overdot” to denote the derivative with respect to time,
subject to the boundary conditions .
The coefficients of the SDE (12) are globally Lipschitz, so it is a standard result (see, for example, Rogers and Williams (2000) Theorem V.11.2) that the SDE has a unique strong solution. If we now set
where and solve
then a few simple calculations show that
We will want to apply Theorem 1 to situations where . This is a simple limiting case of the problem where we take and let . In a little more detail, we let denote the solution to (14)-(16) with boundary condition , and we write
for the quadratic form . Evidently is decreasing in for each and , and from this it follows easily that limits of exist for each and determine for the limit case when .
Theorem 2 is a special case of the Feynman-Kac formula; the fact that the process defined in (11) is a martingale is equivalent to the Feynman-Kac formula, and is valid for any additive functional of the diffusion . However, without the special linear form of the SDE for and the quadratic form of the additive functional it is rare that any explicit solution can be found for .
If the diffusion coefficient is constant, we may assume that it is the identity matrix. To see this, note that a suitable linear transformation of the diffusion process solves the linear SDE,
Letting and we obtain
and thus we can work with the process instead of . However, it seems simpler to provide the full form of the solution for the SDE (12) rather than a reduced form which then requires a translation back to the original problem.
Although Theorem 2 deals with the general case where are measurable functions, in the remainder of this paper we only need invoke Theorem 2 for the special case in which and are constants. For this reason, we will sometimes use the alternative expanded notation
when we want to make explicit the dependence of on the coefficients and appearing in .
3 Computing the moments of $\theta$
Henceforth, we deal exclusively with cases where
be the moment generating function of the joint distribution, which can be expressed using quadratic functionals of the process as
Here, is a positive-definite symmetric matrix with entries denoted by . As we shall show in the following proposition, this function is all we shall need to evaluate the moments of $\theta$.
It is well known that the moments of a random variable can be obtained by differentiating the moment generating function, provided it exists (Billingsley, 2008). Now note that for any fixed nonnegative , there exists such that is positive semi-definite for any , and thus . Hence, the partial derivative with respect to exists at . Applying Fubini’s Theorem, we obtain
Next, recall that, by the definition of the Gamma function, for any $x > 0$ and $p > 0$,
$$x^{-p} \;=\; \frac{1}{\Gamma(p)} \int_0^\infty s^{p-1} e^{-xs}\, ds.$$
Since , we can apply the above formula to obtain (23) (by Tonelli’s Theorem, the order of integration can always be exchanged). ∎
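The Gamma-function identity just invoked is easy to spot-check numerically. The sketch below (assuming `scipy` is available; the test points are arbitrary) evaluates $x^{-p}$ through the integral representation:

```python
import math
from scipy.integrate import quad

def power_via_gamma(x, p):
    """Evaluate x**(-p) through the Gamma-function representation
    x**(-p) = (1 / Gamma(p)) * int_0^inf s**(p-1) * exp(-x * s) ds."""
    integral, _ = quad(lambda s: s ** (p - 1) * math.exp(-x * s), 0.0, math.inf)
    return integral / math.gamma(p)

# spot-check at a few arbitrary points against direct evaluation
for x, p in [(2.0, 1.5), (0.7, 3.0), (5.0, 2.0)]:
    print(x, p, power_via_gamma(x, p), x ** (-p))
```

It is exactly this representation that converts negative powers of the denominator quadratic functionals into expectations of exponentials of quadratic functionals, which Theorem 2 can handle.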
So we see that the distribution of $\theta$ is determined by (22), from which moments can in principle be derived using Proposition 1; but we need to get hold of the expression (22). This is where Theorem 2 comes in. If the process is a solution of a linear SDE (12), starting at zero to fix the discussion, and we set
then Theorem 2 tells us how to compute
where we have written and to emphasize dependence on . If we now integrate over with a Gaussian distribution, the right-hand side of (24) becomes
In this section we will carry out the program just outlined in four examples, and obtain remarkably explicit expressions for everything we need.
In the first three examples, the two-dimensional diffusion process has two special properties:
The law of is the same as the law of for any fixed rotation matrix ;
The two components of are independent.
Consequently, if we abbreviate , , and define
it follows that the function defined at (22) simplifies to the product
are the eigenvalues of the matrix concerned. This observation simplifies the solution of the differential equations (14)-(16) considerably, reducing everything to a one-dimensional problem.
The final example, that of correlated Brownian motion, reduces to the Brownian example by linear transformation.
4.1 Brownian motion.
For a standard one-dimensional Brownian motion , consider the function where and . By Theorem 2, the solution has the following form (the subscript indicates Brownian motion)
which leads to the following system of ordinary differential equations
Using the boundary condition , we obtain
where . Using the condition , one can show that the solution for is
Solving the third ODE, we obtain
From (28) therefore, the moment generating function is given by
where are the eigenvalues of . These eigenvalues are given in terms of the entries of as
where is the -th entry of .
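When the linear and constant terms vanish, the one-dimensional building block above reduces to the classical Cameron–Martin formula $E\exp\bigl(-\tfrac{1}{2}\lambda^2\int_0^1 B_t^2\,dt\bigr) = (\cosh\lambda)^{-1/2}$, which gives a convenient numerical cross-check. A Monte Carlo sketch (an editorial illustration; the discretization and seed are arbitrary choices):

```python
import numpy as np

lam = 1.0
n, N = 400, 10000
dt = 1.0 / n

rng = np.random.default_rng(1)
# N Brownian paths on (0, 1], one per row
b = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(N, n)), axis=1)
# Monte Carlo estimate of E exp(-(lam^2 / 2) * int_0^1 B_t^2 dt)
est = np.exp(-0.5 * lam ** 2 * (b * b).sum(axis=1) * dt).mean()

exact = 1.0 / np.sqrt(np.cosh(lam))  # Cameron–Martin closed form
print(est, exact)
```

The two values agree to within Monte Carlo error, as they should: the Cameron–Martin formula is precisely what the solved ODE system delivers in this special case.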
Consider the moments $E[\theta^n]$ for $n \ge 1$. Note that the expectation always exists since $|\theta| \le 1$. Further, all the odd moments are zero by symmetry. To compute an even moment of $\theta$, we apply formula (23). For example, consider the second moment. Straightforward but tedious calculations yield
where we have applied a change of variables, . Note that this is exactly the same as the formula provided in Ernst et al. (2017, Proposition 3.4).
For higher-order moments, the calculation of the derivatives is extremely laborious. We use Mathematica to perform the symbolic high-order differentiation and then the two-dimensional numerical integration. The numerical results are summarized in Table 1. The choice of the time horizon is irrelevant, since the distribution of $\theta$ does not depend on it.
We proceed to use the numerical values of to approximate the probability density function of . For example, we may consider a polynomial approximation
The coefficients can be computed by matching the first moments of $\theta$ (including the zeroth moment, which always equals $1$). This is also known as the Legendre polynomial approximation to the density function (Provost, 2005). The 4th-order, 6th-order and 8th-order polynomial approximations are provided below.
The 12th-order approximation is drawn in Figure 1 above; it looks almost the same as the 8th-order one (not shown here). The above expressions constitute the first density approximations to Yule’s nonsense correlation. It can be seen that the distribution of $\theta$ is dispersed: the density remains approximately constant for .
We have provided the numerical values of the moments only up to order 16, for two reasons. First, for practical purposes such as density approximation, moments of even higher orders are of much less interest. Second, the calculations of the derivatives and the double integrals become extremely slow and require massive amounts of memory.
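The moment-matching step itself is mechanical and can be sketched generically. The helper below (our own illustration, not the authors' code, which uses Mathematica) builds the Legendre-type approximant $f(x) = \sum_k \tfrac{2k+1}{2}\,E[P_k(X)]\,P_k(x)$ on $[-1, 1]$ from raw moments; as a sanity check we feed it the moments of the uniform law on $[-1, 1]$ rather than the tabulated moments of $\theta$:

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_density(raw_moments):
    """Moment-matched polynomial density on [-1, 1]:
    f(x) = sum_k ((2k + 1) / 2) * E[P_k(X)] * P_k(x),
    where P_k are the Legendre polynomials and E[P_k(X)] is computed
    from the raw moments mu_k = E[X^k], k = 0..K."""
    K = len(raw_moments) - 1
    leg_coeffs = np.zeros(K + 1)
    for k in range(K + 1):
        powers = legendre.leg2poly(np.eye(K + 1)[k])  # P_k in the power basis
        e_pk = float(np.dot(powers, raw_moments[: len(powers)]))  # E[P_k(X)]
        leg_coeffs[k] = 0.5 * (2 * k + 1) * e_pk
    return lambda x: legendre.legval(x, leg_coeffs)

# sanity check: the uniform law on [-1, 1] (mu_k = 1/(k+1) for even k,
# 0 for odd k) is recovered exactly as the constant density 1/2
mu = [1.0 / (k + 1) if k % 2 == 0 else 0.0 for k in range(9)]
f = legendre_density(mu)
print(f(0.3))
```

Feeding in the moments from Table 1 instead would reproduce the approximants plotted in Figure 1.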
4.2 Ornstein-Uhlenbeck process.
Consider a one-dimensional Ornstein-Uhlenbeck (OU) process which starts from and evolves according to the following stochastic differential equation:
By Theorem 2, the solution has the form
which can be obtained by solving the following system of ODEs
Using , we solve the first equation to obtain
where . The second differential equation is first-order linear, so it can be solved explicitly; after some routine calculations we obtain
Finally, solving the last differential equation yields
Mixing over with a Gaussian law as before, and using , we obtain
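The pure-quadratic special case admits an independent numerical cross-check. Writing $F(t,x)=E\bigl[\exp(-\tfrac{1}{2}\lambda^2\int_t^1 X_s^2\,ds)\mid X_t=x\bigr]=\exp(-\tfrac{1}{2}a(t)x^2-c(t))$ for an OU process $dX=-\mu X\,dt+dW$ leads, via Feynman–Kac, to the scalar Riccati system $\dot a = 2\mu a + a^2 - \lambda^2$, $\dot c = -a/2$ with $a(1)=c(1)=0$, and the expectation for $X_0=0$ is $\exp(-c(0))$. The sketch below (our own reduction with arbitrary test values of $\mu$ and $\lambda$, assuming `scipy`; it is not the paper's full system (14)-(16)) solves this system numerically and compares against an Euler–Maruyama Monte Carlo estimate:

```python
import numpy as np
from scipy.integrate import solve_ivp

mu, lam = 0.8, 1.0  # arbitrary test values

# Backward Riccati system: a' = 2*mu*a + a^2 - lam^2, c' = -a/2,
# with a(1) = c(1) = 0; then E exp(-(lam^2/2) int_0^1 X^2 dt) = exp(-c(0))
sol = solve_ivp(
    lambda t, y: [2.0 * mu * y[0] + y[0] ** 2 - lam ** 2, -0.5 * y[0]],
    (1.0, 0.0), [0.0, 0.0], rtol=1e-10, atol=1e-12,
)
ode_value = np.exp(-sol.y[1, -1])

# Euler–Maruyama Monte Carlo of the same expectation, started at X_0 = 0
rng = np.random.default_rng(2)
n, N = 400, 20000
dt = 1.0 / n
x = np.zeros(N)
integral = np.zeros(N)
for _ in range(n):
    integral += x * x * dt
    x += -mu * x * dt + np.sqrt(dt) * rng.normal(size=N)
mc_value = np.exp(-0.5 * lam ** 2 * integral).mean()

print(ode_value, mc_value)
```

Setting $\mu = 0$ recovers the Brownian value $(\cosh\lambda)^{-1/2}$, which is a useful consistency check on the reduction.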
If we have two independent Ornstein-Uhlenbeck processes which both start at zero and have common mean-reversion parameter , one can check that an orthogonal transformation of the pair leaves the joint distribution invariant. Indeed, the transformed two-dimensional process follows exactly the same SDE. Hence, the moment generating function in this case can be computed as
where are the eigenvalues of .
4.3 Brownian bridge.
For a more complicated example, consider a standard Brownian bridge (indicated by a subscript) which satisfies . In this case, we must fix and let . The dynamics of the bridge can be described by (see, for example, Rogers and Williams (2000), Theorem IV.40.3)
Although this SDE has the linear form, the drift coefficient explodes at the terminal time. Hence it does not satisfy the conditions required in Theorem 2. However, the singularity can easily be isolated, by freezing everything at and applying Theorem 2 to that; we can then let and we find the instances of the ODEs (14)-(16) to be
Solving the first differential equation with yields
One can check that . Similarly, the solution to the second ODE is given by
Though at first sight this might appear to have a singularity at , it is in fact analytic. The solution to the third differential equation is given by
One can also check that . Using this, we have
Mixing over gives the one-dimensional generating function
As in the case of the Ornstein-Uhlenbeck processes, the moment generating function is
In Table 3 we provide the moments of $\theta$ for independent Brownian bridges. Comparing with Table 1, we can see that $\theta$ has smaller variance for two Brownian bridges. Intuitively, this is because Brownian bridges are forced to fluctuate around zero more frequently than Brownian motions: a Brownian bridge has to return to zero at the end of the interval, but a Brownian motion is likely to make long excursions away from zero.
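This comparison is also easy to reproduce by simulation: pinning sampled Brownian paths to zero at $t = 1$ via $B_t - tB_1$ turns them into Brownian bridge paths, and the empirical second moment of $\theta$ drops accordingly. A minimal sketch (an editorial illustration; parameters and seed are arbitrary):

```python
import numpy as np

def theta(x, y, dt):
    # Yule's nonsense correlation of two discretized paths on [0, 1]
    ix, iy = x.sum() * dt, y.sum() * dt
    num = (x * y).sum() * dt - ix * iy
    vx = (x * x).sum() * dt - ix ** 2
    vy = (y * y).sum() * dt - iy ** 2
    return num / np.sqrt(vx * vy)

rng = np.random.default_rng(3)
n, N = 400, 4000
dt = 1.0 / n
t = np.arange(1, n + 1) * dt

m2 = {}
for kind in ("motion", "bridge"):
    th = np.empty(N)
    for i in range(N):
        x = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))
        y = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))
        if kind == "bridge":
            x, y = x - t * x[-1], y - t * y[-1]  # pin both paths to 0 at t = 1
        th[i] = theta(x, y, dt)
    m2[kind] = (th ** 2).mean()

print(m2)  # the bridge second moment is visibly smaller
```

The gap between the two empirical second moments is far larger than the Monte Carlo error, in line with the comparison of Tables 1 and 3.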
4.4 Correlated Brownian motion.
Let be two Brownian motions with constant correlation , represented by the following SDE
To compute the moment generating function , we take the approach outlined in Remark 3. Define a matrix as
Then the process is a two-dimensional Brownian motion with independent coordinates. The inverse of is
We now transform the problem to the uncorrelated case by
where the subscript indicates that the process is a correlated two-dimensional Brownian motion. The solution may be expressed as
where are the eigenvalues of the matrix . Routine calculation yields
In Table 4 we give the first and second moments of $\theta$ for two-dimensional correlated Brownian motion with correlation coefficient . Observe that the mean is always slightly smaller than the correlation coefficient when the latter is positive. The variance of $\theta$, computed from these two moments, is decreasing (as the correlation increases) but very slowly. Indeed, the standard deviation of $\theta$ is for , for , and for .
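The construction just described is easy to exercise numerically. The sketch below (an editorial illustration) builds a correlated Brownian pair from independent increments using the Cholesky factor of the correlation matrix — one possible choice for the elided matrix in the text — and estimates $E[\theta]$ for $\rho = 0.5$:

```python
import numpy as np

rho = 0.5
# Cholesky-type factor L with L @ L.T = [[1, rho], [rho, 1]];
# any such square root of the correlation matrix would do
L = np.array([[1.0, 0.0], [rho, np.sqrt(1.0 - rho ** 2)]])

rng = np.random.default_rng(4)
n, N = 400, 4000
dt = 1.0 / n

thetas = np.empty(N)
for i in range(N):
    w = rng.normal(0.0, np.sqrt(dt), size=(2, n))  # independent increments
    x, y = np.cumsum(L @ w, axis=1)                # correlated Brownian pair
    ix, iy = x.sum() * dt, y.sum() * dt
    num = (x * y).sum() * dt - ix * iy
    vx = (x * x).sum() * dt - ix ** 2
    vy = (y * y).sum() * dt - iy ** 2
    thetas[i] = num / np.sqrt(vx * vy)

print(thetas.mean())  # estimate of E[theta]; per Table 4 this sits near rho
```

Consistent with Table 4, the estimate lands near, and in expectation slightly below, the driving correlation $\rho$.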
- Aldrich, J. (1995). Correlations genuine and spurious in Pearson and Yule. Statistical Science, 10(4):364–376.
- Billingsley, P. (2008). Probability and Measure. John Wiley & Sons.
- Chan, T. (1991). Indefinite quadratic functionals of Gaussian processes and least-action paths. Annales de l’IHP Probabilités et Statistiques, 27(2):239–271.
- Chan, T., Dean, D. S., Jansons, K. M., and Rogers, L. C. G. (1994). On polymer conformations in elongational flows. Communications in Mathematical Physics, 160(2):239–257.
- Donati-Martin, C. and Yor, M. (1997). Some Brownian functionals and their laws. The Annals of Probability, 25(3):1011–1058.
- Dynkin, E. B. (1980). Markov processes and random fields. Bulletin of the American Mathematical Society, 3(3):975–999.
- Ernst, P. A., Shepp, L. A., and Wyner, A. J. (2017). Yule’s “nonsense correlation” solved! The Annals of Statistics, 45(4):1789–1809.
- Fixman, M. (1962). Radius of gyration of polymer chains. The Journal of Chemical Physics, 36(2):306–310.
- Granger, C. W. J. and Newbold, P. (1974). Spurious regressions in econometrics. Journal of Econometrics, 2:111–120.
- Hendry, D. (1986). Economic modelling with cointegrated variables: an overview. Oxford Bulletin of Economics and Statistics, 48(3):201–212.
- Lévy, P. (1951). Wiener’s random function, and other Laplacian random functions. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. The Regents of the University of California.
- Mac Aonghusa, P. and Pulé, J. V. (1989). An extension of Lévy’s stochastic area formula. Stochastics: An International Journal of Probability and Stochastic Processes, 26(4):247–255.
- Phillips, P. C. B. (1986). Understanding spurious regressions in econometrics. Journal of Econometrics, 33(3):311–340.
- Phillips, P. C. B. (1998). New tools for understanding spurious regressions. Econometrica, 66(6):1299–1325.
- Provost, S. B. (2005). Moment-based density approximants. Mathematica Journal, 9(4):727–756.
- Rogers, L. C. G. and Shi, Z. (1992). Quadratic functionals of Brownian motion, optimal control, and the “Colditz” example. Stochastics: An International Journal of Probability and Stochastic Processes, 41(4):201–218.
- Rogers, L. C. G. and Williams, D. (2000). Diffusions, Markov Processes and Martingales: Volume 2, Itô Calculus. Cambridge University Press.
- Yule, G. U. (1926). Why do we sometimes get nonsense-correlations between time-series? A study in sampling and the nature of time-series. Journal of the Royal Statistical Society, 89(1):1–63.