1 Introduction
Given a sequence of pairs of random variables
(), how can we measure the strength of the dependence of and ? The classical Pearson correlation coefficient offers a solution that is standard yet often problematic, particularly when it is calculated between two time series. Indeed, one may observe “volatile” correlation in independent time series, where the correlation is volatile in the sense that its distribution is heavily dispersed and is frequently large in absolute value. Yule observed this empirically in his seminal 1926 paper (Yule, 1926), calling it “nonsense” correlation, writing that “we sometimes obtain between quantities varying with time (time-variables) quite high correlations to which we cannot attach any physical significance whatever, although under the ordinary test the correlation would be held to be certainly significant.” Yule’s finding would remain “isolated” from the literature until 1986 (Aldrich, 1995), when the authors of Hendry (1986) and Phillips (1986) confirmed many of the empirical claims of “spurious regression” made by Granger and Newbold (1974). In particular, Phillips (1986) provided a mathematical solution to the problem of spurious regression among integrated time series by demonstrating that the statistical t-ratio and F-ratio tests diverge with the sample size, thereby explaining the observed “statistical significance” in such regressions. In later work (Phillips, 1998), the same author provided an explanation of such spurious regressions in terms of orthonormal representations of the Karhunen-Loève type.
Let be some process with values in , defined over a fixed time interval . Define random variables
(1) 
with values in and respectively, where is the space of real matrices. In 2017, Ernst et al. (2017) explicitly calculated the second moment of Yule’s nonsense correlation, defined as
(2) 
in the case , when is a two-dimensional Wiener process (by the scaling properties of Brownian motion, the law of does not depend on the choice of ). This calculation provided the first mathematical explanation of Yule’s 1926 empirical finding. The present work closes the final long-standing open question on the distribution of Yule’s nonsense correlation by determining the law of and explicitly calculating all moments (up to order 16). With these moments in hand, we provide the first density approximation to Yule’s nonsense correlation. We then explicitly compute higher moments of Yule’s nonsense correlation when the two independent Wiener processes are replaced by two correlated Wiener processes, by two independent Ornstein-Uhlenbeck processes, and by two independent Brownian bridges. This closes all previously open problems raised in Section 3.3 of Ernst et al. (2017).
Our method for explicitly calculating all moments (up to order 16) of Yule’s nonsense correlation (Section 3) relies on the characterization of the moment generating function of the random vector
. This approach inherits from an older and well-developed literature on the laws of quadratic functionals of Brownian motion. There is a fine survey (Donati-Martin and Yor, 1997) which presents the state of the subject as it was in 1997, and as it has substantially remained since then. A range of techniques is available to characterize the laws of quadratic functionals of Brownian motion, including:
stochastic Fubini relations — see, for example, Donati-Martin and Yor (1997);

Itô’s formula — see Rogers and Shi (1992).
The first of these techniques is also historically the first; using it to deliver a simple closed-form solution depends on spotting a simpler form for an infinite expansion. The second works well if we can see a Markov process whose Green function is the covariance of the Gaussian process of interest. The third again requires an insight to transform the problem of interest into a simpler equivalent. The fourth, not so often exploited, deals conclusively with settings where the Gaussian process arises as the solution of a linear stochastic differential equation (SDE); this is the approach we use in this paper. It has the advantage that no clever insight is required — it is a mechanical calculation, as we shall see when we extend the Brownian motion result to correlated Brownian motions, Ornstein-Uhlenbeck processes and the Brownian bridge; we make the obvious changes to the ODEs to be solved, and that is all there is to it.
2 Quadratic functionals of Gaussian diffusions.
We shall use the notation for the space of strictly positive-definite symmetric matrices, with the canonical ordering meaning that is nonnegative-definite. The main result is the following.
Theorem 1.
Suppose that is a bounded measurable function, and that solves (for notational simplicity, we will often omit the independent variable )
(3) 
where is dimensional Brownian motion. We write .
Suppose that and are bounded measurable functions such that is also bounded. Define
(4)  
(5) 
Then is given explicitly as
(6) 
where
are obtained as the unique solutions to the following system of ordinary differential equations (ODEs), in which an “overdot” denotes the derivative with respect to :
(7)  
(8)  
(9) 
subject to the boundary conditions .
Proof.
(i) Notice that is bounded below by , which by hypothesis is bounded below by some constant; therefore defined by (5) is bounded.
(ii) The ODE (7) has a unique solution up to possible explosion, as the coefficients are locally Lipschitz. We claim that this solution remains positive-definite for . Since , there must exist some such that for all . If does not remain positive-definite, then there exists some nonzero and a greatest such that . But we see from (7) that , contradicting the definition of . Hence remains positive-definite all the way back to possible explosion. However, we have that
So by hypothesis is bounded above and no explosion happens. Since is continuous on and positive-definite everywhere, it follows that is uniformly positive-definite on , that is, remains bounded.
(iii) Now define the process
(10) 
and develop
Now consider the process
(11) 
Notice that is bounded, because is bounded below, and so is since we have proved that , and are all bounded on . Developing using Itô’s formula, with the symbol denoting that the two sides of the equation differ by a local martingale and omitting explicit appearance of the time parameter, we obtain
because of (7), (8) and (9). Thus is a local martingale, which is also bounded on , and so is a bounded martingale; the result follows.
∎
Theorem 1 extends easily to the situation where is the solution of a linear SDE.
Theorem 2.
Suppose that and are bounded measurable functions, and that solves
(12) 
Suppose that and are bounded measurable functions such that is also bounded, and suppose that and are defined as before at (4), (5).
Then is given explicitly as
(13) 
where are obtained as the unique solutions to the following system of ordinary differential equations (ODEs), in which an “overdot” denotes the derivative with respect to :
(14)  
(15)  
(16) 
subject to the boundary conditions .
Proof.
The coefficients of the SDE (12) are globally Lipschitz, so it is a standard result (see, for example, Rogers and Williams (2000) Theorem V.11.2) that the SDE has a unique strong solution. If we now set
(17) 
where and solve
(18)  
(19) 
then a few simple calculations show that
and Theorem 1 applies. The equations (14), (15) and (16) are easily checked to be the analogs of (7), (8) and (9) respectively.
∎
Remark 1.
We will want to apply Theorem 1 to situations where . This is a simple limiting case of the problem where we take and let . In a little more detail, we let denote the solution to (14)-(16) with boundary condition , and we write
(20) 
for the quadratic form . Evidently is decreasing in for each and , and from this it follows easily that limits of exist for each and determine for the limit case when .
Remark 2.
Theorem 2 is a special case of the Feynman-Kac formula; the fact that the process defined in (11) is a martingale is equivalent to the Feynman-Kac formula, and is valid for any additive functional of the diffusion . However, without the special linear form of the SDE for and the quadratic form of the additive functional, it is rare that any explicit solution can be found for .
Remark 3.
If is constant, we may assume that
, the identity matrix. To see this, let
, and note that the diffusion process solves the linear SDE. Letting and , we obtain
and thus we can work with the process instead of . However, it seems simpler to provide the full form of the solution for the SDE (12) rather than a reduced form which then requires a translation back to the original problem.
Remark 4.
Although Theorem 2 deals with the general case where are measurable functions, in the remainder of this paper we need only invoke Theorem 2 for the special case in which and are constants. For this reason, we will sometimes use the alternative expanded notation
(21) 
when we want to make explicit the dependence of on the coefficients and appearing in .
3 Computing the moments of
Henceforth, we deal exclusively with cases where
Recall the definition (10) of the random matrix . Let
be the moment generating function of the joint distribution of
, which can be expressed using quadratic functionals of as
(22)  
Here, is a positive-definite symmetric matrix with entries denoted by . As we shall show in the following proposition, the function is all we shall need to evaluate the moments of .
Proof.
It is well known that the moments of a random variable can be obtained by differentiating the moment generating function, provided the latter exists (Billingsley, 2008). Now note that for any fixed nonnegative , there exists such that is positive semidefinite for any , and thus . Hence, the partial derivative with respect to exists at . Applying Fubini’s theorem, we obtain
Next, recall that, by the definition of the Gamma function, for any ,
Since , we can apply the above formula to obtain (23) (by Tonelli’s Theorem, the order of integration can always be exchanged). ∎
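For concreteness, the Gamma-function identity being used here can be written out in full; in illustrative scalar symbols a and p (chosen by us, not the paper's notation), it reads:

```latex
a^{-p} \;=\; \frac{1}{\Gamma(p)} \int_0^{\infty} t^{\,p-1}\, e^{-a t}\, dt,
\qquad a > 0,\ p > 0.
```

Substituting the quadratic functional in the denominator for a and taking expectations is what converts a negative moment into an integral of the moment generating function, which is why (22) is all that is needed.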
So we see that the distribution of is determined by (22), from which moments can in principle be derived using Proposition 1; but we need to get hold of the expression (22). This is where Theorem 2 comes in. If is a solution of a linear SDE (12), starting at to fix the discussion, and we set
then Theorem 2 tells us how to compute
(24)  
(25) 
where we have written and to emphasize the dependence on . If we now integrate over with a distribution , the right-hand side of (24) becomes
(26) 
The strategy now should be clear. In any particular application, we use Theorem 2 to obtain as explicitly as possible, and then we integrate (25) over to find .
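The mixing step is an elementary Gaussian integral, obtained by completing the square. In scalar notation, with illustrative parameters a, b and variance v (ours, not the paper's symbols), it reads:

```latex
\int_{-\infty}^{\infty} e^{-a\mu^{2} + b\mu}\,
\frac{e^{-\mu^{2}/(2v)}}{\sqrt{2\pi v}}\; d\mu
\;=\; \frac{1}{\sqrt{1+2av}}\,
\exp\!\left(\frac{b^{2}v}{2(1+2av)}\right),
\qquad 1 + 2av > 0.
```

This is the computation that turns the explicit exponential-quadratic form of (25) into the mixed expression (26).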
4 Examples.
In this section we carry out the program just outlined in four examples, and obtain remarkably explicit expressions for everything we need.
In the first three examples, the two-dimensional diffusion process has two special properties:

The law of is the same as the law of for any fixed rotation matrix ;

The two components of are independent.
Consequently, if we abbreviate , , and define
(27) 
it follows that the function defined at (22) simplifies to the product
(28) 
where
are the eigenvalues of
. This observation considerably simplifies the solution of the differential equations (14)-(16), reducing everything to a one-dimensional problem.
The final example, that of correlated Brownian motion, reduces to the Brownian case by a linear transformation.
4.1 Brownian motion.
For a standard one-dimensional Brownian motion , consider the function where and . By Theorem 2, the solution has the following form (the subscript “” indicates Brownian motion)
which leads to the following system of ordinary differential equations
Using the boundary condition , we obtain
where . Using the condition , one can show that the solution for is
Solving the third ODE, we obtain
and thus
As at (26), we now mix this expression over to discover that in this example the function (defined at (27)) takes the simple explicit form
(29) 
From (28) therefore, the moment generating function is given by
(30) 
where are the eigenvalues of . These eigenvalues are given in terms of the entries of as
(31) 
where is the th entry of .
Consider for . Note that for any , the expectation always exists since
. Further, all the odd moments, i.e.
, are zero by symmetry. To compute an even moment of , we apply formula (23). For example, consider the second moment. Straightforward but tedious calculations yield
where we have applied a change of variables, . Note that this is exactly the same as the formula provided in Ernst et al. (2017, Proposition 3.4).
For higher-order moments, the calculation of is extremely laborious. We use Mathematica to perform the symbolic high-order differentiation and then the two-dimensional numerical integration. The numerical results are summarized in Table 1. The choice of is irrelevant, since the distribution of does not depend on .
Table 1: Even moments, for two independent Brownian motions.
Order:   2         4         6         8
Moment:  0.240523  0.109177  0.060862  0.037788
Order:   10        12        14        16
Moment:  0.025114  0.017504  0.012641  0.009385
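The second-moment entry of Table 1 can be spot-checked by direct simulation: draw many pairs of independent discretized Brownian paths, form the empirical (Pearson) correlation of each pair, and average its square. The sketch below is ours (the function name, grid size and path count are illustrative choices, not from the paper); with the defaults it should land within Monte Carlo error of the exact value 0.240523.

```python
import math
import random

def nonsense_corr_moment(order=2, n_paths=8000, n_steps=250, seed=1):
    """Monte Carlo estimate of the given moment of the empirical
    (Pearson) correlation of two independent Brownian paths on [0, 1]."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 / n_steps)   # standard deviation of one increment
    total = 0.0
    for _ in range(n_paths):
        x = y = 0.0
        sx = sy = sxx = syy = sxy = 0.0
        for _ in range(n_steps):
            x += s * rng.gauss(0.0, 1.0)
            y += s * rng.gauss(0.0, 1.0)
            sx += x; sy += y
            sxx += x * x; syy += y * y; sxy += x * y
        n = float(n_steps)
        cov = sxy / n - (sx / n) * (sy / n)
        var_x = sxx / n - (sx / n) ** 2
        var_y = syy / n - (sy / n) ** 2
        theta = cov / math.sqrt(var_x * var_y)
        total += theta ** order
    return total / n_paths
```

Calling nonsense_corr_moment() with the defaults returns a value close to 0.2405; the higher even moments in Table 1 can be checked the same way by passing order=4, 6, and so on.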
We proceed to use the numerical values of to approximate the probability density function of . For example, we may consider a polynomial approximation
The coefficients can be computed by matching the first moments of (including the zeroth moment, which always equals ). This is also known as the Legendre polynomial approximation to the density function (Provost, 2005). The 4th-order, 6th-order and 8th-order polynomial approximations are provided below.
The 12th-order approximation is drawn in Figure 1 above; it looks almost the same as the 8th-order one (not shown here). The above expressions constitute the first density approximations to Yule’s nonsense correlation. It can be seen that the distribution of is dispersed: the density remains approximately constant for .
We have only provided the numerical values of up to order 16, for two reasons. Firstly, for practical purposes such as density approximation, moments of even higher orders are of much less interest. Secondly, the calculation of the derivative and the double integral becomes extremely slow and requires massive memory.
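The moment-matching construction described above can be sketched numerically: the K-th order Legendre approximant on [-1, 1] is f(x) = Σ_k ((2k+1)/2) E[P_k(θ)] P_k(x), with E[P_k(θ)] evaluated from the moments. The helper names below are ours and the moment values are those reported in Table 1; this is an illustrative sketch, not the paper's Mathematica code.

```python
def legendre_coeffs(K):
    """Coefficient lists c[k][i] = coefficient of x**i in the Legendre
    polynomial P_k, built from the three-term recurrence
    (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    c = [[1.0], [0.0, 1.0]]
    for k in range(1, K):
        nxt = [0.0] * (k + 2)
        for i, a in enumerate(c[k]):       # (2k+1) x P_k term
            nxt[i + 1] += (2 * k + 1) * a / (k + 1)
        for i, a in enumerate(c[k - 1]):   # -k P_{k-1} term
            nxt[i] -= k * a / (k + 1)
        c.append(nxt)
    return c[:K + 1]

# Even moments from Table 1 (odd moments vanish by symmetry).
moments = {0: 1.0, 2: 0.240523, 4: 0.109177, 6: 0.060862, 8: 0.037788}

def density(x, K=8):
    """K-th order Legendre approximant to the density on [-1, 1]:
    f(x) = sum_k ((2k+1)/2) * E[P_k(theta)] * P_k(x)."""
    c = legendre_coeffs(K)
    f = 0.0
    for k in range(K + 1):
        ek = sum(a * moments.get(i, 0.0) for i, a in enumerate(c[k]))
        pk = sum(a * x ** i for i, a in enumerate(c[k]))
        f += 0.5 * (2 * k + 1) * ek * pk
    return f
```

By construction the approximant integrates to one and reproduces every matched moment exactly, which gives a convenient self-check via numerical quadrature.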
4.2 Ornstein-Uhlenbeck process.
Consider a one-dimensional Ornstein-Uhlenbeck (OU) process which starts from and evolves according to the following stochastic differential equation:
(32) 
By Theorem 2, the solution has the form
which can be obtained by solving the following system of ODEs
Using , we solve the first equation to obtain
where . The second differential equation is first-order linear, and so can be solved explicitly; after some routine calculations we obtain
Finally, solving the last differential equation yields
Mixing over with a Gaussian law as before, and using , we obtain
If we have two independent Ornstein-Uhlenbeck processes which both start at zero and have a common mean-reversion parameter , one can check that an orthogonal transformation of leaves the joint distribution invariant. Indeed, the new two-dimensional process follows exactly the same SDE. Hence, the moment generating function in this case can be computed as
where are the eigenvalues of .
Table 2: Second moments, for two independent OU processes with the mean-reversion parameter shown.
Parameter:      0.1      0.2      0.3      0.4      0.5      1
Second moment:  0.23209  0.22438  0.21734  0.21091  0.20504  0.18231
Parameter:      2        5        10       20       50       100
Second moment:  0.15583  0.11454  0.07627  0.04404  0.01907  0.00971
4.3 Brownian bridge.
For a more complicated example, consider a standard Brownian bridge (denoted by “”) which satisfies . In this case, we must fix and let . The dynamics of can be described by (see for example Rogers and Williams (2000) Theorem IV.40.3)
Though this SDE is of linear form, the drift coefficient explodes at . Hence, it does not satisfy the conditions required in Theorem 2. However, the singularity can easily be isolated by freezing everything at and applying Theorem 2 to that; we can then let , and we find the relevant instances of the ODEs (14)-(16) to be
Solving the first differential equation with yields
One can check that . Similarly, the solution to the second ODE is given by
Though at first sight this might appear to have a singularity at , it is in fact analytic. The solution to the third differential equation is given by
One can also check that . Using this, we have
Hence
Mixing over gives the onedimensional generating function
As in the case of Ornstein-Uhlenbeck processes, the moment generating function is
Table 3: Even moments, for two independent Brownian bridges.
Order:   2         4         6          8
Moment:  0.149001  0.047864  0.0201829  0.009876
In Table 3 we provide the moments of for independent Brownian bridges. Comparing with Table 1, we can see that has smaller variance for two Brownian bridges. Intuitively, this is because Brownian bridges are forced to fluctuate around zero more frequently than Brownian motions: a Brownian bridge has to return to zero at but a Brownian motion is likely to make long excursions away from zero.
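Table 3 can likewise be spot-checked by simulation, using the standard pathwise construction of a bridge from a Brownian path, B_t = W_t - t W_1 on [0, 1]. As before, the function below is an illustrative sketch with our own names and parameter choices, not the paper's method.

```python
import math
import random

def bridge_corr_sq(n_paths=8000, n_steps=250, seed=2):
    """Monte Carlo estimate of the second moment of the empirical
    correlation of two independent Brownian bridges on [0, 1],
    built pathwise as B_t = W_t - t * W_1."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 / n_steps)
    total = 0.0
    for _ in range(n_paths):
        # two independent Brownian paths on the grid
        wx = [0.0]; wy = [0.0]
        for _ in range(n_steps):
            wx.append(wx[-1] + s * rng.gauss(0.0, 1.0))
            wy.append(wy[-1] + s * rng.gauss(0.0, 1.0))
        n = n_steps
        xs = [wx[k] - (k / n) * wx[-1] for k in range(1, n + 1)]
        ys = [wy[k] - (k / n) * wy[-1] for k in range(1, n + 1)]
        mx = sum(xs) / n; my = sum(ys) / n
        cov = sum(x * y for x, y in zip(xs, ys)) / n - mx * my
        vx = sum(x * x for x in xs) / n - mx * mx
        vy = sum(y * y for y in ys) / n - my * my
        total += cov * cov / (vx * vy)
    return total / n_paths
```

With the default settings the estimate should fall within Monte Carlo error of the Table 3 value 0.149001, visibly below the Brownian value 0.240523, in line with the comparison made above.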
4.4 Correlated Brownian motion.
Let be two Brownian motions with constant correlation , represented by the following SDE
To compute the moment generating function , we take the approach outlined in Remark 3. Define a matrix as
Then the process is a twodimensional Brownian motion with independent coordinates. The inverse of is
We now transform the problem to the uncorrelated case by
where we use “” to indicate that is a correlated two-dimensional Brownian motion. The solution may be expressed as
(33) 
where are the eigenvalues of the matrix . Routine calculation yields
In Table 4 we give the first and second moments of for two-dimensional correlated Brownian motion with correlation coefficient . Observe that is always slightly smaller than if . The variance of , computed as , is decreasing (as increases) but very slowly. Indeed, the standard deviation of is for , for and for .
Table 4: First and second moments, and the variance, for correlated Brownian motions.
Correlation:    0        0.1      0.2      0.3      0.4
First moment:   0        0.08873  0.17792  0.26804  0.35963
Second moment:  0.24052  0.24550  0.26061  0.28636  0.32368
Variance:       0.2405   0.2376   0.2290   0.2145   0.1943
Correlation:    0.5      0.6      0.7      0.8      0.9
First moment:   0.45338  0.55004  0.65071  0.75698  0.87151
Second moment:  0.37407  0.43986  0.52477  0.63509  0.78298
Variance:       0.1685   0.1373   0.1013   0.0621   0.0235
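Finally, the entries of Table 4 can be reproduced by simulation using exactly the transformation idea of this subsection: build the second path as Y = ρX + sqrt(1 - ρ²)Z from independent Brownian paths X and Z. The function below is an illustrative sketch (names and parameters are ours):

```python
import math
import random

def corr_bm_mean(rho, n_paths=8000, n_steps=250, seed=3):
    """Monte Carlo estimate of the mean of the empirical correlation
    of two Brownian motions with instantaneous correlation rho,
    constructed as Y = rho * X + sqrt(1 - rho**2) * Z."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 / n_steps)
    c = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(n_paths):
        x = z = 0.0
        sx = sy = sxx = syy = sxy = 0.0
        for _ in range(n_steps):
            x += s * rng.gauss(0.0, 1.0)
            z += s * rng.gauss(0.0, 1.0)
            y = rho * x + c * z
            sx += x; sy += y
            sxx += x * x; syy += y * y; sxy += x * y
        n = float(n_steps)
        cov = sxy / n - (sx / n) * (sy / n)
        vx = sxx / n - (sx / n) ** 2
        vy = syy / n - (sy / n) ** 2
        total += cov / math.sqrt(vx * vy)
    return total / n_paths
```

For example, corr_bm_mean(0.5) should come out near the Table 4 entry 0.45338, and averaging the square instead recovers the second-moment row.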
References
 Aldrich (1995) Aldrich, J. (1995). Correlations genuine and spurious in Pearson and Yule. Statistical Science, 10(4): 364–376.
 Billingsley (2008) Billingsley, P. (2008). Probability and Measure. John Wiley & Sons.
 Chan (1991) Chan, T. (1991). Indefinite quadratic functionals of Gaussian processes and least-action paths. Annales de l’IHP Probabilités et Statistiques, 27(2):239–271.
 Chan et al. (1994) Chan, T., Dean, D. S., Jansons, K. M., and Rogers, L. C. G. (1994). On polymer conformations in elongational flows. Communications in Mathematical Physics, 160(2):239–257.
 Donati-Martin and Yor (1997) Donati-Martin, C. and Yor, M. (1997). Some Brownian functionals and their laws. The Annals of Probability, 25(3):1011–1058.
 Dynkin (1980) Dynkin, E. B. (1980). Markov processes and random fields. Bulletin of the American Mathematical Society, 3(3):975–999.
 Ernst et al. (2017) Ernst, P. A., Shepp, L. A., and Wyner, A. J. (2017). Yule’s “nonsense correlation” solved! The Annals of Statistics, 45(4):1789–1809.
 Fixman (1962) Fixman, M. (1962). Radius of gyration of polymer chains. The Journal of Chemical Physics, 36(2):306–310.
 Granger and Newbold (1974) Granger, C. W. J. and Newbold, P. (1974). Spurious regressions in econometrics. Journal of Econometrics, 2:111–120.
 Hendry (1986) Hendry, D. (1986). Economic modelling with cointegrated variables: an overview. Oxford Bulletin of Economics and Statistics, 48(3): 201–212.
 Lévy (1951) Lévy, P. (1951). Wiener’s random function, and other Laplacian random functions. In Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. The Regents of the University of California.
 Mac Aonghusa and Pulé (1989) Mac Aonghusa, P. and Pulé, J. V. (1989). An extension of Lévy’s stochastic area formula. Stochastics: An International Journal of Probability and Stochastic Processes, 26(4):247–255.
 Phillips (1986) Phillips, P. C. B. (1986). Understanding spurious regressions in econometrics. Journal of Econometrics, 33(3): 311–340.
 Phillips (1998) Phillips, P. C. B. (1998). New tools for understanding spurious regressions. Econometrica, 66(6):1299–1325.
 Provost (2005) Provost, S. B. (2005). Momentbased density approximants. Mathematica Journal, 9(4):727–756.
 Rogers and Shi (1992) Rogers, L. C. G. and Shi, Z. (1992). Quadratic functionals of Brownian motion, optimal control, and the “Colditz” example. Stochastics: An International Journal of Probability and Stochastic Processes, 41(4):201–218.
 Rogers and Williams (2000) Rogers, L. C. G. and Williams, D. (2000). Diffusions, Markov Processes and Martingales: Volume 2, Itô Calculus. Cambridge University Press.
 Yule (1926) Yule, G. U. (1926). Why do we sometimes get nonsense-correlations between time-series? A study in sampling and the nature of time-series. Journal of the Royal Statistical Society, 89(1):1–63.