# Inference Related to Common Breaks in a Multivariate System with Joined Segmented Trends with Applications to Global and Hemispheric Temperatures

What transpires from recent research is that temperatures and radiative forcing seem to be characterized by a linear trend with two changes in the rate of growth. The first occurs in the early 60s and indicates a very large increase in the rate of growth of both temperature and radiative forcing series. This was termed as the "onset of sustained global warming". The second is related to the more recent so-called hiatus period, which suggests that temperatures and total radiative forcing have increased less rapidly since the mid-90s compared to the larger rate of increase from 1960 to 1990. There are two issues that remain unresolved. The first is whether the breaks in the slope of the trend functions of temperatures and radiative forcing are common. This is important because common breaks coupled with the basic science of climate change would strongly suggest a causal effect from anthropogenic factors to temperatures. The second issue relates to establishing formally via a proper testing procedure that takes into account the noise in the series, whether there was indeed a `hiatus period' for temperatures since the mid 90s. This is important because such a test would counter the widely held view that the hiatus is the product of natural internal variability. Our paper provides tests related to both issues. The results show that the breaks in temperatures and radiative forcing are common and that the hiatus is characterized by a significant decrease in their rate of growth. The statistical results are of independent interest and applicable more generally.

## Authors

• 1 publication
• 2 publications
• 1 publication
• 9 publications
• ### Spatio-temporal quantile regression analysis revealing more nuanced patterns of climate change: a study of long-term daily temperature in Australia

Climate change is commonly associated with an overall increase in mean t...
03/10/2021 ∙ by Qibin Duan, et al. ∙ 0

• ### Warming trend in cold season of the Yangtze River Delta and its correlation with Siberian high

Based on the meteorological data from 1960 to 2010, we investigated the ...
06/17/2018 ∙ by Chunmeng Wei, et al. ∙ 0

• ### Nonparametric estimation and bootstrap inference on trends in atmospheric time series: an application to ethane

Understanding the development of trends and identifying trend reversals ...
03/13/2019 ∙ by Marina Friedrich, et al. ∙ 0

• ### Numerical Simulations of Frost Growth Using Mixture Model on Surfaces with Different Wettability

Frost growth on cold surfaces is a transient process with coupled heat a...
03/03/2021 ∙ by Shantanu Shahane, et al. ∙ 0

• ### Relationship between the melting Arctic Sea Ice Extent and North Atlantic Oscillation Index

Arctic Sea Ice Extent (SIE) maintains the ocean circulation at the equil...
01/28/2020 ∙ by Alka Yadav, et al. ∙ 0

• ### Spatial trend analysis of gridded temperature data at varying spatial scales

Classical assessments of trends in gridded temperature data perform inde...
01/25/2019 ∙ by Ola Haug, et al. ∙ 0

• ### Spermatogonia apoptosis induction as a possible mechanism of toxoplasma gondii induced male infertility Running Title: Apoptosis in Spermatogonia

Objectives: The protozoan Toxoplasma gondii as an intracellular protozoa...
07/31/2020 ∙ by MAJ Dr. Sorush niknamian, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Some of the most relevant questions about the attribution of climate change are concerned with understanding particular periods in which the rate of warming changed; e.g., rapid warming, slowdowns or pauses. An example is the apparent slowdown in warming since the 1990s, for which various methods were applied to document or reject its existence (e.g., Fyfe et al., 2016; Lewandowsky et al., 2015, 2016). These issues are delicate to handle when dealing with observed temperature series, as the object of interest is the secular trend behind the observed warming. This underlying trend is affected by low-frequency natural variability (Swanson et al., 2009; Wu et al., 2011) and changes in its rate of warming are difficult to detect and attribute. New tests and approaches to investigate common features in temperature and radiative forcing can make attribution studies more relevant for climate science and policy-making by providing a better understanding of the drivers behind them.

The approach we take consists in designing statistical tests in a multivariate setting with joint-segmented trends to investigate the existence of common breaks and the “hiatus”. There are two issues that until now remain unresolved, which this approach can address. The first is whether the breaks in the slope of the trend functions of temperatures and radiative forcing are common. This is important because common breaks coupled with the basic science of climate change would strongly suggest a causal effect from anthropogenic factors to temperatures. As is well known, a common linear trend can occur when the series are spuriously correlated. A common break can eliminate such concerns of spurious correlation and foster the claim that the relationship from radiative forcing to temperatures is causal. Hence, our paper first aims at developing a test for common breaks across a set of series modeled as joint segmented trends with correlated noise. The theoretical framework follows Qu and Perron (2007) and Oka and Perron (2016), building on Bai and Perron (1998, 2003). Their framework, however, precludes joint segmented trends since the regressors are functions of the break dates, which makes the problem very different. Perron and Zhu (2005) show that the limit distribution of the estimate of the break date in a single time series with a joint segmented trend follows a normal limit distribution. We build on that result to show that our common break test follows a chi-square distribution. Theoretically, our results are more general and cover a wide range of cases to test hypotheses on the break dates in a multivariate system, within or across equations. Although our model with a single equation is similar to that in Perron an Zhu (2005), our work has a different focus, namely testing, while Perron and Zhu (2005) deals with the distribution theory of the estimate. We are not aware of any test in the literature that can handle a set of general linear restrictions on break dates both across and within equations. Furthermore, our test for an additional break is more closely related to Perron and Yabu (2009), and the studies cited therein. The attempt to increase the power of structural break tests using cross equation correlation is also new in the case of joint-segmented trends. Our results show that, once we filter temperatures for the effect of the Atlantic Mutidecadal Oscillation (AMO) and the North Atlantic Oscillation (NAO) for reasons explained in the text, the breaks in the slope of radiative forcing and temperatures are common, both for the large increase in the 60s and the “hiatus” period.

The second issue relates to establishing formally via a proper testing procedure that takes into account the noise in the series, whether there was a “hiatus” period for temperatures since the mid-90s. This is important because such a test would counter the widely held view that the “hiatus” is the product of natural internal variability (Kosaka and Xie, 2013; Trenberth and Fasullo, 2013; Meehl et al., 2011; Balmaseda, Trenberth and Källén, 2013). Using standard univariate tests (e.g., Perron and Yabu, 2009), the results are mixed across various series and sometimes borderline. Our aim is to provide tests with enhanced power by casting the testing problem in a bivariate framework involving temperatures and radiative forcing. Our results indicate that indeed the “hiatus” represents a significant slowdown in the rate of increase in temperatures, especially when considering global or southern hemisphere series, for which our test points to a rejection of the null of no change for all data sources.

We consider a multivariate system with equations where the dependent variables are modeled as joint-segmented trends with multiple changes in the slope. The errors are allowed to be serially correlated and correlated across equations. Of interest is testing for general linear restrictions on the break dates including testing for common breaks across equations. The test used is a (quasi-) likelihood ratio test assuming serially uncorrelated errors, in which case it has a pivotal chi-square distribution. However, it is non-pivotal in the general case of interest. Accordingly, we also consider a corrected Wald test with a pivotal limit distribution, which can be constructed using break dates estimated one equation at a time, labelled OLS-Wald, or estimated via the complete system, labelled GLS-Wald. The latter can offer more efficient estimates when there is correlation in the errors across equations. It is, however, computationally demanding as it requires least squares operations of order where is the sample size and is the total number of breaks. The OLS-Wald test requires only least-squares operations of order where () is the number of breaks in the equation, making it much easier to compute in a multivariate system. Our simulation study shows that the two Wald tests have finite sample sizes close to the nominal size when the extent of the serial correlation in the noise is small. They, however, suffer from potentially severe liberal size distortions for moderate to strong serial correlation. Hence, for all three tests (since the LR is non-pivotal), we suggest a bootstrap procedure to obtain the relevant critical values. These bootstrap tests are shown to have exact sizes close to the nominal 5% level in all cases considered.

We also propose a test for the presence of an additional break in some series. For instance, in our applications, the null and alternative hypotheses specify a break in the slope of the radiative forcing series (for which a rejection is easy using a single equation test), while under the null hypothesis there is no break in the temperature series but there is one under the alternative. Hence, the aim is to see whether a break in temperatures is present using a bivariate system. The theoretical framework is more general and allows multiple breaks in a general multivariate system. The test considered is a quasi-likelihood ratio test, whose limit distribution is non-pivotal and depends in a complex way on a number of nuisance parameters. Hence, we resort to a bootstrap procedure to obtain the relevant critical values.

The paper is structured as follows. Section 2 presents the statistical model and the assumptions imposed. Section 3 considers testing for linear restrictions on the break dates including as a special case testing for common breaks across equations. The various tests and their limit distributions are discussed. Section 4 presents the test for the presence of an additional break in some series within a multivariate system. Section 5 presents simulation results about the size and power of the tests and introduces the bootstrap versions recommended. Section 6 presents the applications. The data are discussed in Section 6.1. Section 6.2 presents the results for the common breaks tests, while Section 6.3 addresses the issue of testing for the “hiatus” using a bivariate system with a temperature and a radiative forcing series. Brief conclusions are offered in Section 7. All proofs are in an appendix.

## 2 Statistical Model and Assumptions

We consider the following model with multiple breaks in a system of variables. Each variable consists of a linear trend with multiple changes in slope with the trend function is joined at each break date. More specifically, there are breaks in the slope of the variable, and the -variate system is:

 yit=μi+βit+∑mij=1δijbt(k0ij)+uit, (1)

for and , where , is the indicator function of the event and  is the break date for the change (with magnitudes ) in the trend of the

variable. The vector of break dates for the

variable is and is the vector of break dates for the entire system, . We let denote the total number of breaks in the system. Hence, is an vector. For the variable, we have in matrix form

 yi=X(k0i)θi+ui, (2)

where and with , and . The entire system in matrix notation is

 y=X0θ+u, (3)

where , , and . We are interested in the case where the number of breaks for each variable is known but the break dates are unknown. For a vector of generic break dates, we will use the notation without a superscript, that is, and . Also, we will simply write to denote the regressor matrix corresponding to break dates specified by , that is, .

To motivate our quasi-likelihood ratio test, we first assume that is multivariate normal with 0 mean and covariance . This assumption will be relaxed when we derive the asymptotic distribution of our test. For a generic break date vector and the corresponding regressor matrix , the log-likelihood function of is then given by

 l(k,θ,Σ)=−(nT/2)log2π−(T/2)log|Σ|−(1/2)(y−Xθ)′(Σ−1⊗IT)(y−Xθ).

Let and be the maximum likelihood estimators for and , which jointly solve and , where . Thus, the maximized the log-likelihood function is

 l(k)=−(nT/2)(log2π+1)−(T/2)log|^Σ(k)|. (4)

It is useful to work with break fractions; hence define , and , with the true break fractions , and equivalently defined. With these definitions, functions of such as and will be used interchangeably with and

. We now state the assumptions needed for the asymptotic analysis.

with for .

for and .

###### Assumption 3

Let . Then, is stationary with and . In addition, and , where “” denotes weak convergence under the Skorohod topology, is the dimensional standard Wiener process and .

Assumptions 1 and 2 are standard and simply state that the break dates are asymptotically distinct (i.e., each regime increases proportionally with the sample size ) and the changes in the parameters are non-zero at the break dates. Assumption 3 states that

is a stationary process and its partial sum follows a functional central limit theorem.

## 3 Testing for Linear Restrictions on the Break Dates

We now consider the following null and alternative hypotheses:

 H0:Rλ0=r    and    H1:Rλ0≠r. (5)

Some examples of this type of hypotheses are the following: 1) specific break fractions: and ; 2) break fractions with a fixed distance: and ; 3) common breaks: and . For the first case, historic events can be used to determine the value of

. The second case can be used across equations as well as within an equation. The third case can be viewed as a special case of the second one and should be used across equations only. The LR test statistic is

 LR=−2[maxλ:Rλ=rl(λ)−maxλl(λ)]=T[minλ:Rλ=rlog|^Σ(λ)|−minλlog|^Σ(λ)|].
###### Theorem 1

Let Assumptions 1-3 hold with . Then, as , we have under in (5) that , where .

Our model with broken segmented trends at unknown dates is a rare case in structural change problems with the estimates of the break dates having an asymptotic normal distribution; see Perron and Zhu (2005). Since the likelihood ratio test essentially involves quadratic forms of these estimates, it has the usual

distribution. When the short-run and long-run variances differ

, the asymptotic distribution depends on nuisance parameters. It is difficult to directly modify the LR test. Instead, we develop a Wald test for the hypotheses (5) robust to serial correlation in the errors. To express the limiting distribution, define and , with being the positive part of the argument. Also, , , and , with all integrals from 0 to 1. Denote by the element of and by that of . Now, define

We also define , , , , and equivalently, as well as

 Ξ0 = Dδ[ΓGG−ΓGFQ−1FFQFG−QGFQ−1FFΓFG+QGFQ−1FFΓFFQ−1FFQFG]Dδ Ξ1 = Dδ[QGG−QGFQ−1FFQFG]Dδ.

where with and . We now provide the limiting distribution of the estimate of the break fraction that maximizes the log-likelihood (4), which allows us to obtain the limit distribution of the Wald test.

###### Theorem 2

Let Assumptions 1-3 hold, and . Then, as , (i) ; (ii) under in (5), the Wald test statistic is , where and .

The total number of regressions required to obtain the above break date estimator is . This can pose a considerable amount of computational burden especially if the procedure is to be bootstrapped. Hence, it is worthwhile to devise a test assuming diagonality in because the break dates are then estimated separately from each variable and the total number of regressions required is only . While this modification lessens computational cost significantly, the resulting break date estimators can be less efficient due to the neglected correlation between variables in the system. Suppose the break dates are estimated separately for each variable from (2). Thus, and

 ˜λi=argminλiSSRi(λi), (6)

where , with . The limiting distribution of is obtained as a special case of Theorem 2 by letting . The limiting covariance can be simplified somewhat. Define

 Ξs=⎡⎢ ⎢ ⎢ ⎢⎣ψ11D−1δ1P11D−1δ1…ψ1nD−1δ1P1nD−1δn⋮⋱⋮ψn1D−1δnPn1D−1δ1…ψnnD−1δnPnnD−1δn⎤⎥ ⎥ ⎥ ⎥⎦,

where

 Pij=(∫pi(r)p′i(r)dr)−1∫pi(r)p′j(r)dr(∫pj(r)p′j(r)dr)−1,

with and is the element of defined in Assumption 3.

###### Corollary 1

Suppose that Assumptions 13 hold. Let where each is obtained from (6). Then, as : (i) ; (ii) under in (5), where and .

The result in (i) coincides with one result reported in Perron and Zhu (2005) when there is only one equation with one break.

## 4 Break Detection

We consider a test for the null of breaks in the system against the alternative of breaks with the location of the additional break unspecified. We again consider a quasi-likelihood framework. Suppose that the model is now given by

 y=X0θ+ai(h)γ+u,

where for some and is an vector with in the element and elsewhere. The null and alternative hypotheses are now:

 H0:γ=0    and    H1:γ≠0. (7)

The maximum of the likelihood function under is again given by (4) for a generic break date vector . Similarly, the maximum of the likelihood function under is:

 l(i)(k,h)=−(nT/2)(log2π+1)−(T/2)log|^Σ(i)(k,h)|,

for a generic break date vector and an additional break date . The notation  and is used to indicate the fact that an additional break is inserted in the equation. This additional break can only occur in one of the equations and we assume for simplicity that the relevant equation is known. This assumption is relaxed below. Just like we have defined and used and interchangeably, we define and will use and interchangeably. The test statistic we consider is given by

 LR=−2[l(^λ)−supν∈C(i)Tl(i)(^λ,ν)],

where and

 C(i)T={εps≤ν≤1−εps and |ν−^λij|≥εps, j=1,…,mi for some εps, 0<εps

As is well known in the structural break literature, is not identified under the null hypothesis and must be restricted to ensure a non-divergent limiting distribution of the test statistic. The set imposes the relevant restrictions.111Instead of using under both and , it is possible to jointly minimize under . However, this requires a more complex asymptotic analysis and we will focus on the simpler case. When the equation with an additional break is not specified a priori, the statistic can be extended to be

 supLR=−2[l(^λ)−max1≤i≤nsupν∈C(i)Tl(i)(^λ,ν)].

In order to express the limiting distribution, we need to define additional terms. Let and be as defined before. Let , , and where all integrals are taken from 0 to 1. Recall that is the element of and is that of , and let , , , , as well as and . Now, using these functions, we define

 ξ(i)0(ν) = Γ(i)BB(ν)−Q(i)BF(ν)Q−1FFΓ(i)FB(ν)−Γ(i)BF(ν)Q−1FFQ(i)FB(ν)+Q(i)BF(ν)Q−1FFΓFFQ−1FFQ(i)FB(ν) ξ(i)1(ν) = Q(i)BB(ν)−Q(i)BF(ν)Q−1FFQ(i)FB(ν) ς(i)0(ν) = Dδ[Γ(i)GB(ν)−QGFQ−1FFΓ(i)FB(ν)−ΓGFQ−1FFQ(i)FB(ν)+QGFQ−1FFΓFFQ−1FFQ(i)FB(ν)] ς(i)1(ν) = Dδ[Q(i)GB(ν)−QGFQ−1FFQ(i)FB(ν)],

where , , , and are as previously defined. Finally, let be a mean zero Gaussian process defined over the unit interval with

 Var(η(i)(ν))=ξ(i)0(ν)−ς(i)1(ν)′Ξ−11ς(i)0(ν)−ς(i)0(ν)′Ξ−11ς(i)1(ν)+ς(i)1(ν)′Ξ−11Ξ0Ξ−11ς(i)1(ν).

Also, denote the limit counterpart of the set by , replacing by . The limit distribution of the tests are stated in the following theorem.

###### Theorem 3

Suppose that Assumptions 13 hold. Then, as , we have under in (7): (i) ; (ii) .

The limiting distributions depend on various nuisance parameters. First, all of the true break fractions for and matter via terms such as , , , , , , and . Second, both the short-run and long-run variances matter via the various and matrices. Third, the trimming parameter in the set matters. Given the complexity of the limiting distributions, we neither seek a way to eliminate nuisance parameters nor attempt to tabulate critical values. Instead, we resort to using a bootstrap method to generate p-values. The various terms are evaluated in a manner similar to that of the tests for common breaks described in the next section. Note that if the short-run and long-run variances coincide and the true break fractions are used instead of their estimates, reduces to be .

## 5 Monte Carlo Simulations

In this section we provide Monte Carlo simulation results to assess the adequacy of the asymptotic distributions derived in Sections 3 and 4. All results are based on 1,000 replications. We focus on the test for common breaks as the tests for an additional change in slope are too computationally demanding so that a reasonable simulation experiment is prohibitive. The data generating process (DGP) is specified by:

 y1t = μ1+β1t+δ11bt(k011)+u1t y2t = μ2+β2t+δ21bt(k021)+u2t.

The results are exactly invariant to the values of and . For the slope change parameters and , the cases of 0.5, 1.0 and 1.5 are considered. The error terms are such that

 (u1tu2t)=L(e1te2t) \ with \ LL′=(1ρρ1),

and , with . The parameter stands for the correlation across equations and stands for the autoregressive parameter in each of the error terms. We consider (-0.5, 0, 0.5) for the values of and (0, 0.3, 0.7) for the values of . What influences the precision of the break date estimate is the ratio between the break magnitude and the long-run variance. In our simulations, the long-run variance is always set to be unity. The sample size is set to and the true break dates are at mid-sample, . For each generated data, we test four null hypotheses: (1) and , (2) and , (3) and and (4) and

. Hence the null rejection probabilities for (1) and (3) pertain to the finite sample sizes (with a nominal size 5%) while those for (2) and (4) pertain to powers. In testing these hypotheses, we consider the LR and Wald tests based on the SUR system (Theorems

1 and 2) as well as the Wald test based on estimating the break dates equation by equation using OLS regressions (Corollary 1). We refer to the two Wald tests as the GLS-Wald and OLS-Wald, respectively. For both versions, the long-run variance is estimated using a quadratic spectral window, where the bandwidth parameter is selected using Andrews’ (1991) data dependent method with an AR(1) approximation.

We also simulate the bootstrap version of the aforementioned tests. To obtain bootstrap p-values, we first estimate break dates imposing the null hypothesis and remove the estimated trend. Then, we fit a VAR(1) on the resulting residuals. Following Kilian (1998), we compute the bias-corrected estimates of the VAR coefficients and obtain the corresponding residuals of the VAR. From them, we construct a pseudo VAR process and add to it the estimated trend functions to obtain a bootstrap sample. Simulating bootstrap tests requires a lot of computing time especially for those based on the full system. We make use of the Warp-Speed method suggested by Giacomini, Politis and White (2013). Also, we use the feasible GLS estimate instead of the MLE in the construction of the LR and GLS-Wald tests.

In Table 1, the results for the LR test are reported. Regarding the finite sample size obtained using the asymptotic critical values, two features emerge. First, the size is near or below the nominal level when , but it climbs up quickly as increases. In the worst case, it can go up to 50%. This result is well expected since the asymptotic validity of the LR test holds only when . Second, the size inflation is more evident when is small. In fact, when is very large, the estimated break dates coincide with the true ones making the test statistic literally zero with a large probability. However, this type of conservativeness is of little concern because the power function does not appear to be decreasing as gets large. On the other hand, the finite sample size obtained by the bootstrap procedure stays near the nominal 5% level across all simulation designs, offering significant improvements over the asymptotic test. The bootstrapped test gives slightly smaller power than the asymptotic test, but the difference is marginal especially in view of the large improvement in size.

Table 2 reports the results for the GLS-Wald test. Overall, the GLS-Wald test using the asymptotic critical values are more liberal than the corresponding LR test, which is a well known characteristic of Wald tests. Despite being liberal, the GLS-Wald is less sensitive to the value of , since we are using a robust covariance matrix. The GLS-Wald test also gets conservative as gets large for the same reason as the LR test. The bootstrapped GLS-Wald test controls the size as well as the bootstrapped LR test. Table 3 shows the results for the OLS-Wald test. It performs similarly to the GLS-Wald test for both the asymptotic and bootstrap versions. Lastly, the power is close to one in almost all cases. The hypotheses that are rejected misspecifies the break dates by only 5% of the sample size, yet our tests are powerful enough to detect them. To sum up, all three tests, the LR, GLS-Wald, and OLS-Wald suffer from size distortions if the breaks are not large enough. However, the bootstrap procedures significantly help control the size without losing much power.

## 6 Applications

We investigate the commonality of the break dates across the temperature and anthropogenic total radiative forcing series as well as test whether the recent “hiatus” is significant. We use two sets of global, northern and southern hemispheric temperature series, each of which will be denoted as G, N and S, respectively. The first set comes from the Climate Research Unit’s HadCRUT4 (Morice et al., 2012) and the second from the NASA database (GISTEMP Team, 2015; Hansen et al., 2010). For global temperatures, we also use the data from Berkeley Earth (Rohde et al., 2013) and the dataset in Karl et al. (2015). We first discuss the data used, then the results for the common breaks tests and finally the results for the test on a change in slope in temperatures related to the so-called “hiatus” period.

### 6.2 Common breaks tests

The temperature series are affected by various modes of natural variability such as the Atlantic Multidecadal Oscillation (AMO) and the North Atlantic Oscillation (NAO), which are characterized by low frequency movements (Kerr, 2000; Hurrell, 1995). Since trends and breaks are low frequency features, it is important to purge them from the temperature series allowing more precise estimates of the break dates.222We also considered filtering the effect of the Pacific Decadal Oscilliation (PDO) index (https://www.ncdc.noaa.gov/teleconnections/pdo/) and the Southern Oscilliation Index (SOI) (http://www.cru.uea.ac.uk/cru/data/soi/); see Wolter and Timlin (1998) and Mantua and Hare (2002). The results were qualitatively similar showing robustness. Hence, we  do not report them. Other high frequency fluctuations in temperature series do not affect the precision of the estimates of the break dates and the magnitudes of the changes in slope. Accordingly, we filter out the effect of these modes of variability by regressing each temperature series on these modes and a constant. Since the effect of natural variability might have occurred with a time lag, we choose an appropriate lag using the Bayesian Information Criterion (BIC); Schwarz (1978). The candidate regressors for the filtering are the current value and lags (up to order ) of AMO and NAO. We first work with G from HadCRUT4. We start with , so that the candidates are the current value and the first lag only. BIC chooses the current value of AMO and the first lag of NAO. Since the maximum lag allowed is selected, we increase to 4. Then, BIC chooses the current value of AMO and the second lag of NAO (the first is not included given that we search over models not necessarily including all lags up to some chosen order). When applying the BIC, the number of observations used is limited by (e.g., Perron and Ng, 2005). Having decided on the current value of AMO and the second lag of NAO, we apply the filtering to all available observations, not limited by . We could repeat the same procedure to each series. However, it does not make much sense to have the same mode affect each temperature series with a different lag. Hence, we filter all series with the current value of AMO and the second lag of NAO.333If we choose the filtering regressors for each series, G and SH from NASA and G from Berkeley require the fourth lag of NAO instead of the second. But the break date estimates change by two years at most. The filtered temperature series are denoted as G̃, Ñ and S̃. Figure 1 presents graphs of the original and filtered series.

For the radiative forcing variables, we use WMGHG and TRF. Figure 2 displays the time plots of WMGHG and TRF as well as the AMO and NAO. The data set used covers the period 1880-2014 and the filtering is done with the full sample. We use the data for 1900-2014 for the purpose of the common break date tests.444

For the radiative forcing variables, the data is available only up to 2011. We use forecast values for 2012-2014 based on an autoregressive model with a broken linear trend. In the case of TRF the forecast is produced using an AR(2) and a broken linear trend with 1960 and 1991 as the break dates. For WMGHG, we use an AR(8) and and a broken linear trend with break dates at 1960 and 1994.

We use two sample periods: 1900-1992 and 1963-2014. The reason we split the sample is to reduce the number of breaks in the system, thereby lessening the computational burden. Our choice of sample periods is made with the intention to minimize any unwanted effect from a second break that might exist in any of the variables in the system. The break dates in WMGHG are estimated most precisely since it has a very small noise component (see Figure 2) and they are 1963 and 1992. First we consider bivariate systems with one temperature series and one radiative forcing series. The null hypothesis of a common break date is tested using the LR test from Theorem 1. Since this test is not asymptotically pivotal, we supply bootstrap p-values as well. The bootstrap is carried out in the same way as done in the Monte Carlo simulations reported earlier. We also report the GLS-Wald test from Theorem 2. Although the GLS-Wald test is asymptotically pivotal, we still supply the bootstrap p-values because of the size distortions observed in our Monte Carlo study. In addition to the 4 global, 2 northern hemispheric and 2 southern hemispheric temperature series, we also consider the average of the global, northern and southern hemispheric temperature series across different datasets. Hence, there are 11 temperature series and 11 filtered temperature series, each of which is paired with either WMGHG or TRF, yielding a total of 44 bivariate systems.

Table 4 (a) reports the results for the sample period of 1900-1992 when the original temperature series are considered; the asymptotic p-values of the Wald and LR test are very small in most cases and the null of a common break date is rejected with only a few exceptions. However, the bootstrap p-values are well over the conventional thresholds 0.05 or 0.01 in all cases. Given the previous simulation results, we view the bootstrap p-values as more reliable than the asymptotic ones. Note that the break date for the radiative forcing variable is almost unchanged at either 1963 or 1966, regardless of the temperature series in the bivariate system. Also, the estimated common break date, i.e., the estimate imposing the null of common breaks, always coincide with the break date for the radiative forcing variable. On the other hand, the break date estimates for the temperature series are quite spread out. Hence, the break dates in the radiative forcing variables are being estimated more precisely than those in the temperature series. The large p-values produced by the bootstrapping is a consequence of the high level of noise in the temperature series relative to the magnitude of the break. Accordingly, we fail to reject the null of a common break date despite the large discrepancy in the estimated break dates. Table 4 (a) also shows the results obtained with the filtered temperature series. The picture changes quite dramatically. The break dates for the radiative forcing variables are still at 1963 or 1966, but those for the temperature series are now also around 1960. Especially, when TRF enters the bivariate system, the estimate of the break date is 1960, 1965 or 1966, the exception being the southern hemispheric temperature from HadCRUT4 and the average of the two southern hemispheric temperature series, for which the estimate is still 1956. As a result, both the asymptotic and bootstrap p-values are much larger than those for the original temperature series.

Tables 4 (b) presents the results for the second sample 1963-2014 when the original temperature series are used and the estimates of the break dates for the temperature series are again quite spread out, while those for the radiative forcing variables are consistently around 1990-1992. Just like in Table 4 (a), the asymptotic p-values suggest strong rejection of the null hypothesis, but the bootstrap p-values turns out much larger reflecting again the high level of correlated noise in the original series. Note that the estimates of the break dates for the filtered temperature series are 1990-1992 with only a few exceptions. Also, the p-values are larger than the corresponding ones for the original temperature series.

We repeat the analysis for the bivariate systems with the OLS-Wald test as a robustness check and report the results in Table 5. The results are very similar when the filtered series are considered. The break date estimates obtained equation by equation coincide with those obtained from the system or differ only by a few years at most. When the original series are used, the break date estimates for some series largely deviate from those computed by the system method. However, the fact that the bootstrap p-values are much larger than usual thresholds still applies. Thus, our conclusions remain the same.

An advantage of the OLS-Wald test is that it can handle a larger system without extra computational burden. The break date estimates will not change if we consider a larger system, because they are computed equation by equation. However, the long-run variance estimate and the bootstrap p-values can change as we consider a bigger system. We consider three five-variables systems. Each system has a global, northern and southern hemispheric temperature and the two radiative forcing variables. The first one uses the temperature series from HadCRUT4, the second one from NASA and the third one has average global, northern and southern hemispheric temperature series. The results are reported in Table 6. They show results broadly similar to those obtained bivariate systems.

To provide a final robustness check, we consider testing jointly whether the two breaks near 1960 and for the “hiatus” are common. We use the OLS-Wald test, which estimates the two breaks jointly for each equation, and the period 1926-2014 to focus on the time span whereby anthropogenic factors are most likely to be the main driver affecting changes in temperatures. The results are presented in Table 7. The results obtained are broadly similar and lead to the same conclusions. When using the filtered series and the bootstrap p-values, the test fails to reject the null hypothesis of two common breaks. The first is located in the early 60’s (the onset of sustained global warming) and the second in the early 90’s (the “hiatus”) consistent with methods dealing with one break at a time presented earlier.

### 6.3 Testing for the “hiatus”

The literature on the existence and drivers of the current slowdown in the rate of warming has expanded quickly (Tollefson, 2014, 2016). In this section, we focus on testing if the “hiatus” can be explained by natural variability or if it is a feature of the underlying warming trend (Estrada, Perron and Martinez, 2013). A small part of the literature investigating the slowdown in the warming in global temperatures are based on formal structural change/change-point tests (Cahill et al., 2015; Pretis et al., 2015), while others are simply based on testing for trends during arbitrarily selected periods of time (Cahill et al., 2015; Foster and Rahmstorf, 2011; Lewandowsky et al., 2015). This type of approach, as well as most of physical-based studies, have typically failed to find convincing evidence for the existence of a change in the slope of global temperatures during the last decades. According to one of the commonly accepted hypothesis, the apparent “hiatus” is produced by the effects of low-frequency natural variability that result from coupled ocean-atmosphere processes and heat exchange between the ocean and the atmosphere. It has been proposed that effects of natural variability modes such as AMO, NAO and PDO were able to mask the warming trend since the 1990s, creating the illusion of a slowdown in the underlying warming trend (e.g., Guan et al ., 2015; Steinman et al., 2015; Li, Sun and Jin, 2013; Trenberth and Fasullo, 2013). However, Estrada and Perron (2016) argue that low-frequency oscillations do distort the underlying warming trend but cannot account for the current slowdown. Instead, the main effect of these oscillations has been to make it more difficult to detect the drop in the rate of warming, a real feature of the warming trend imparted by the slowdown in the radiative forcing from well-mixed greenhouse gases.

Establishing whether the “hiatus” period is statistically significant is important because it would counter the widely held view that it is the product of natural internal variability (Kosaka and Xie, 2013; Trenberth and Fasullo, 2013; Meehl et al., 2011; Balmaseda, Trenberth and Källén, 2013). Using standard tests (e.g., Perron and Yabu, 2009), the results are mixed across various series and sometimes borderline. Our aim is to provide tests with enhanced power by casting the testing problem in a bivariate framework involving temperatures and radiative forcing. We consider bivariate systems with one temperature series and one forcing variable. We use the LR test for the presence of a break in temperature series given the presence of a structural break in radiative forcing series. We first estimate a break date in the radiative forcing series under the likelihood framework for a bivariate system of a pair of radiative forcing and temperature series. Then, given the estimated break date in the radiative forcing series, we apply the LR test for the null hypothesis of no break in temperatures versus one break. To obtain -values, we apply the bootstrap procedure described in Section 5 with 1,000 samples. The results are presented in Table 8. First, as expected from previous results, the LR test is more likely to reject the null hypothesis of no break when using filtered temperature series. Hence, we report results only for that case. For the filtered series from 1900 to 1992, we reject the null at less than 5% significance level in all pairs of radiative forcing and temperature series. Hence, there is clear evidence of a break in temperatures that is near 1960 (varying between 1954 and 1966 depending on the series and the forcing). This concurs with the common break results obtained in the previous section. When using the sample 1963-2014 we reject the null in seven and eight cases out of eleven filtered series at the 10% significance level with well-mixed green-house gases and total radiating forcing, respectively. Since the sample size is small (, we might expect that the LR test may have little power, but our result suggests strong evidence for the presence of a break in temperature series. Note that the evidence for a break is stronger when using bivariate systems involving TRF. This is due to the fact that TRF exhibits a larger decrease in slope compared to WMGHG. The errors are also more strongly correlated. For instance, solar irradiance is a part of both TRF and temperatures and an important source of variations. When considering systems with TRF, the only pairs that do not allow a rejection at the 10% level are those associated with Northern hemisphere temperatures. For these, the p-values range from .14 to .19, while for all other pairs they are below 10%. Given that the small sample size is small, the overall evidence strongly indicates a break in temperatures in the early 90s consistent with the presence of the much-debated “hiatus”.

## 7 Conclusion

We consider a multivariate system with equations with the dependent variables modeled as joint-segmented trends with multiple changes in slope. The errors can be serially correlated and correlated across equations. We consider testing for general linear restrictions on the break dates, including testing for common breaks across equations. The test used is a (quasi-) likelihood ratio test assuming serially uncorrelated errors. Under the stated conditions, the LR test has a pivotal chi-square distribution, though it is non-pivotal in general cases of interest. Hence, we also consider a corrected Wald test having a pivotal limit distribution, which can be constructed using break dates estimated one equation at a time or via the complete system. The limit distribution of the test is standard chi-square. However, simulations show that the two Wald tests suffer from potentially severe liberal size distortions for moderate to strong serial correlation. Hence, for all three tests, we suggest a bootstrap procedure to obtain the relevant critical values. These bootstrap tests are shown to have correct size in all cases considered. Our empirical results show that, once we filter the temperature data for the effect of the Atlantic Mutidecadal Oscillation (AMO) and the North Atlantic Oscillation (NAO), the breaks in the slope of radiative forcing and temperatures are common, both for the large increase in the 60s and the recent “hiatus”.

We also consider a test for the presence of an additional break in some series. The theoretical framework is general and allows multiple breaks in a general multivariate system. The test considered is a quasi-likelihood ratio test. The limit distribution is shown to be non-pivotal and depends in a complex way on a number of nuisance parameters. Hence, we use a bootstrap procedure to obtain relevant critical values. In our applications, the null and alternative hypotheses specify a break in the slope of the radiative forcing series, while under the null hypothesis there is no break in the temperature series but there is one under the alternative hypothesis. Our results indicate that indeed the “hiatus” represents a significant slowdown in the rate of increase in temperatures, especially when considering global or southern hemisphere series, for which our test points to a rejection of the null of no change for all data sources considered. The statistical results are of independent interest and applicable more generally beyond the climate change applications considered.

## References

• 1 Andrews, D.W.K., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817-858.
• 2 Bai, J., Perron, P., 1998. Estimating and testing linear models with multiple strucutral changes. Econometrica 66, 47-78.
• 3 Bai, J., Perron, P., 2003. Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18, 1-22.
• 4 Balmaseda, M.A., Trenberth, K.E., Källén, E., 2013. Distinctive climate signals in reanalysis of global ocean heat content. Geophysical Research Letters 40, 1754-1759.
• 5 Bierens, H.J., 2000. Nonparametric Nonlinear Cotrending Analysis, with an application to interest and inflation in the United States. Journal of Business and Economic Statistics 18, 323–337.
• 6 Cahill, N., Rahmstorf, S., Parnell, A.C., 2015. Change points of global temperature. Environmental Research Letters 10(8), 084002.
• 7 Dickey. D.A., Fuller, W.A., 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427–431.
• 8 Engle, R.F., Granger, C.W.J., 1987. Co-integration and error correction: representation, estimation and testing. Econometrica 55, 251-276.
• 9 Estrada, F., Perron, P., 2014. Detection and attribution of climate change through econometric methods. Boletín de la Sociedad Matemática Mexicana 20, 107-136.
• 10 Estrada, F., Perron, P., 2016. Extracting and analyzing the warming trend in global and hemispheric temperatures. Unpublished Manuscript, Department of Economics, Boston University.
• 11 Estrada, F., Perron, P., Gay-García, C., Martínez-López, B., 2013. A time-series analysis of the 20th century climate simulations produced for the IPCC’s Fourth Assessment Report. Plos One 8(3): e60017.
• 12 Estrada F., Perron P., Martínez-López B., 2013. Statistically derived contributions of diverse human influences to twentieth-century temperature changes. Nature Geoscience 6, 1050-1055.
• 13 Foster, G., Rahmstorf, S., 2011. Global temperature evolution 1979–2010. Environmental Research Letters 6(4), 044022.
• 14 Fyfe, J.C., Meehl, G.A., England, M.H., Mann, M.E., Santer, B.D., Flato, G.M., Hawkins, E., Gillett, N.P., Xie, S.P., Kosaka, Y., Swart, N.C., 2016. Making sense of the early-2000s warming slowdown. Nature Climate Change 6, 224–228.
• 15 Gallagher, C., Lund, R., Robbins, M., Gallagher, C., Lund, R., Robbins, M., 2013. Changepoint Detection in Climate Time Series with Long-Term Trends. Journal of Climate 26, 4994–5006.
• 16 Gay-Garcia, C., Estrada, F., Sanchez, A. 2009. Global and hemispheric temperature revisited. Climatic Change 94, 333-349.
• 17 Giacomini, R., Politis, D.N., White, H., 2013. A warp-speed method for conducting Monte Carlo experiments involving bootstrap estimators. Econometric Theory 29, 567-589.
• 18 GISTEMP Team, 2015. GISS surface temperature analysis (GISTEMP). NASA Goddard Institute for Space Studies. Dataset accessed at http://data.giss.nasa.gov/gistemp/
• 19 Guan, X., Huang, J., Guo, R., Pu, L., 2015. The role of dynamically induced variability in the recent warming trend slowdown over the Northern Hemisphere. Scientific Reports 5, 12669.
• 20 Hansen, J., Ruedy, R., Sato, M., Lo, K., 2010. Global surface temperature change. Reviews of Geophysics 48.4.
• 21 Hansen, J., Sato, M., Kharecha, P., von Schuckmann, K., 2011. Earths energy imbalance and implications. Atmospheric Chemistry and Physics 11, 13421-13449.
• 22 Harvey, D.I., Mills, T.C., 2002. Unit roots and double smooth transitions. Journal of Applied Statistics 29, 675–683.
• 23 Hurrell, J.W., 1995. Decadal trends in the North Atlantic Oscillation and relationships to regional temperature and precipitation. Science 269, 676-679.
• 24 Johansen, S., 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 1551-1580.
• 25 Karl, T.R., Knight, R.W., Baker, B., 2000. The record breaking global temperatures of 1997 and 1998: Evidence for an increase in the rate of global warming? Geophysical Research Letters 27, 719–722.
• 26 Karl, T.R., Arguez, A., Huang, B., Lawrimore, J.H., McMahon, J.R., Menne, M.J., Peterson, T.C., Vose, R.S., Zhang, H.M., 2015 Possible artifacts of data biases in the recent global surface warming hiatus. Science 348, 1469–1472.
• 27 Kaufmann, R.K., Kauppi, H., Stock, J.H., 2006. Emissions, concentrations, & temperature: A time series analysis. Climatic Change 77, 249-278.
• 28 Kerr, R.A., 2000. A North Atlantic climate pacemaker for the centuries. Science 288, 1984-1985.
• 29

Kilian, L., 1998. Small-sample confidence intervals for impulse response functions. Review of Economics and Statistics 80, 218-230.

• 30 Kim, D., Perron, P., 2009. Unit root tests allowing for a break in the trend function at an unknown time under both the null and alternative hypotheses. Journal of Econometrics 148, 1-13.
• 31 Kosaka. Y., Xie, S.P., 2013. Recent global-warming hiatus tied to equatorial Pacific surface cooling. Nature 501, 403–407.
• 32 Lewandowsky, S., Risbey, J.S., Oreskes, N., 2015. On the definition and identifiability of the alleged “hiatus” in global warming. Scientific Reports 5:16784.
• 33 Lewandowsky, S., Risbey, J.S., Oreskes, N., 2016. The pause in global warming: Turning a routine fluctuation into a problem for science. Bulletin of the American Meteorological Society 97, 723-733.
• 34 Li, J., Sun, C., Jin, F.F., 2013. NAO implicated as a predictor of Northern Hemisphere mean temperature multidecadal variability. Geophysical Research Letters 40, 5497-5502.
• 35 Mantua, N.J., Hare, S.R., 2002. The Pacific Decadal Oscillation. Journal of Oceanography 58.1, 35-44.
• 36 Meehl, G.A., Arblaster, J.M., Fasullo, J.T., Hu, A., Trenberth, K.E., 2011. Model-based evidence of deep-ocean heat uptake during surface-temperature hiatus periods. Nature Climate Change 1, 360–364.
• 37 Mills, T.C., 2013. Breaks and unit roots in global and hemispheric temperatures: An updated analysis. Climatic Change 118, 745–755.
• 38 Morice, C.P., Kennedy, J.J., Rayner, N.A., Jones, P.D., 2012. Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 dataset. Journal of Geophysical Research 117, D8.
• 39 Ng, S., Perron, P., 2001. Lag length selection and the construction of unit root tests with good size and power. Econometrica 69, 1519-1554.
• 40 Oka, T., Perron, P., 2016. Testing for common breaks in a multiple equation system. Unpublished Manuscript, Department of Economics, Boston University.
• 41 Perron, P., 1989. The great crash, the oil price shock, and the unit root hypothesis. Econometrica 57, 1361-1401.
• 42 Perron, P., 1997. Further evidence on breaking trend functions in macroeconomic variables. Journal of Econometrics 80, 355-385.
• 43 Perron, P., Ng, S., 2005. A note on the selection of time series models. Oxford Bulletin of Economics and Statistics 67, 115-134.
• 44 Perron, P., Zhu, X., 2005. Structural breaks with deterministic and stochastic trends. Journal of Econometrics 129, 65-119.
• 45 Perron P, Yabu T., 2009. Testing for shifts in trend with an integrated or stationary noise component. Journal of Business & Economic Statistics 27, 369-396.
• 46 Pretis, F., Mann, M.L., Kaufmann, R.K., 2015. Testing competing models of the temperature hiatus: assessing the effects of conditioning variables and temporal uncertainties through sample-wide break detection. Climatic Change 131, 705–718.
• 47 Qu, Z., Perron, P., 2007. Estimating and testing structural changes in multivariate regressions. Econometrica 75, 459-502.
• 48 Reeves, J., Chen, J., Wang, X.L., Lund, R., Lu, Q.Q., Reeves, J., Chen, J., Wang, X.L., Lund, R., Lu, Q.Q., 2007. A Review and Comparison of Changepoint Detection Techniques for Climate Data. Journal of Applied Meteorology and Climatology 46, 900–915.
• 49 Rohde, R., Muller, R., Jacobsen, R., Perlmutter, S., Rosenfeld, A., Wurtele, J., Curry, J., Wickhams, C. and Mosher, S., 2013. Berkeley Earth Temperature Averaging Process, Geoinformatics and Geostatistics: An Overview 1(2), 20-100.
• 50 Schwarz, G.E., 1978. Estimating the dimension of a model. Annals of Statistics 6, 461–464.
• 51 Seidel, D.J., Lanzante, J.R., 2004. An assessment of three alternatives to linear trends for characterizing global atmospheric temperature changes. Journal of Geophysical Research: Atmospheres 109, D14.
• 52 Steinman, B.A., Mann, M.E., Miller, S.K., 2015. Atlantic and pacific multidecadal oscillations and northern hemisphere temperatures. Science 347, 988–991.
• 53 Stocker, T.F., Qin, D., Plattner, G.K., Tignor, M., Allen, S.K., Boschung, J., Nauels, A., Xia, Y., Bex, B., Midgley, B.M., 2013. IPCC, 2013: climate change 2013: The physical science basis. Contribution of Working roup I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change.
• 54 Swanson, K.L., Sugihara, G., Tsonis, A.A., 2009. Long-term natural variability and 20th century climate change. Proceedings of the National Academy of Sciences 106.38, 16120-16123.
• 55 Tol, R.S.J., De Vos, A.F. 1993 Greenhouse statistics - time series analysis. Theoretical and Applied Climatology, 48, 63–74.
• 56

Tol, R.S.J., De Vos, A.F. 1998. A Bayesian statistical analysis of the enhanced greenhouse effect. Climatic Change 38, 87-112.

• 57 Tollefson, J., 2014. Climate change: The case of the missing heat. Nature 505, 276–278.
• 58 Tollefson, J., 2016. Global warming “hiatus” debate flares up again. Nature. doi:10.1038/nature.2016.19414.
• 59 Tomé, A.R., Miranda, P.M.A., 2004. Piecewise linear fitting and trend changing points of climate parameters. Geophysical Research Letters 31(2).
• 60 Trenberth, K.E., Fasullo, J.T., 2013. An apparent hiatus in global warming? Earth’s Future 1, 19–32.
• 61 Wolter, K., Timlin, M.S., 1998. Measuring the strength of ENSO events: how does 1997/98 rank?. Weather 53, 315-324.
• 62 Wu, Z., Huang, N.E., Wallace, J.M., Smoliak, B. V., Chen, X., 2011. On the time-varying trend in global-mean surface temperature. Climate Dynamics 37, 759–773.