1 Introduction
Significant advances have been made in documenting how global and hemispheric temperatures have evolved and in learning about the causes of these changes. On the one hand, large efforts have been devoted to investigate the time series properties of temperature and radiative forcing variables (GayGarcia et al., 2009; Kaufmann et al., 2006; Mills, 2013; Tol and de Vos, 1993). In addition, a variety of methods were applied to detect and model the trends in climate variables, including features such as breaks and nonlinearities (Estrada, Perron and MartínezLópez, 2013; Gallagher et al., 2013; Harvey and Mills, 2002; Karl et al., 2000; Pretis et al., 2015; Reeves et al., 2007; Seidel and Lanzante, 2004; Stocker et al., 2013; Tomé and Miranda, 2004). Multivariate models of temperature and radiative forcing series provide strong evidence for a common secular trend between these variables, and help to evaluate the relative importance of its natural and anthropogenic drivers (Estrada, Perron and MartínezLópez, 2013; Estrada, Perron, GayGarcía and MartínezLópez, 2013; Kaufmann et al., 2006; Tol and Vos, 1998). The methodological contributions of the econometrics literature to this field have been notable; e.g., Dickey and Fuller (1979), Engle and Granger (1987), Johansen (1991), Perron (1989, 1997), Bierens (2000), Ng and Perron (2001), Kim and Perron (2009), Perron and Yabu (2009), among many others, see Estrada and Perron (2014) for a review. Regardless of the differences in assumptions and methods (statistical or physical), there is a general consensus about the existence of a common secular trend between temperatures and radiative forcing variables.
Some of the most relevant questions about the attribution of climate change are concerned with understanding particular periods in which the rate of warming changed; e.g., rapid warming, slowdowns or pauses. An example is the apparent slowdown in warming since the 1990s, for which various methods were applied to document or reject its existence (e.g., Fyfe et al., 2016; Lewandowsky et al., 2015, 2016). These issues are delicate to handle when dealing with observed temperature series, as the object of interest is the secular trend behind the observed warming. This underlying trend is affected by lowfrequency natural variability (Swanson et al., 2009; Wu et al., 2011) and changes in its rate of warming are difficult to detect and attribute. New tests and approaches to investigate common features in temperature and radiative forcing can make attribution studies more relevant for climate science and policymaking by providing a better understanding of the drivers behind them.
One way to tackle this problem is to devise procedures to extract the common secular trend between temperature and radiative forcing series and framing the problem in a univariate context where the available structural change tests can be applied. Estrada and Perron (2016) used this approach to investigate the existence and causes of the current slowdown in the warming. Their results strongly suggests a common secular trend and common breaks, largely determined by the anthropogenic radiative forcing. Their analysis is based on the results of cotrending and principal component analyses to separate the common longterm trend imparted by radiative forcing from the natural variability component in global temperature series. As they discussed, filtering the effects of physical modes of natural variability from temperature series is necessary to obtain a proper assessment of the features and drivers of the warming trend. This problem has seldom been addressed within the timeseries based attribution literature (e.g., Estrada, Perron and MartínezLópez, 2013; Estrada and Perron, 2016) and it constitutes a relevant research topic that requires the development of new procedures and techniques. What transpires from this research is that temperatures and radiative forcing are most likely characterized by a linear trend with two changes in the rate of growth. The first occurs in the early 60s and indicates a very large increase in the rate of growth of both temperatures and radiative forcing. This was termed as the “onset of sustained global warming”. The second is related to the more recent socalled “hiatus” period, which suggests that temperatures and total radiative forcing have increased less rapidly since the mid90s compared to the larger rate of increase from 1960 to 1990.
The approach we take consists in designing statistical tests in a multivariate setting with jointsegmented trends to investigate the existence of common breaks and the “hiatus”. There are two issues that until now remain unresolved, which this approach can address. The first is whether the breaks in the slope of the trend functions of temperatures and radiative forcing are common. This is important because common breaks coupled with the basic science of climate change would strongly suggest a causal effect from anthropogenic factors to temperatures. As is well known, a common linear trend can occur when the series are spuriously correlated. A common break can eliminate such concerns of spurious correlation and foster the claim that the relationship from radiative forcing to temperatures is causal. Hence, our paper first aims at developing a test for common breaks across a set of series modeled as joint segmented trends with correlated noise. The theoretical framework follows Qu and Perron (2007) and Oka and Perron (2016), building on Bai and Perron (1998, 2003). Their framework, however, precludes joint segmented trends since the regressors are functions of the break dates, which makes the problem very different. Perron and Zhu (2005) show that the limit distribution of the estimate of the break date in a single time series with a joint segmented trend follows a normal limit distribution. We build on that result to show that our common break test follows a chisquare distribution. Theoretically, our results are more general and cover a wide range of cases to test hypotheses on the break dates in a multivariate system, within or across equations. Although our model with a single equation is similar to that in Perron an Zhu (2005), our work has a different focus, namely testing, while Perron and Zhu (2005) deals with the distribution theory of the estimate. We are not aware of any test in the literature that can handle a set of general linear restrictions on break dates both across and within equations. Furthermore, our test for an additional break is more closely related to Perron and Yabu (2009), and the studies cited therein. The attempt to increase the power of structural break tests using cross equation correlation is also new in the case of jointsegmented trends. Our results show that, once we filter temperatures for the effect of the Atlantic Mutidecadal Oscillation (AMO) and the North Atlantic Oscillation (NAO) for reasons explained in the text, the breaks in the slope of radiative forcing and temperatures are common, both for the large increase in the 60s and the “hiatus” period.
The second issue relates to establishing formally via a proper testing procedure that takes into account the noise in the series, whether there was a “hiatus” period for temperatures since the mid90s. This is important because such a test would counter the widely held view that the “hiatus” is the product of natural internal variability (Kosaka and Xie, 2013; Trenberth and Fasullo, 2013; Meehl et al., 2011; Balmaseda, Trenberth and Källén, 2013). Using standard univariate tests (e.g., Perron and Yabu, 2009), the results are mixed across various series and sometimes borderline. Our aim is to provide tests with enhanced power by casting the testing problem in a bivariate framework involving temperatures and radiative forcing. Our results indicate that indeed the “hiatus” represents a significant slowdown in the rate of increase in temperatures, especially when considering global or southern hemisphere series, for which our test points to a rejection of the null of no change for all data sources.
We consider a multivariate system with equations where the dependent variables are modeled as jointsegmented trends with multiple changes in the slope. The errors are allowed to be serially correlated and correlated across equations. Of interest is testing for general linear restrictions on the break dates including testing for common breaks across equations. The test used is a (quasi) likelihood ratio test assuming serially uncorrelated errors, in which case it has a pivotal chisquare distribution. However, it is nonpivotal in the general case of interest. Accordingly, we also consider a corrected Wald test with a pivotal limit distribution, which can be constructed using break dates estimated one equation at a time, labelled OLSWald, or estimated via the complete system, labelled GLSWald. The latter can offer more efficient estimates when there is correlation in the errors across equations. It is, however, computationally demanding as it requires least squares operations of order where is the sample size and is the total number of breaks. The OLSWald test requires only leastsquares operations of order where () is the number of breaks in the equation, making it much easier to compute in a multivariate system. Our simulation study shows that the two Wald tests have finite sample sizes close to the nominal size when the extent of the serial correlation in the noise is small. They, however, suffer from potentially severe liberal size distortions for moderate to strong serial correlation. Hence, for all three tests (since the LR is nonpivotal), we suggest a bootstrap procedure to obtain the relevant critical values. These bootstrap tests are shown to have exact sizes close to the nominal 5% level in all cases considered.
We also propose a test for the presence of an additional break in some series. For instance, in our applications, the null and alternative hypotheses specify a break in the slope of the radiative forcing series (for which a rejection is easy using a single equation test), while under the null hypothesis there is no break in the temperature series but there is one under the alternative. Hence, the aim is to see whether a break in temperatures is present using a bivariate system. The theoretical framework is more general and allows multiple breaks in a general multivariate system. The test considered is a quasilikelihood ratio test, whose limit distribution is nonpivotal and depends in a complex way on a number of nuisance parameters. Hence, we resort to a bootstrap procedure to obtain the relevant critical values.
The paper is structured as follows. Section 2 presents the statistical model and the assumptions imposed. Section 3 considers testing for linear restrictions on the break dates including as a special case testing for common breaks across equations. The various tests and their limit distributions are discussed. Section 4 presents the test for the presence of an additional break in some series within a multivariate system. Section 5 presents simulation results about the size and power of the tests and introduces the bootstrap versions recommended. Section 6 presents the applications. The data are discussed in Section 6.1. Section 6.2 presents the results for the common breaks tests, while Section 6.3 addresses the issue of testing for the “hiatus” using a bivariate system with a temperature and a radiative forcing series. Brief conclusions are offered in Section 7. All proofs are in an appendix.
2 Statistical Model and Assumptions
We consider the following model with multiple breaks in a system of variables. Each variable consists of a linear trend with multiple changes in slope with the trend function is joined at each break date. More specifically, there are breaks in the slope of the variable, and the variate system is:
(1) 
for and , where , is the indicator function of the event and is the break date for the change (with magnitudes ) in the trend of the
variable. The vector of break dates for the
variable is and is the vector of break dates for the entire system, . We let denote the total number of breaks in the system. Hence, is an vector. For the variable, we have in matrix form(2) 
where and with , and . The entire system in matrix notation is
(3) 
where , , and . We are interested in the case where the number of breaks for each variable is known but the break dates are unknown. For a vector of generic break dates, we will use the notation without a superscript, that is, and . Also, we will simply write to denote the regressor matrix corresponding to break dates specified by , that is, .
To motivate our quasilikelihood ratio test, we first assume that is multivariate normal with 0 mean and covariance . This assumption will be relaxed when we derive the asymptotic distribution of our test. For a generic break date vector and the corresponding regressor matrix , the loglikelihood function of is then given by
Let and be the maximum likelihood estimators for and , which jointly solve and , where . Thus, the maximized the loglikelihood function is
(4) 
It is useful to work with break fractions; hence define , and , with the true break fractions , and equivalently defined. With these definitions, functions of such as and will be used interchangeably with and
. We now state the assumptions needed for the asymptotic analysis.
Assumption 1
with for .
Assumption 2
for and .
Assumption 3
Let . Then, is stationary with and . In addition, and , where “” denotes weak convergence under the Skorohod topology, is the dimensional standard Wiener process and .
Assumptions 1 and 2 are standard and simply state that the break dates are asymptotically distinct (i.e., each regime increases proportionally with the sample size ) and the changes in the parameters are nonzero at the break dates. Assumption 3 states that
is a stationary process and its partial sum follows a functional central limit theorem.
3 Testing for Linear Restrictions on the Break Dates
We now consider the following null and alternative hypotheses:
(5) 
Some examples of this type of hypotheses are the following: 1) specific break fractions: and ; 2) break fractions with a fixed distance: and ; 3) common breaks: and . For the first case, historic events can be used to determine the value of
. The second case can be used across equations as well as within an equation. The third case can be viewed as a special case of the second one and should be used across equations only. The LR test statistic is
Our model with broken segmented trends at unknown dates is a rare case in structural change problems with the estimates of the break dates having an asymptotic normal distribution; see Perron and Zhu (2005). Since the likelihood ratio test essentially involves quadratic forms of these estimates, it has the usual
distribution. When the shortrun and longrun variances differ
, the asymptotic distribution depends on nuisance parameters. It is difficult to directly modify the LR test. Instead, we develop a Wald test for the hypotheses (5) robust to serial correlation in the errors. To express the limiting distribution, define and , with being the positive part of the argument. Also, , , and , with all integrals from 0 to 1. Denote by the element of and by that of . Now, defineWe also define , , , , and equivalently, as well as
where with and . We now provide the limiting distribution of the estimate of the break fraction that maximizes the loglikelihood (4), which allows us to obtain the limit distribution of the Wald test.
Theorem 2
The total number of regressions required to obtain the above break date estimator is . This can pose a considerable amount of computational burden especially if the procedure is to be bootstrapped. Hence, it is worthwhile to devise a test assuming diagonality in because the break dates are then estimated separately from each variable and the total number of regressions required is only . While this modification lessens computational cost significantly, the resulting break date estimators can be less efficient due to the neglected correlation between variables in the system. Suppose the break dates are estimated separately for each variable from (2). Thus, and
(6) 
where , with . The limiting distribution of is obtained as a special case of Theorem 2 by letting . The limiting covariance can be simplified somewhat. Define
where
with and is the element of defined in Assumption 3.
Corollary 1
The result in (i) coincides with one result reported in Perron and Zhu (2005) when there is only one equation with one break.
4 Break Detection
We consider a test for the null of breaks in the system against the alternative of breaks with the location of the additional break unspecified. We again consider a quasilikelihood framework. Suppose that the model is now given by
where for some and is an vector with in the element and elsewhere. The null and alternative hypotheses are now:
(7) 
The maximum of the likelihood function under is again given by (4) for a generic break date vector . Similarly, the maximum of the likelihood function under is:
for a generic break date vector and an additional break date . The notation and is used to indicate the fact that an additional break is inserted in the equation. This additional break can only occur in one of the equations and we assume for simplicity that the relevant equation is known. This assumption is relaxed below. Just like we have defined and used and interchangeably, we define and will use and interchangeably. The test statistic we consider is given by
where and
As is well known in the structural break literature, is not identified under the null hypothesis and must be restricted to ensure a nondivergent limiting distribution of the test statistic. The set imposes the relevant restrictions.^{1}^{1}1Instead of using under both and , it is possible to jointly minimize under . However, this requires a more complex asymptotic analysis and we will focus on the simpler case. When the equation with an additional break is not specified a priori, the statistic can be extended to be
In order to express the limiting distribution, we need to define additional terms. Let and be as defined before. Let , , and where all integrals are taken from 0 to 1. Recall that is the element of and is that of , and let , , , , as well as and . Now, using these functions, we define
where , , , and are as previously defined. Finally, let be a mean zero Gaussian process defined over the unit interval with
Also, denote the limit counterpart of the set by , replacing by . The limit distribution of the tests are stated in the following theorem.
The limiting distributions depend on various nuisance parameters. First, all of the true break fractions for and matter via terms such as , , , , , , and . Second, both the shortrun and longrun variances matter via the various and matrices. Third, the trimming parameter in the set matters. Given the complexity of the limiting distributions, we neither seek a way to eliminate nuisance parameters nor attempt to tabulate critical values. Instead, we resort to using a bootstrap method to generate pvalues. The various terms are evaluated in a manner similar to that of the tests for common breaks described in the next section. Note that if the shortrun and longrun variances coincide and the true break fractions are used instead of their estimates, reduces to be .
5 Monte Carlo Simulations
In this section we provide Monte Carlo simulation results to assess the adequacy of the asymptotic distributions derived in Sections 3 and 4. All results are based on 1,000 replications. We focus on the test for common breaks as the tests for an additional change in slope are too computationally demanding so that a reasonable simulation experiment is prohibitive. The data generating process (DGP) is specified by:
The results are exactly invariant to the values of and . For the slope change parameters and , the cases of 0.5, 1.0 and 1.5 are considered. The error terms are such that
and , with . The parameter stands for the correlation across equations and stands for the autoregressive parameter in each of the error terms. We consider (0.5, 0, 0.5) for the values of and (0, 0.3, 0.7) for the values of . What influences the precision of the break date estimate is the ratio between the break magnitude and the longrun variance. In our simulations, the longrun variance is always set to be unity. The sample size is set to and the true break dates are at midsample, . For each generated data, we test four null hypotheses: (1) and , (2) and , (3) and and (4) and
. Hence the null rejection probabilities for (1) and (3) pertain to the finite sample sizes (with a nominal size 5%) while those for (2) and (4) pertain to powers. In testing these hypotheses, we consider the LR and Wald tests based on the SUR system (Theorems
1 and 2) as well as the Wald test based on estimating the break dates equation by equation using OLS regressions (Corollary 1). We refer to the two Wald tests as the GLSWald and OLSWald, respectively. For both versions, the longrun variance is estimated using a quadratic spectral window, where the bandwidth parameter is selected using Andrews’ (1991) data dependent method with an AR(1) approximation.We also simulate the bootstrap version of the aforementioned tests. To obtain bootstrap pvalues, we first estimate break dates imposing the null hypothesis and remove the estimated trend. Then, we fit a VAR(1) on the resulting residuals. Following Kilian (1998), we compute the biascorrected estimates of the VAR coefficients and obtain the corresponding residuals of the VAR. From them, we construct a pseudo VAR process and add to it the estimated trend functions to obtain a bootstrap sample. Simulating bootstrap tests requires a lot of computing time especially for those based on the full system. We make use of the WarpSpeed method suggested by Giacomini, Politis and White (2013). Also, we use the feasible GLS estimate instead of the MLE in the construction of the LR and GLSWald tests.
In Table 1, the results for the LR test are reported. Regarding the finite sample size obtained using the asymptotic critical values, two features emerge. First, the size is near or below the nominal level when , but it climbs up quickly as increases. In the worst case, it can go up to 50%. This result is well expected since the asymptotic validity of the LR test holds only when . Second, the size inflation is more evident when is small. In fact, when is very large, the estimated break dates coincide with the true ones making the test statistic literally zero with a large probability. However, this type of conservativeness is of little concern because the power function does not appear to be decreasing as gets large. On the other hand, the finite sample size obtained by the bootstrap procedure stays near the nominal 5% level across all simulation designs, offering significant improvements over the asymptotic test. The bootstrapped test gives slightly smaller power than the asymptotic test, but the difference is marginal especially in view of the large improvement in size.
Table 2 reports the results for the GLSWald test. Overall, the GLSWald test using the asymptotic critical values are more liberal than the corresponding LR test, which is a well known characteristic of Wald tests. Despite being liberal, the GLSWald is less sensitive to the value of , since we are using a robust covariance matrix. The GLSWald test also gets conservative as gets large for the same reason as the LR test. The bootstrapped GLSWald test controls the size as well as the bootstrapped LR test. Table 3 shows the results for the OLSWald test. It performs similarly to the GLSWald test for both the asymptotic and bootstrap versions. Lastly, the power is close to one in almost all cases. The hypotheses that are rejected misspecifies the break dates by only 5% of the sample size, yet our tests are powerful enough to detect them. To sum up, all three tests, the LR, GLSWald, and OLSWald suffer from size distortions if the breaks are not large enough. However, the bootstrap procedures significantly help control the size without losing much power.
6 Applications
We investigate the commonality of the break dates across the temperature and anthropogenic total radiative forcing series as well as test whether the recent “hiatus” is significant. We use two sets of global, northern and southern hemispheric temperature series, each of which will be denoted as G, N and S, respectively. The first set comes from the Climate Research Unit’s HadCRUT4 (Morice et al., 2012) and the second from the NASA database (GISTEMP Team, 2015; Hansen et al., 2010). For global temperatures, we also use the data from Berkeley Earth (Rohde et al., 2013) and the dataset in Karl et al. (2015). We first discuss the data used, then the results for the common breaks tests and finally the results for the test on a change in slope in temperatures related to the socalled “hiatus” period.
6.1 The data
The annual temperature data used are from the HadCRUT4 (18502014) (http://www.metoffice. gov.uk /hadobs/hadcrut4/data/current/download.html) and the GISSNASA (18802014) datasets (http://data.giss.nasa.gov/gistemp/). The Atlantic Multidecadal Oscillation (AMO) and the North Atlantic Oscillation (NAO) series (18562014) are from NOAA; http://www.esrl .noaa.gov/psd/data/timeseries/AMO/ and http://www.esrl.noaa.gov/psd/ gcos_wgsp/ Timeseries/NAO/). As stated above, for global temperatures, we also use the data from Berkeley Earth (Rohde et al., 2013) and the dataset in Karl et al. (2015). We also use series from databases related to climate model simulations by the Goddard Institute for Space Studies (GISSNASA). The radiative forcing data obtained from GISSNASA (https://data.giss.nasa.gov/ modelforce/; Hansen et al., 2011) for the period 18802010 include the following (in W/m2): wellmixed greenhouse gases, WMGHG, (carbon dioxide, methane, nitrous oxide and chlorofluorocarbons); ozone; stratospheric water vapor; solar irradiance; land use change; snow albedo; stratospheric aerosols; black carbon; reflective tropospheric aerosols; and the indirect effect of aerosols. The aggregated radiative forcing series are constructed as follows: WMGHG is the radiative forcing of the wellmixed greenhouse gases and has a largely anthropogenic origin; Total Radiative Forcing (TRF) is WMGHG plus the radiative forcing of ozone, stratospheric water vapor, land use change; snow albedo, black carbon, reflective tropospheric aerosols, the indirect effect of aerosols and solar irradiance.
6.2 Common breaks tests
The temperature series are affected by various modes of natural variability such as the Atlantic Multidecadal Oscillation (AMO) and the North Atlantic Oscillation (NAO), which are characterized by low frequency movements (Kerr, 2000; Hurrell, 1995). Since trends and breaks are low frequency features, it is important to purge them from the temperature series allowing more precise estimates of the break dates.^{2}^{2}2We also considered filtering the effect of the Pacific Decadal Oscilliation (PDO) index (https://www.ncdc.noaa.gov/teleconnections/pdo/) and the Southern Oscilliation Index (SOI) (http://www.cru.uea.ac.uk/cru/data/soi/); see Wolter and Timlin (1998) and Mantua and Hare (2002). The results were qualitatively similar showing robustness. Hence, we do not report them. Other high frequency fluctuations in temperature series do not affect the precision of the estimates of the break dates and the magnitudes of the changes in slope. Accordingly, we filter out the effect of these modes of variability by regressing each temperature series on these modes and a constant. Since the effect of natural variability might have occurred with a time lag, we choose an appropriate lag using the Bayesian Information Criterion (BIC); Schwarz (1978). The candidate regressors for the filtering are the current value and lags (up to order ) of AMO and NAO. We first work with G from HadCRUT4. We start with , so that the candidates are the current value and the first lag only. BIC chooses the current value of AMO and the first lag of NAO. Since the maximum lag allowed is selected, we increase to 4. Then, BIC chooses the current value of AMO and the second lag of NAO (the first is not included given that we search over models not necessarily including all lags up to some chosen order). When applying the BIC, the number of observations used is limited by (e.g., Perron and Ng, 2005). Having decided on the current value of AMO and the second lag of NAO, we apply the filtering to all available observations, not limited by . We could repeat the same procedure to each series. However, it does not make much sense to have the same mode affect each temperature series with a different lag. Hence, we filter all series with the current value of AMO and the second lag of NAO.^{3}^{3}3If we choose the filtering regressors for each series, G and SH from NASA and G from Berkeley require the fourth lag of NAO instead of the second. But the break date estimates change by two years at most. The filtered temperature series are denoted as G̃, Ñ and S̃. Figure 1 presents graphs of the original and filtered series.
For the radiative forcing variables, we use WMGHG and TRF. Figure 2 displays the time plots of WMGHG and TRF as well as the AMO and NAO. The data set used covers the period 18802014 and the filtering is done with the full sample. We use the data for 19002014 for the purpose of the common break date tests.^{4}^{4}4
For the radiative forcing variables, the data is available only up to 2011. We use forecast values for 20122014 based on an autoregressive model with a broken linear trend. In the case of TRF the forecast is produced using an AR(2) and a broken linear trend with 1960 and 1991 as the break dates. For WMGHG, we use an AR(8) and and a broken linear trend with break dates at 1960 and 1994.
We use two sample periods: 19001992 and 19632014. The reason we split the sample is to reduce the number of breaks in the system, thereby lessening the computational burden. Our choice of sample periods is made with the intention to minimize any unwanted effect from a second break that might exist in any of the variables in the system. The break dates in WMGHG are estimated most precisely since it has a very small noise component (see Figure 2) and they are 1963 and 1992. First we consider bivariate systems with one temperature series and one radiative forcing series. The null hypothesis of a common break date is tested using the LR test from Theorem 1. Since this test is not asymptotically pivotal, we supply bootstrap pvalues as well. The bootstrap is carried out in the same way as done in the Monte Carlo simulations reported earlier. We also report the GLSWald test from Theorem 2. Although the GLSWald test is asymptotically pivotal, we still supply the bootstrap pvalues because of the size distortions observed in our Monte Carlo study. In addition to the 4 global, 2 northern hemispheric and 2 southern hemispheric temperature series, we also consider the average of the global, northern and southern hemispheric temperature series across different datasets. Hence, there are 11 temperature series and 11 filtered temperature series, each of which is paired with either WMGHG or TRF, yielding a total of 44 bivariate systems.Table 4 (a) reports the results for the sample period of 19001992 when the original temperature series are considered; the asymptotic pvalues of the Wald and LR test are very small in most cases and the null of a common break date is rejected with only a few exceptions. However, the bootstrap pvalues are well over the conventional thresholds 0.05 or 0.01 in all cases. Given the previous simulation results, we view the bootstrap pvalues as more reliable than the asymptotic ones. Note that the break date for the radiative forcing variable is almost unchanged at either 1963 or 1966, regardless of the temperature series in the bivariate system. Also, the estimated common break date, i.e., the estimate imposing the null of common breaks, always coincide with the break date for the radiative forcing variable. On the other hand, the break date estimates for the temperature series are quite spread out. Hence, the break dates in the radiative forcing variables are being estimated more precisely than those in the temperature series. The large pvalues produced by the bootstrapping is a consequence of the high level of noise in the temperature series relative to the magnitude of the break. Accordingly, we fail to reject the null of a common break date despite the large discrepancy in the estimated break dates. Table 4 (a) also shows the results obtained with the filtered temperature series. The picture changes quite dramatically. The break dates for the radiative forcing variables are still at 1963 or 1966, but those for the temperature series are now also around 1960. Especially, when TRF enters the bivariate system, the estimate of the break date is 1960, 1965 or 1966, the exception being the southern hemispheric temperature from HadCRUT4 and the average of the two southern hemispheric temperature series, for which the estimate is still 1956. As a result, both the asymptotic and bootstrap pvalues are much larger than those for the original temperature series.
Tables 4 (b) presents the results for the second sample 19632014 when the original temperature series are used and the estimates of the break dates for the temperature series are again quite spread out, while those for the radiative forcing variables are consistently around 19901992. Just like in Table 4 (a), the asymptotic pvalues suggest strong rejection of the null hypothesis, but the bootstrap pvalues turns out much larger reflecting again the high level of correlated noise in the original series. Note that the estimates of the break dates for the filtered temperature series are 19901992 with only a few exceptions. Also, the pvalues are larger than the corresponding ones for the original temperature series.
We repeat the analysis for the bivariate systems with the OLSWald test as a robustness check and report the results in Table 5. The results are very similar when the filtered series are considered. The break date estimates obtained equation by equation coincide with those obtained from the system or differ only by a few years at most. When the original series are used, the break date estimates for some series largely deviate from those computed by the system method. However, the fact that the bootstrap pvalues are much larger than usual thresholds still applies. Thus, our conclusions remain the same.
An advantage of the OLSWald test is that it can handle a larger system without extra computational burden. The break date estimates will not change if we consider a larger system, because they are computed equation by equation. However, the longrun variance estimate and the bootstrap pvalues can change as we consider a bigger system. We consider three fivevariables systems. Each system has a global, northern and southern hemispheric temperature and the two radiative forcing variables. The first one uses the temperature series from HadCRUT4, the second one from NASA and the third one has average global, northern and southern hemispheric temperature series. The results are reported in Table 6. They show results broadly similar to those obtained bivariate systems.
To provide a final robustness check, we consider testing jointly whether the two breaks near 1960 and for the “hiatus” are common. We use the OLSWald test, which estimates the two breaks jointly for each equation, and the period 19262014 to focus on the time span whereby anthropogenic factors are most likely to be the main driver affecting changes in temperatures. The results are presented in Table 7. The results obtained are broadly similar and lead to the same conclusions. When using the filtered series and the bootstrap pvalues, the test fails to reject the null hypothesis of two common breaks. The first is located in the early 60’s (the onset of sustained global warming) and the second in the early 90’s (the “hiatus”) consistent with methods dealing with one break at a time presented earlier.
6.3 Testing for the “hiatus”
The literature on the existence and drivers of the current slowdown in the rate of warming has expanded quickly (Tollefson, 2014, 2016). In this section, we focus on testing if the “hiatus” can be explained by natural variability or if it is a feature of the underlying warming trend (Estrada, Perron and Martinez, 2013). A small part of the literature investigating the slowdown in the warming in global temperatures are based on formal structural change/changepoint tests (Cahill et al., 2015; Pretis et al., 2015), while others are simply based on testing for trends during arbitrarily selected periods of time (Cahill et al., 2015; Foster and Rahmstorf, 2011; Lewandowsky et al., 2015). This type of approach, as well as most of physicalbased studies, have typically failed to find convincing evidence for the existence of a change in the slope of global temperatures during the last decades. According to one of the commonly accepted hypothesis, the apparent “hiatus” is produced by the effects of lowfrequency natural variability that result from coupled oceanatmosphere processes and heat exchange between the ocean and the atmosphere. It has been proposed that effects of natural variability modes such as AMO, NAO and PDO were able to mask the warming trend since the 1990s, creating the illusion of a slowdown in the underlying warming trend (e.g., Guan et al ., 2015; Steinman et al., 2015; Li, Sun and Jin, 2013; Trenberth and Fasullo, 2013). However, Estrada and Perron (2016) argue that lowfrequency oscillations do distort the underlying warming trend but cannot account for the current slowdown. Instead, the main effect of these oscillations has been to make it more difficult to detect the drop in the rate of warming, a real feature of the warming trend imparted by the slowdown in the radiative forcing from wellmixed greenhouse gases.
Establishing whether the “hiatus” period is statistically significant is important because it would counter the widely held view that it is the product of natural internal variability (Kosaka and Xie, 2013; Trenberth and Fasullo, 2013; Meehl et al., 2011; Balmaseda, Trenberth and Källén, 2013). Using standard tests (e.g., Perron and Yabu, 2009), the results are mixed across various series and sometimes borderline. Our aim is to provide tests with enhanced power by casting the testing problem in a bivariate framework involving temperatures and radiative forcing. We consider bivariate systems with one temperature series and one forcing variable. We use the LR test for the presence of a break in temperature series given the presence of a structural break in radiative forcing series. We first estimate a break date in the radiative forcing series under the likelihood framework for a bivariate system of a pair of radiative forcing and temperature series. Then, given the estimated break date in the radiative forcing series, we apply the LR test for the null hypothesis of no break in temperatures versus one break. To obtain values, we apply the bootstrap procedure described in Section 5 with 1,000 samples. The results are presented in Table 8. First, as expected from previous results, the LR test is more likely to reject the null hypothesis of no break when using filtered temperature series. Hence, we report results only for that case. For the filtered series from 1900 to 1992, we reject the null at less than 5% significance level in all pairs of radiative forcing and temperature series. Hence, there is clear evidence of a break in temperatures that is near 1960 (varying between 1954 and 1966 depending on the series and the forcing). This concurs with the common break results obtained in the previous section. When using the sample 19632014 we reject the null in seven and eight cases out of eleven filtered series at the 10% significance level with wellmixed greenhouse gases and total radiating forcing, respectively. Since the sample size is small (, we might expect that the LR test may have little power, but our result suggests strong evidence for the presence of a break in temperature series. Note that the evidence for a break is stronger when using bivariate systems involving TRF. This is due to the fact that TRF exhibits a larger decrease in slope compared to WMGHG. The errors are also more strongly correlated. For instance, solar irradiance is a part of both TRF and temperatures and an important source of variations. When considering systems with TRF, the only pairs that do not allow a rejection at the 10% level are those associated with Northern hemisphere temperatures. For these, the pvalues range from .14 to .19, while for all other pairs they are below 10%. Given that the small sample size is small, the overall evidence strongly indicates a break in temperatures in the early 90s consistent with the presence of the muchdebated “hiatus”.
7 Conclusion
We consider a multivariate system with equations with the dependent variables modeled as jointsegmented trends with multiple changes in slope. The errors can be serially correlated and correlated across equations. We consider testing for general linear restrictions on the break dates, including testing for common breaks across equations. The test used is a (quasi) likelihood ratio test assuming serially uncorrelated errors. Under the stated conditions, the LR test has a pivotal chisquare distribution, though it is nonpivotal in general cases of interest. Hence, we also consider a corrected Wald test having a pivotal limit distribution, which can be constructed using break dates estimated one equation at a time or via the complete system. The limit distribution of the test is standard chisquare. However, simulations show that the two Wald tests suffer from potentially severe liberal size distortions for moderate to strong serial correlation. Hence, for all three tests, we suggest a bootstrap procedure to obtain the relevant critical values. These bootstrap tests are shown to have correct size in all cases considered. Our empirical results show that, once we filter the temperature data for the effect of the Atlantic Mutidecadal Oscillation (AMO) and the North Atlantic Oscillation (NAO), the breaks in the slope of radiative forcing and temperatures are common, both for the large increase in the 60s and the recent “hiatus”.
We also consider a test for the presence of an additional break in some series. The theoretical framework is general and allows multiple breaks in a general multivariate system. The test considered is a quasilikelihood ratio test. The limit distribution is shown to be nonpivotal and depends in a complex way on a number of nuisance parameters. Hence, we use a bootstrap procedure to obtain relevant critical values. In our applications, the null and alternative hypotheses specify a break in the slope of the radiative forcing series, while under the null hypothesis there is no break in the temperature series but there is one under the alternative hypothesis. Our results indicate that indeed the “hiatus” represents a significant slowdown in the rate of increase in temperatures, especially when considering global or southern hemisphere series, for which our test points to a rejection of the null of no change for all data sources considered. The statistical results are of independent interest and applicable more generally beyond the climate change applications considered.
References
 1 Andrews, D.W.K., 1991. Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59, 817858.
 2 Bai, J., Perron, P., 1998. Estimating and testing linear models with multiple strucutral changes. Econometrica 66, 4778.
 3 Bai, J., Perron, P., 2003. Computation and analysis of multiple structural change models. Journal of Applied Econometrics 18, 122.
 4 Balmaseda, M.A., Trenberth, K.E., Källén, E., 2013. Distinctive climate signals in reanalysis of global ocean heat content. Geophysical Research Letters 40, 17541759.
 5 Bierens, H.J., 2000. Nonparametric Nonlinear Cotrending Analysis, with an application to interest and inflation in the United States. Journal of Business and Economic Statistics 18, 323–337.
 6 Cahill, N., Rahmstorf, S., Parnell, A.C., 2015. Change points of global temperature. Environmental Research Letters 10(8), 084002.
 7 Dickey. D.A., Fuller, W.A., 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427–431.
 8 Engle, R.F., Granger, C.W.J., 1987. Cointegration and error correction: representation, estimation and testing. Econometrica 55, 251276.
 9 Estrada, F., Perron, P., 2014. Detection and attribution of climate change through econometric methods. Boletín de la Sociedad Matemática Mexicana 20, 107136.
 10 Estrada, F., Perron, P., 2016. Extracting and analyzing the warming trend in global and hemispheric temperatures. Unpublished Manuscript, Department of Economics, Boston University.
 11 Estrada, F., Perron, P., GayGarcía, C., MartínezLópez, B., 2013. A timeseries analysis of the 20th century climate simulations produced for the IPCC’s Fourth Assessment Report. Plos One 8(3): e60017.
 12 Estrada F., Perron P., MartínezLópez B., 2013. Statistically derived contributions of diverse human influences to twentiethcentury temperature changes. Nature Geoscience 6, 10501055.
 13 Foster, G., Rahmstorf, S., 2011. Global temperature evolution 1979–2010. Environmental Research Letters 6(4), 044022.
 14 Fyfe, J.C., Meehl, G.A., England, M.H., Mann, M.E., Santer, B.D., Flato, G.M., Hawkins, E., Gillett, N.P., Xie, S.P., Kosaka, Y., Swart, N.C., 2016. Making sense of the early2000s warming slowdown. Nature Climate Change 6, 224–228.
 15 Gallagher, C., Lund, R., Robbins, M., Gallagher, C., Lund, R., Robbins, M., 2013. Changepoint Detection in Climate Time Series with LongTerm Trends. Journal of Climate 26, 4994–5006.
 16 GayGarcia, C., Estrada, F., Sanchez, A. 2009. Global and hemispheric temperature revisited. Climatic Change 94, 333349.
 17 Giacomini, R., Politis, D.N., White, H., 2013. A warpspeed method for conducting Monte Carlo experiments involving bootstrap estimators. Econometric Theory 29, 567589.
 18 GISTEMP Team, 2015. GISS surface temperature analysis (GISTEMP). NASA Goddard Institute for Space Studies. Dataset accessed at http://data.giss.nasa.gov/gistemp/
 19 Guan, X., Huang, J., Guo, R., Pu, L., 2015. The role of dynamically induced variability in the recent warming trend slowdown over the Northern Hemisphere. Scientific Reports 5, 12669.
 20 Hansen, J., Ruedy, R., Sato, M., Lo, K., 2010. Global surface temperature change. Reviews of Geophysics 48.4.
 21 Hansen, J., Sato, M., Kharecha, P., von Schuckmann, K., 2011. Earths energy imbalance and implications. Atmospheric Chemistry and Physics 11, 1342113449.
 22 Harvey, D.I., Mills, T.C., 2002. Unit roots and double smooth transitions. Journal of Applied Statistics 29, 675–683.
 23 Hurrell, J.W., 1995. Decadal trends in the North Atlantic Oscillation and relationships to regional temperature and precipitation. Science 269, 676679.
 24 Johansen, S., 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59, 15511580.
 25 Karl, T.R., Knight, R.W., Baker, B., 2000. The record breaking global temperatures of 1997 and 1998: Evidence for an increase in the rate of global warming? Geophysical Research Letters 27, 719–722.
 26 Karl, T.R., Arguez, A., Huang, B., Lawrimore, J.H., McMahon, J.R., Menne, M.J., Peterson, T.C., Vose, R.S., Zhang, H.M., 2015 Possible artifacts of data biases in the recent global surface warming hiatus. Science 348, 1469–1472.
 27 Kaufmann, R.K., Kauppi, H., Stock, J.H., 2006. Emissions, concentrations, & temperature: A time series analysis. Climatic Change 77, 249278.
 28 Kerr, R.A., 2000. A North Atlantic climate pacemaker for the centuries. Science 288, 19841985.

29
Kilian, L., 1998. Smallsample confidence intervals for impulse response functions. Review of Economics and Statistics 80, 218230.
 30 Kim, D., Perron, P., 2009. Unit root tests allowing for a break in the trend function at an unknown time under both the null and alternative hypotheses. Journal of Econometrics 148, 113.
 31 Kosaka. Y., Xie, S.P., 2013. Recent globalwarming hiatus tied to equatorial Pacific surface cooling. Nature 501, 403–407.
 32 Lewandowsky, S., Risbey, J.S., Oreskes, N., 2015. On the definition and identifiability of the alleged “hiatus” in global warming. Scientific Reports 5:16784.
 33 Lewandowsky, S., Risbey, J.S., Oreskes, N., 2016. The pause in global warming: Turning a routine fluctuation into a problem for science. Bulletin of the American Meteorological Society 97, 723733.
 34 Li, J., Sun, C., Jin, F.F., 2013. NAO implicated as a predictor of Northern Hemisphere mean temperature multidecadal variability. Geophysical Research Letters 40, 54975502.
 35 Mantua, N.J., Hare, S.R., 2002. The Pacific Decadal Oscillation. Journal of Oceanography 58.1, 3544.
 36 Meehl, G.A., Arblaster, J.M., Fasullo, J.T., Hu, A., Trenberth, K.E., 2011. Modelbased evidence of deepocean heat uptake during surfacetemperature hiatus periods. Nature Climate Change 1, 360–364.
 37 Mills, T.C., 2013. Breaks and unit roots in global and hemispheric temperatures: An updated analysis. Climatic Change 118, 745–755.
 38 Morice, C.P., Kennedy, J.J., Rayner, N.A., Jones, P.D., 2012. Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 dataset. Journal of Geophysical Research 117, D8.
 39 Ng, S., Perron, P., 2001. Lag length selection and the construction of unit root tests with good size and power. Econometrica 69, 15191554.
 40 Oka, T., Perron, P., 2016. Testing for common breaks in a multiple equation system. Unpublished Manuscript, Department of Economics, Boston University.
 41 Perron, P., 1989. The great crash, the oil price shock, and the unit root hypothesis. Econometrica 57, 13611401.
 42 Perron, P., 1997. Further evidence on breaking trend functions in macroeconomic variables. Journal of Econometrics 80, 355385.
 43 Perron, P., Ng, S., 2005. A note on the selection of time series models. Oxford Bulletin of Economics and Statistics 67, 115134.
 44 Perron, P., Zhu, X., 2005. Structural breaks with deterministic and stochastic trends. Journal of Econometrics 129, 65119.
 45 Perron P, Yabu T., 2009. Testing for shifts in trend with an integrated or stationary noise component. Journal of Business & Economic Statistics 27, 369396.
 46 Pretis, F., Mann, M.L., Kaufmann, R.K., 2015. Testing competing models of the temperature hiatus: assessing the effects of conditioning variables and temporal uncertainties through samplewide break detection. Climatic Change 131, 705–718.
 47 Qu, Z., Perron, P., 2007. Estimating and testing structural changes in multivariate regressions. Econometrica 75, 459502.
 48 Reeves, J., Chen, J., Wang, X.L., Lund, R., Lu, Q.Q., Reeves, J., Chen, J., Wang, X.L., Lund, R., Lu, Q.Q., 2007. A Review and Comparison of Changepoint Detection Techniques for Climate Data. Journal of Applied Meteorology and Climatology 46, 900–915.
 49 Rohde, R., Muller, R., Jacobsen, R., Perlmutter, S., Rosenfeld, A., Wurtele, J., Curry, J., Wickhams, C. and Mosher, S., 2013. Berkeley Earth Temperature Averaging Process, Geoinformatics and Geostatistics: An Overview 1(2), 20100.
 50 Schwarz, G.E., 1978. Estimating the dimension of a model. Annals of Statistics 6, 461–464.
 51 Seidel, D.J., Lanzante, J.R., 2004. An assessment of three alternatives to linear trends for characterizing global atmospheric temperature changes. Journal of Geophysical Research: Atmospheres 109, D14.
 52 Steinman, B.A., Mann, M.E., Miller, S.K., 2015. Atlantic and pacific multidecadal oscillations and northern hemisphere temperatures. Science 347, 988–991.
 53 Stocker, T.F., Qin, D., Plattner, G.K., Tignor, M., Allen, S.K., Boschung, J., Nauels, A., Xia, Y., Bex, B., Midgley, B.M., 2013. IPCC, 2013: climate change 2013: The physical science basis. Contribution of Working roup I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change.
 54 Swanson, K.L., Sugihara, G., Tsonis, A.A., 2009. Longterm natural variability and 20th century climate change. Proceedings of the National Academy of Sciences 106.38, 1612016123.
 55 Tol, R.S.J., De Vos, A.F. 1993 Greenhouse statistics  time series analysis. Theoretical and Applied Climatology, 48, 63–74.

56
Tol, R.S.J., De Vos, A.F. 1998. A Bayesian statistical analysis of the enhanced greenhouse effect. Climatic Change 38, 87112.
 57 Tollefson, J., 2014. Climate change: The case of the missing heat. Nature 505, 276–278.
 58 Tollefson, J., 2016. Global warming “hiatus” debate flares up again. Nature. doi:10.1038/nature.2016.19414.
 59 Tomé, A.R., Miranda, P.M.A., 2004. Piecewise linear fitting and trend changing points of climate parameters. Geophysical Research Letters 31(2).
 60 Trenberth, K.E., Fasullo, J.T., 2013. An apparent hiatus in global warming? Earth’s Future 1, 19–32.
 61 Wolter, K., Timlin, M.S., 1998. Measuring the strength of ENSO events: how does 1997/98 rank?. Weather 53, 315324.
 62 Wu, Z., Huang, N.E., Wallace, J.M., Smoliak, B. V., Chen, X., 2011. On the timevarying trend in globalmean surface temperature. Climate Dynamics 37, 759–773.
Comments
There are no comments yet.