In this paper, we introduce the bootUR package (Smeekes and Wilms, 2020) for R (R Core Team, 2017), which implements several bootstrap tests for unit roots. Unit root testing is an essential part of any statistical analysis of time series. Although unit root tests play some role in assessing particular economic hypotheses (as in the classical study of Nelson and Plosser, 1982), by far their most important use is as a pre-test to determine whether a time series is (non-)stationary, i.e. whether it possesses a unit root. Proper handling of unit roots, and thereby knowing a time series’ order of integration, is of paramount importance before commencing any form of analysis on the time series of interest. Broadly speaking, the bootUR package offers three major contributions relative to existing R packages. First, it offers a comprehensive, easy-to-use and reliable set of unit root tests not available in such generality in other packages. Second, it offers accurate p-values based on bootstrap methods. Third, its functions apply not only to single time series, but also to datasets consisting of a potentially large number of time series. With these contributions, the bootUR package provides practitioners with a single source for their unit root testing needs.
Ignoring unit roots essentially invalidates any subsequent statistical analysis: the stochastic trend, and the associated non-decaying dependence of the present on the far past of a series, renders standard central limit theorems inapplicable. Probably the most famous consequence of ignoring unit roots is the ‘spurious regression phenomenon’, where one finds seemingly significant relations between unrelated time series with stochastic trends. These results have a long history and are well established and extensively documented in the time series literature. A reader new to unit roots may, for instance, consult Enders (2008) for a classical textbook treatment of the spurious regression phenomenon, as well as of the more general problems associated with unit roots.
Given the well-known importance of unit root testing, it is surprising that relatively few R packages allow for easy and comprehensive unit root testing. Moreover, unit root tests are scattered across several packages in the R environment for statistical computing and graphics. The most popular unit root test is the classical augmented Dickey-Fuller (ADF) test (Dickey and Fuller, 1979, 1981). Implementations of the ADF test are incorporated in various packages, in particular CADFtest (Lupi, 2009), fUnitRoots (Wuertz et al., 2017), tseries (Trapletti et al., 2019), and urca (Pfaff, 2008).¹
¹ The mleur package (Zhang et al., 2011) also implements the ADF test, but links to urca for this purpose. The package uroot (López-de Lacalle and Boshnakov, 2019) used to implement the ADF test, but this is no longer supported in the package’s current version; hence it is omitted from this overview.
As we will argue in the next section, most ‘standard’ unit root tests, such as the ones implemented in these packages, require seemingly innocuous choices from the practitioner regarding the model specification or which test to use, which may have a major impact on the performance of the unit root tests. As its first major contribution, the bootUR package instead implements the user-friendly union of rejections principle (Harvey et al., 2009, 2012; Smeekes and Taylor, 2012), which relieves the user of the burden of choosing the right specification by performing this task automatically.
Crucially, with the exception of the HEGY seasonal unit root test in the uroot package (López-de Lacalle and Boshnakov, 2019), current R implementations of unit root tests rely on asymptotic inference when returning critical values or p-values.² As is well known in the statistics literature, unit root tests are prone to severe size distortions in smaller samples due to, for example, neglected serial correlation (Schwert, 1989). Size distortions due to features such as time-varying volatility even persist asymptotically (Cavaliere, 2005). As a consequence, unit root tests based on asymptotic or numerical p-values (MacKinnon et al., 1999), which do not take the features of the specific time series into account, are quite unreliable in practice.
² Another exception is the repository URT (Mallet, 2017), available on GitHub, which includes bootstrap unit root tests. In the remainder, we only focus on packages that are currently maintained on the Comprehensive R Archive Network (CRAN).
The ‘boot’ in bootUR stands for bootstrap, since the unit root tests we provide rely on various bootstrap methods for constructing p-values. The bootstrap approximates the exact distribution of the unit root test statistic by repeatedly drawing new samples from the original sample, thereby capturing the features of the time series of interest that affect the distribution of the test. This ensures that the bootstrap tests in bootUR have accurate size properties under very general conditions, which constitutes the second major contribution of our package.
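To make the idea concrete, the following Python sketch computes a Dickey-Fuller $t$-statistic (no lagged differences, no deterministic components) and its left-tailed bootstrap p-value. It is a simplified illustration, not bootUR’s implementation, and the function names are hypothetical. Bootstrap samples impose the unit root null by cumulating i.i.d. resampled, centred first differences. This simple scheme deliberately ignores serial correlation and heteroskedasticity, which are exactly the features the bootstrap methods of Section 3 are designed to capture.

```python
import numpy as np


def df_tstat(y):
    """Dickey-Fuller t-statistic for delta in dy_t = delta * y_{t-1} + e_t
    (no lagged differences, no deterministic components)."""
    y = np.asarray(y, dtype=float)
    dy, ylag = np.diff(y), y[:-1]
    sxx = ylag @ ylag
    delta = (ylag @ dy) / sxx                      # OLS estimate of delta
    resid = dy - delta * ylag
    s2 = (resid @ resid) / (len(dy) - 1)           # residual variance
    return delta / np.sqrt(s2 / sxx)


def bootstrap_p_value(y, B=499, seed=0):
    """Left-tailed bootstrap p-value of the DF statistic.

    Each bootstrap sample imposes the unit root null by cumulating i.i.d.
    resampled, centred first differences of y (a deliberately simple scheme).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    t_obs = df_tstat(y)
    u = np.diff(y)
    u = u - u.mean()                               # centre the innovations
    t_boot = np.empty(B)
    for b in range(B):
        u_star = rng.choice(u, size=len(u), replace=True)
        y_star = np.concatenate(([0.0], np.cumsum(u_star)))  # random walk under H0
        t_boot[b] = df_tstat(y_star)
    # Share of bootstrap statistics at least as small as the observed one
    return t_obs, float(np.mean(t_boot <= t_obs))
```

Because the test rejects for small values of the statistic, the p-value is the fraction of bootstrap statistics at or below the observed one.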
Finally, most datasets contain multiple, sometimes even many, time series to be tested for unit roots, often leading practitioners to apply unit root tests to each time series separately. Such a practice not only suffers from multiple testing issues, with some null hypotheses rejected by chance alone, but also disregards similarities between individual time series which, if exploited, could increase the often limited power of the individual tests. Although some packages provide joint unit root tests for multivariate or panel data (pdR, Tsung-wu, 2019; plm, Croissant and Millo, 2008),³ such tests may increase power but do not allow one to determine the properties of individual series. For this goal, one needs tests that correct for multiple testing, and such implementations are currently lacking for unit root tests. Therefore, the third major contribution of bootUR is to implement easy tools for applying unit root tests to multivariate time series, with automatic multiple testing control.
³ The packages PANICr (Bronder, 2016) and punitroots (Kleiber and Lupi, 2012) also provide panel unit root tests, but the former has been removed from CRAN and the latter is only available on R-Forge.
With these contributions, the bootUR package provides a unified framework for easy and comprehensive unit root testing based on the following philosophy:
1) for novice users, the tests should be easy to implement with sensible default options;
2) those default options should lead to reliable and accurate unit root tests, applicable in general situations;
3) expert users, familiar with the unit root literature, should be able to easily tweak and adjust the tests to their desired settings;
4) all tests should be easily scalable to large datasets without additional effort by the user, thereby providing ‘automatic’ functionality.
To realize this philosophy, the package has a simple structure, yet offers users a wide variety of unit root tests. In particular, unit root tests can be performed directly on single or multiple time series. To this end, we deliberately created separate functions for these purposes: the functions boot_df() and boot_union() can be used for single time series, iADFtest() for multiple time series without multiple testing control, BSQTtest() and bFDRtest() for multiple time series with multiple testing control, and paneltest() offers a panel unit root test. For each unit root test, the bootstrap method can be chosen by the end-user through the universal argument boot, shared by all functions. Suitable warning and error messages provide user-friendly advice on the (non-)applicability of certain bootstrap methods in certain situations. Finally, model specifications (such as deterministic components, lag length selection and de-trending methods) are either under the user’s full control, or implemented automatically according to the union of unit root tests principle to ensure reliable tests across potentially heterogeneous series. The many options of each function share a common syntax across the package, which facilitates usability and control by the end-user.
In addition, we have added several functions, built around the core functions above, that aid in the practical implementation of the unit root tests. Most importantly, the function order_integration() provides an automatic way to determine the order of integration of each series in a dataset, based on a sequence of one of the aforementioned unit root tests. As it also directly outputs the appropriately differenced time series, free of stochastic trends, it allows the user to conduct the entire unit root pre-analysis with a single command. Additionally, we provide several functions that allow the user to easily assess and visualize properties of the data and the outcomes of the tests.
The package is available from the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=bootUR. In addition, the latest (development) version is available on GitHub at https://github.com/smeekes/bootUR. The core of the package is written in C++ with parallel execution offered by the OpenMP (Dagum and Menon, 1998) API to ensure scalability to large datasets. We make use of the packages Rcpp (Eddelbuettel and François, 2011; Eddelbuettel, 2013; Eddelbuettel and Balamuta, 2017) and RcppArmadillo (Eddelbuettel and Sanderson, 2014) to facilitate seamless integration with R. Version 0.2.0 of the bootUR package and version 4.0.2 of R were used in this paper.
Adhering to the four points of our philosophy requires not only careful thought on how to implement the tests and design the API, but also a careful choice of the appropriate statistical methods. We therefore first consider the problem from a statistical point of view, discussing unit root tests for single and multiple time series in Section 2 and the bootstrap methods in Section 3. We then continue with the package’s implementation in Section 4. Section 5 uses two empirical applications to compare bootUR’s unit root functions to implementations in other R packages and to illustrate its usefulness for practitioners. Section 6 concludes.
2 Unit Root Tests
We first discuss unit root tests for individual time series (Section 2.2), followed by testing multiple series for unit roots (Section 2.3). In our discussion, paralleling Smeekes and Wijler (2020), we do not focus on theory, but on the issues that arise for practitioners when implementing these tests on their time series. For a more extensive and theoretical overview of unit root testing, we refer the interested reader to Choi (2015).
2.1 Unit Roots
Consider the case where we have observations $y_1, \ldots, y_T$ from a time series $y_t$ generated according to the data generating process (DGP)

$$y_t = d_t + x_t, \qquad x_t = \rho x_{t-1} + u_t, \qquad t = 1, \ldots, T, \qquad (1)$$

where $d_t$ are deterministic functions of time. In particular, three cases are commonly considered: $d_t = 0$ (no deterministic components), $d_t = \alpha$ (intercept only), and $d_t = \alpha + \beta t$ (intercept and linear trend). The error process $u_t$ is allowed to be serially correlated and heteroskedastic, and the presence of serial correlation in $u_t$ has to be accounted for in inference. Typically, $u_t$ is modelled as an invertible infinite order linear process, for instance as

$$u_t = \sum_{j=1}^{\infty} \phi_j u_{t-j} + \varepsilon_t,$$

where $\varepsilon_t$ is typically assumed to be a martingale difference sequence. This linearity motivates adding lagged differences of the time series to account for the serial dependence, as in the classical augmented Dickey-Fuller (ADF) test (Dickey and Fuller, 1979). However, Paparoditis and Politis (2018) show that ADF-type approaches are valid under much more general forms of dependence in $u_t$.
We focus on testing whether or not $y_t$ contains a unit root, that is, on testing

$$H_0: \rho = 1 \qquad \text{versus} \qquad H_1: |\rho| < 1$$

in equation (1). Under the null hypothesis of a unit root, $y_t$ contains a stochastic trend, and is equivalently said to be integrated of order 1 ($I(1)$), while the alternative postulates that $y_t$ is integrated of order 0 ($I(0)$), which is generally taken as synonymous with being stationary. Here ‘integrated of order $d$’ means that $y_t$ should be differenced $d$ times to obtain a process that no longer contains a stochastic trend.⁴
⁴ Although stationary is generally used as a synonym for $I(0)$, an $I(0)$ process can still be non-stationary, for instance through a shift in the variance. Despite this distinction, we follow tradition and use ‘$I(0)$’ and ‘stationary’ interchangeably.
2.2 Individual Unit Root Tests
To test the null hypothesis of a unit root, the classical ADF test (Dickey and Fuller, 1979, 1981) remains the predominant choice in practice. For this reason, it also forms the backbone of the bootUR package. However, even in its most basic form, it requires practitioners to make several non-trivial choices that have a big impact on its performance. Table 1 summarizes these choices and indicates how the various R packages address each of them. In this section, we first discuss the ADF test and the choices that need to be made, before turning to the union of unit root tests principle proposed by Harvey et al. (2009, 2012), which alleviates many of these concerns.
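The ADF $t$-statistic is the most popular unit root test in practice. Let $\Delta$ be the difference operator defined as $\Delta y_t = y_t - y_{t-1}$. If no deterministic components are present, the ADF regression is given by

$$\Delta y_t = \delta y_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta y_{t-j} + \varepsilon_{p,t}, \qquad (2)$$

where the $p$ lagged differences of $y_t$ are added to the regression to capture the serial correlation present in $u_t$. Under the unit root null $\delta = 0$, while $\delta < 0$ under the stationary alternative. Testing the null of a unit root then boils down to testing the significance of the parameter $\delta$ in equation (2), using the left-tailed $t$-statistic.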
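As a minimal sketch of regression (2), the hypothetical Python function below builds the regressors $y_{t-1}, \Delta y_{t-1}, \ldots, \Delta y_{t-p}$, estimates the regression by OLS with numpy, and returns the $t$-statistic for $\delta$. It assumes no deterministic components and a fixed lag length $p$. It illustrates the regression only and is not bootUR’s C++ implementation.

```python
import numpy as np


def adf_tstat(y, p):
    """t-statistic for delta in
    dy_t = delta * y_{t-1} + sum_{j=1}^p gamma_j * dy_{t-j} + e_t  (no deterministics)."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                              # dy[k] = y[k+1] - y[k]
    n = len(dy) - p                              # usable observations
    Y = dy[p:]                                   # regressand dy_t
    cols = [y[p:-1]]                             # lagged level y_{t-1}
    for j in range(1, p + 1):
        cols.append(dy[p - j:len(dy) - j])       # lagged difference dy_{t-j}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = (resid @ resid) / (n - X.shape[1])      # residual variance with df correction
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[0] / np.sqrt(cov[0, 0])          # t-statistic of delta
```

White noise should produce a strongly negative statistic, while a random walk typically does not.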
If the time series is suspected to contain deterministic components as well, testing becomes more complicated. The traditional one-step procedure adds the relevant deterministic components directly to (2). However, this may easily lead to confusion about which components to include, as under the null of a unit root the coefficient of the linear trend cancels out. This has led many to erroneously perform tests including an intercept only, or to perform joint tests on $\delta$ and the coefficient of the linear trend (as suggested by Dickey and Fuller, 1981), which, although correct, is unnecessary and complicates the development of a coherent framework. This one-step procedure is implemented in most R packages; see Table 1.
Instead, bootUR follows the two-step approach implemented by most modern unit root tests, such as the bootstrap tests considered in Section 3. Here, a first-stage regression of $y_t$ on the deterministic components $z_t$ is run, where $z_t = 1$ (intercept only) or $z_t = (1, t)'$ (intercept and trend), and in the second stage the ADF regression

$$\Delta \hat{x}_t = \delta \hat{x}_{t-1} + \sum_{j=1}^{p} \gamma_j \Delta \hat{x}_{t-j} + \varepsilon_{p,t} \qquad (3)$$

is run on the residuals of the first-stage regression, $\hat{x}_t = y_t - \hat{\theta}' z_t$, commonly referred to as the de-trended time series. The two-step procedure has the advantage that it separates the deterministic trend from the stochastic trend, which makes the test easier to interpret. This procedure is implemented in the boot_df() function of the bootUR package; see Table 1.
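The first stage of this two-step approach can be sketched as follows. The helper is hypothetical and assumes OLS de-trending with $z_t = 1$ or $z_t = (1, t)'$. The second stage then applies the ADF regression (3), without deterministic terms, to the returned residuals $\hat{x}_t$.

```python
import numpy as np


def ols_detrend(y, trend=True):
    """First stage: regress y_t on z_t = (1, t)' (or z_t = 1) by OLS and
    return the de-trended series x_hat_t = y_t - theta_hat' z_t."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    t = np.arange(1, T + 1, dtype=float)
    Z = np.column_stack([np.ones(T), t]) if trend else np.ones((T, 1))
    theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return y - Z @ theta
```

A pure linear trend is removed exactly, and with `trend=False` the residuals are simply the de-meaned series.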
The most straightforward way to obtain the parameter $\hat{\theta}$ is by ordinary least squares (OLS), which is asymptotically equivalent to including the trend directly in the ADF regression. Alternatively, inspired by the work of Elliott et al. (1996) and their DF-GLS test, one can obtain $\hat{\theta}$ by a generalized least squares (GLS) type of regression, where the (near-)unit root in $y_t$ is first removed by quasi-differencing (QD). The regression is then performed by OLS of $y_t^{\bar{c}}$ on $z_t^{\bar{c}}$, where

$$y_1^{\bar{c}} = y_1, \qquad y_t^{\bar{c}} = y_t - \left(1 - \frac{\bar{c}}{T}\right) y_{t-1}, \quad t = 2, \ldots, T,$$

with $z_t^{\bar{c}}$ defined analogously, and $\bar{c}$ is a parameter that determines how close to differencing the GLS step is. Elliott et al. (1996) recommend $\bar{c} = 7$ for the case $d_t = \alpha$ and $\bar{c} = 13.5$ for the case $d_t = \alpha + \beta t$ to yield tests with good power properties.⁵ The DF-GLS test is often considered to be more powerful than the ADF test; it is therefore surprising that it appears in so few R packages. In fact, a version with limited functionality seems to be available only in the package urca; see Table 1. The actual relation between the OLS and QD de-trended tests is more nuanced, though. In particular, as shown by Müller and Elliott (2003) inter alia, the QD test is only more powerful if the initial condition, i.e. the deviation of the start of the time series from its equilibrium, is small. When the initial condition is large, the standard OLS de-trended ADF test is considerably more powerful.
⁵ To avoid confusion with a ‘proper’ GLS estimation that also takes into account higher-order serial dependence and heteroskedasticity, we refer to this test as the quasi-differenced (QD) test rather than GLS.
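The quasi-differencing step can be sketched in the same way. This hypothetical Python helper uses the factor $\bar{\rho} = 1 - \bar{c}/T$ and keeps the first observation in levels. By default it sets $\bar{c} = 7$ (intercept) or $\bar{c} = 13.5$ (intercept and trend), following Elliott et al. (1996). It is an illustration, not bootUR’s code. Setting $\bar{c} = T$ makes $\bar{\rho} = 0$, in which case the function reduces to OLS de-trending; this gives a convenient sanity check.

```python
import numpy as np


def qd_detrend(y, trend=True, cbar=None):
    """Quasi-difference (GLS-type) de-trending in the spirit of Elliott et al. (1996).

    Estimates theta by OLS of the quasi-differenced y on the quasi-differenced z_t,
    then returns x_hat_t = y_t - theta_hat' z_t on the original (levels) scale.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    if cbar is None:
        cbar = 13.5 if trend else 7.0            # Elliott et al. (1996) recommendations
    rho_bar = 1.0 - cbar / T
    t = np.arange(1, T + 1, dtype=float)
    Z = np.column_stack([np.ones(T), t]) if trend else np.ones((T, 1))
    # First observation kept in levels; remaining rows are v_t - rho_bar * v_{t-1}
    y_qd = np.concatenate(([y[0]], y[1:] - rho_bar * y[:-1]))
    Z_qd = np.vstack([Z[:1], Z[1:] - rho_bar * Z[:-1]])
    theta, *_ = np.linalg.lstsq(Z_qd, y_qd, rcond=None)
    return y - Z @ theta
```

Only the estimate of $\theta$ differs from OLS de-trending; the residuals are always computed on the undifferenced series.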
While both options are implemented in the function boot_df() for varying choices of $\bar{c}$, one should realize that the seemingly innocuous issue of including deterministic components presents the practitioner with two difficult choices: which deterministic components to include, and how to perform the de-trending. These choices can have a major impact on performance. If too few deterministic components are included, deterministic trends are mistaken for stochastic trends, and the test becomes inconsistent. On the other hand, adding too many deterministic components reduces the power of the test considerably, and should also be avoided. As already noted by Campbell and Perron (1991), “A nonrejection of the unit root hypothesis may be due to misspecification of the deterministic components included as regressors” (p. 152). Of course, the trend parameters are unobservable, which complicates the choice in practice. Typically, the choice whether or not to include a trend is based on visual inspection of the time series. However, trend detection based on a plot is very error-prone, and can even be influenced by the resolution and format of the plot. Yet, all unit root tests in current R packages ask the user to make this choice without providing any guidance. This assumes not only that the user knows how to make that choice, but also that the user has the opportunity and time to do so manually. The latter may be feasible for a handful of time series, but quickly becomes impossible for a modern high-dimensional dataset with perhaps hundreds of time series.
Similarly, the initial condition is unobservable, so the user has to make an (un)educated guess as to which de-trending method to use (if the package allows for a choice at all). Given the large power differences between the methods, we believe that Campbell and Perron’s (1991) statement about deterministic components can be complemented by the following variation: “A nonrejection of the unit root hypothesis may be due to the chosen de-trending method”.
The bootUR package is, to the best of our knowledge, the first R package that does not force the user to make these choices, but instead offers, through the function boot_union(), a data-driven alternative based on the union of rejections principle introduced by Harvey et al. (2009, 2012). Before discussing this in detail, though, we first turn to the third difficult choice a user has to make: selecting the lag length $p$ in equation (3).
The lag length choice involves a trade-off between size distortions incurred from including too few lags to capture all serial correlation, and power loss incurred from including too many lags. Although theory (and some R packages, such as tseries) generally treats $p$ as a deterministic function of the sample size, in practice a data-driven selection is clearly more successful in striking a good balance between size and power.
Popular choices for automatic data-driven lag length selection are information criteria and sequential tests. Sequential testing considers a sequence of $t$-tests on the coefficient of the largest lag, starting from the largest model. If this coefficient is found to be insignificant, the lag is removed and the next smaller model is considered. In the bootUR package, we do not consider sequential testing as, in general, information criteria are more popular and more accurate (cf. Cavaliere et al., 2015, Remark 3).
Information criteria trade off model fit (through the residual sum of squares) and overfitting (through a penalty on the number of parameters). The lag length is estimated as

$$\hat{p} = \arg\min_{0 \le p \le p_{\max}} IC(p), \qquad IC(p) = \ln \hat{\sigma}_p^2 + \frac{p\, g(T)}{T}, \qquad (4)$$

where $\hat{\sigma}_p^2 = T^{-1} \sum_t \hat{\varepsilon}_{p,t}^2$, with $\hat{\varepsilon}_{p,t}$ the OLS residuals from the ADF regression with lag length $p$ in equation (3), and $g(T)$ is a penalty function that differs according to the information criterion used. We consider two penalties: one corresponding to the Akaike information criterion (AIC; $g(T) = 2$) and the other to the Bayesian information criterion (BIC; $g(T) = \ln T$).
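A minimal Python sketch of (4) for an already de-trended series follows; the function name is hypothetical. To keep the residual sums of squares comparable, it uses the same $n = T - 1 - p_{\max}$ observations for every candidate $p$, so $n$ plays the role of $T$ in (4). This is one reasonable convention, not necessarily the exact sample handling in bootUR.

```python
import numpy as np


def select_lag(x, p_max, criterion="AIC"):
    """Choose p in the ADF regression (no deterministics) by minimising
    IC(p) = ln(sigma2_p) + p * g / n over p = 0, ..., p_max."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    n = len(dx) - p_max                          # common sample for all p
    g = 2.0 if criterion == "AIC" else np.log(n)  # AIC or BIC penalty
    Y = dx[p_max:]
    best_p, best_ic = 0, np.inf
    for p in range(p_max + 1):
        cols = [x[p_max:-1]]                     # x_{t-1}
        cols += [dx[p_max - j:len(dx) - j] for j in range(1, p + 1)]  # dx_{t-j}
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        ic = np.log(resid @ resid / n) + p * g / n
        if ic < best_ic:
            best_p, best_ic = p, ic
    return best_p
```

With a long series whose differences follow an AR(2), the criterion should not select fewer than two lags.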
Next to the original criteria, bootUR also implements the modified variants proposed by Ng and Perron (2001). These modifications are specifically motivated by lag length selection in the ADF regression. They are given by

$$MIC(p) = \ln \hat{\sigma}_p^2 + \frac{g(T)\left(\tau_T(p) + p\right)}{T},$$

where $\tau_T(p) = \hat{\sigma}_p^{-2}\, \hat{\delta}_p^2 \sum_t \hat{x}_{t-1}^2$, with $\hat{\delta}_p$ the OLS estimate of $\delta$ in (3) with lag length $p$. The lag length is then estimated as in (4), with $IC(p)$ replaced by $MIC(p)$. The modified AIC (MAIC) is obtained by taking $g(T) = 2$, the modified BIC (MBIC) by taking $g(T) = \ln T$. Ng and Perron (2001) show that the MICs yield large size improvements over the ICs for the purpose of unit root testing. Perron and Qu (2007) recommend always using the MICs with OLS rather than QD de-trended data (even if the unit root test itself makes use of QD de-trending), since this improves the test’s power properties; bootUR follows this recommendation. In addition, various seemingly minor aspects of how the lag selection is implemented influence its performance, such as how many observations are used to calculate the residual sum of squares. Ng and Perron (2005) provide a detailed study and guidelines for these choices; bootUR implements the scheme they recommend as optimal.
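The modified criteria can be sketched along the same lines. The hypothetical function below computes $MIC(p)$ on a common sample of $n$ observations and returns the minimizing lag. It assumes the input has already been OLS de-trended, as Perron and Qu (2007) recommend. The only change from the standard criteria is the data-dependent term $\tau(p)$ in the penalty.

```python
import numpy as np


def select_lag_mic(x, p_max, criterion="MAIC"):
    """Modified information criteria of Ng and Perron (2001), sketched as
    MIC(p) = ln(sigma2_p) + g * (tau(p) + p) / n, with
    tau(p) = delta_hat_p**2 * sum(x_{t-1}**2) / sigma2_p."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    n = len(dx) - p_max                          # common sample for all p
    g = 2.0 if criterion == "MAIC" else np.log(n)  # MAIC or MBIC penalty
    Y = dx[p_max:]
    xlag = x[p_max:-1]                           # x_{t-1}
    mic = []
    for p in range(p_max + 1):
        X = np.column_stack([xlag] + [dx[p_max - j:len(dx) - j] for j in range(1, p + 1)])
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ beta
        sigma2 = resid @ resid / n
        tau = beta[0] ** 2 * (xlag @ xlag) / sigma2   # beta[0] is delta_hat_p
        mic.append(np.log(sigma2) + g * (tau + p) / n)
    return int(np.argmin(mic))
```

Under the unit root null $\hat{\delta}_p$ is close to zero, so $\tau(p)$ mainly matters when omitted lags distort the estimate of $\delta$, which is the case the modification targets.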
Cavaliere et al. (2015) find that in the presence of heteroskedasticity, information criteria are affected, leading to less accurate choices of $p$ and consequent power loss of the unit root tests. They propose rescaled information criteria, where the time series is rescaled with a nonparametric estimate of its (time-varying) standard deviation, thereby eliminating the heteroskedasticity. The information criterion is then applied to this rescaled series. Tests based on these rescaled ICs are generally more powerful in the presence of heteroskedasticity, yet very similar to those based on the original ICs in its absence. bootUR therefore performs the rescaling by default, since it is a safe choice and relieves the user of the burden of checking whether heteroskedasticity is present.
Union of rejections test
As mentioned above, choosing the right deterministic components and the right de-trending method is crucial to obtain tests with good power properties. However, making an informed, data-driven choice is complicated. While deterministic trends can in principle be detected consistently, in practice a trend test will only detect large trends. Part of the problem is that such a test must be valid under both the unit root null and the stationary alternative, since we can only test for unit roots afterwards. The failure of such tests to detect trends means that, based on such a pre-test, one will often decide not to include a trend when it should have been included, which is precisely the scenario that must be avoided, as the unit root test is then inconsistent. Detecting a large initial condition is even more complicated, so a reliable data-driven pre-test is not an option.
Harvey et al. (2009, 2012) take a different approach, based on a very simple principle. Roughly speaking, for both specification issues we can choose between a powerful test and a less powerful one, but we do not know which is which. A logical step is then to perform both tests and reject whenever either of them rejects the null hypothesis, the logic being that the one rejecting is the powerful one. Of course, with two tests performed simultaneously, one must control for multiple testing and adjust the tests with a Bonferroni-type adjustment to control size at the desired level. Harvey et al. (2009) introduced this union of rejections idea for the two specification issues separately, while Harvey et al. (2012) combined the two approaches into a union of four tests (intercept only or intercept with trend, each combined with OLS or QD de-trending) that guards against uncertainty over both the trend and the initial condition. Although the size correction makes the union test strictly less powerful than the optimal test, the power loss turns out to be small, and this disadvantage is far outweighed by the fact that, unlike the individual tests, the union test never breaks down.
This characteristic makes the union test a safe option for quick or automatic unit root testing where a careful manual specification is not viable. It is therefore well suited to bootUR’s philosophy that the default option should provide a reliable and accurate test, for which no in-depth knowledge is needed about either the data or the applicability of various unit root tests. Moreover, it scales easily to large datasets with many series, where careful manual consideration of these specifications is not possible regardless of the expertise of the user.
The bootUR package is the first R package to implement this convenient union test. In particular, it implements the bootstrap version of the union test developed by Smeekes and Taylor (2012), which uses the bootstrap both for determining the appropriate size correction and for obtaining the test’s p-values. The test statistic takes the form

$$UR = \min\left( C\, \frac{t^{QD,\mu}}{-c^{QD,\mu}},\; C\, \frac{t^{OLS,\mu}}{-c^{OLS,\mu}},\; C\, \frac{t^{QD,\tau}}{-c^{QD,\tau}},\; C\, \frac{t^{OLS,\tau}}{-c^{OLS,\tau}} \right), \qquad (5)$$

where $t^{OLS}$ and $t^{QD}$ are the OLS (standard ADF) and QD de-trended tests, with superscripts $\mu$ and $\tau$ indicating whether the series are de-meaned or de-trended, respectively. The critical values $c^{i,j}$, for $i \in \{OLS, QD\}$ and $j \in \{\mu, \tau\}$, are bootstrap-based and determined in a preliminary bootstrap step as the critical values of the four individual tests at the chosen significance level. The variable $C$ is a scaling factor to which the statistics are scaled; since the critical values are negative, any $C > 0$ suffices to preserve the left-tail rejection region. This bootstrap union test is made available through the function boot_union().
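The construction in (5) can be sketched as follows. These hypothetical helpers take the four $t$-statistics and their critical values as given; in bootUR the critical values come from the bootstrap. They also show why the scaling works: because each critical value is negative, $C\, t^{i,j}/(-c^{i,j}) \le -C$ holds exactly when $t^{i,j} \le c^{i,j}$. Hence $UR \le -C$ if and only if at least one individual test rejects at its own critical value. The bootstrap step that calibrates the union test itself is not shown.

```python
# Four tests: (de-trending method, deterministic case); mu = de-meaned, tau = de-trended
KEYS = [("QD", "mu"), ("OLS", "mu"), ("QD", "tau"), ("OLS", "tau")]


def union_statistic(tstats, crit, C=1.0):
    """UR = min over the four tests of C * t / (-c), as in (5).
    tstats and crit are dicts keyed by KEYS; critical values must be negative."""
    if C <= 0:
        raise ValueError("C must be positive to preserve the left-tail rejection region")
    if any(crit[k] >= 0 for k in KEYS):
        raise ValueError("left-tail critical values are expected to be negative")
    return min(C * tstats[k] / (-crit[k]) for k in KEYS)


def any_individual_rejection(tstats, crit):
    """True if at least one of the four tests rejects at its own critical value."""
    return any(tstats[k] <= crit[k] for k in KEYS)
```

Dividing by $-c^{i,j}$ puts the four statistics on a common scale, so taking their minimum is meaningful even though the individual tests have different null distributions.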
Finally, note that this union-based approach still requires one to select the lag lengths in each of the four ADF regressions. To this end, any of the four information criteria, AIC, BIC, MAIC and MBIC can be used.
Of course, various other unit root tests exist and are implemented in R, such as the Phillips and Perron (1988) (PP) test (urca), the seasonal HEGY test (uroot) and the KPSS (Kwiatkowski et al., 1992) stationarity test (fUnitRoots, tseries, urca). We intentionally do not implement these in bootUR, to avoid overloading the user with choices that are not easy to justify. Some of these tests, such as HEGY and KPSS, have different testing setups that require careful consideration, while others, such as the PP test, suffer from serious size distortions that even the bootstrap cannot fix. Moreover, the ADF test is by far the most popular in practice; we therefore feel that including such tests would only confuse the user, and that a better approach is to provide a simple, coherent and reliable testing structure instead. Along the same lines, we mention the covariate-augmented test of Hansen (1995) implemented in the CADFtest package, which exploits correlation with known stationary covariates to improve power. While this is interesting if one wants to test a single series and has a set of stationary covariates at hand, this approach is difficult to implement if one has a dataset in which all series need to be tested for unit roots, as these series are then not available as covariates. In such a setting it makes more sense to pool the tests, as done by the panel unit root tests discussed in the next section.
2.3 Multiple Unit Root Tests
Practitioners often use several time series in their analysis, and typically need to test all of them for unit roots. While performing a unit root test for each series separately is common practice for a small number of time series, this becomes more complicated if the number of series is large. First, performing many unit root tests simultaneously suffers from multiple testing issues, as the probability of incorrect classifications increases with the number of performed tests. Second, we would like to exploit the similarity between different time series to improve the power of the unit root tests, in particular if the time dimension is relatively small.
In the bootUR package, we consider three different ways to approach the testing problem with multiple time series. The first is the simplest option of ignoring the multiplicity issue and just performing unit root tests separately for each series, for which the function iADFtest() can be used. While not very appealing from a theoretical point of view, there are practical reasons why one may still prefer this conceptually straightforward setup; we explore this further in Section 4 when we elaborate on the package’s functions. The second is the traditional approach of panel unit root tests, where one pools the information in all series to obtain a more powerful test; the function paneltest() offers such a test. The third is to consider individual tests, but with appropriate control of multiple testing error rates. bootUR offers two such tests, namely BSQTtest() and bFDRtest().
Surprisingly, despite the large literature on this topic, software implementations for testing multiple series for unit roots are mostly lacking. Although there is some support for panel unit root testing, as discussed hereafter, methods to control multiple testing in the context of unit root testing are, to the best of our knowledge, not available. Several general-purpose multiple testing packages exist, but using these properly with unit root tests requires considerable effort and expertise from the user. For instance, some standard corrections, such as the Bonferroni correction, may be overly conservative, while others are only applicable under specific conditions on the dependence, such as Benjamini and Hochberg’s (1995) method to control the false discovery rate. As argued by, for instance, Romano et al. (2008b), bootstrap methods for controlling multiple testing allow for general forms of dependence and avoid being too conservative. However, such bootstrap methods need to be integrated with the unit root testing, which is the approach taken in bootUR.
Throughout this section, we use the following notation. Consider $N$ time series $y_{1,t}, \ldots, y_{N,t}$ for which one would like to test for the presence of a unit root. We denote their respective individual unit root test statistics by $t_1, \ldots, t_N$. Typically, these correspond to one of the tests discussed in Section 2.2. Without loss of generality, we assume that rejections occur for small values of the test statistics.
Panel Unit Root Tests
Panel unit root tests view the multiple time series as a coherent panel dataset, and exploit the similarity between such time series to pool the information in them and achieve more powerful tests. They have a long tradition in econometrics; see, e.g., Breitung and Pesaran (2008) or Choi (2015) for reviews. A typical panel unit root test has the null hypothesis that all series have a unit root. Rejection of this null hypothesis is then typically interpreted as evidence that a ‘significant proportion’ of the series is stationary. However, the test reveals neither how large that proportion is nor which series are stationary. This makes panel unit root tests difficult to interpret, and limits their usefulness as pre-tests when determining the order of integration of each time series in a dataset. Nonetheless, the panel unit root null hypothesis may be interesting in its own right. Moreover, Pesaran (2012) suggests using panel unit root tests as an initial screening tool for analyzing multiple series: if the panel unit root test rejects the null, the individual series need to be examined further; if not, treating the full dataset as $I(1)$ may be a reasonable choice. For these reasons, bootUR also includes some functionality to test the panel unit root hypothesis, although this is not our main focus given the interpretational difficulties.
Specifically, we implement the bootstrap Group-Mean (GM) test of Palm et al. (2011) in the function paneltest(), which is based on averaging the unit root test statistics $t_1, \ldots, t_N$ of the individual time series. This test is valid under very general forms of dependence within the dataset, yet does not require modelling it. This is in contrast to tests based on common factor models, which either require a complicated multi-step approach, or risk eliminating the unit roots along with the common factors, thereby risking false rejections (Bai and Ng, 2004, 2010). The bootstrap test of Palm et al. (2011), on the other hand, is an easy, off-the-shelf method that fits bootUR’s philosophy. Panel unit root tests are scarcely available for R users. Currently, only two maintained packages offer panel unit root tests, namely plm and pdR. The package plm was the first to offer panel unit root tests and provides the tests introduced in Maddala and Wu (1999), Choi (2001), Levin et al. (2002) and Im et al. (2003). However, none of them allow for cross-sectional dependence (see Kleiber and Lupi, 2011, for a discussion). The package pdR offers the panel unit root test of Chang (2002) and the seasonal test of Hylleberg et al. (1990).
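The group-mean statistic itself is simple. The sketch below uses hypothetical names: it averages the individual statistics and computes a left-tailed bootstrap p-value from a matrix of bootstrap statistics. It assumes these bootstrap statistics were obtained by resampling all series jointly, for instance with a block bootstrap as in Palm et al. (2011), so that the cross-sectional dependence is preserved. How they are generated is not shown.

```python
import numpy as np


def group_mean_test(t_stats, t_boot):
    """Group-mean panel statistic GM = mean of t_1, ..., t_N and its left-tailed
    bootstrap p-value.

    t_boot : (B, N) array of individual statistics from B bootstrap replications
             in which all N series were resampled jointly (not generated here).
    """
    t_stats = np.asarray(t_stats, dtype=float)
    t_boot = np.asarray(t_boot, dtype=float)
    gm = t_stats.mean()
    gm_boot = t_boot.mean(axis=1)                # GM statistic in each replication
    return gm, float(np.mean(gm_boot <= gm))
```

Because the same bootstrap draw is used for all series within a replication, any dependence between the individual statistics carries over to the bootstrap distribution of the average.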
Given the ambiguity of a panel unit root test’s outcome, most practitioners will need to go one step further and determine the order of integration for each series in their dataset. The general setup is as follows. In order to properly rank and compare different series, the individual test statistics $t_1, \dots, t_N$ should have the same marginal distributions. Then, the ranking

$$t_{(1)} \le t_{(2)} \le \dots \le t_{(N)} \qquad (6)$$

corresponds to a ranking from ‘most significant’ to ‘least significant’, where $t_{(i)}$ denotes the $i$-th order statistic of $t_1, \dots, t_N$. To ensure the comparability of these statistics, nuisance parameters need to be eliminated from the distribution of the test statistics. bootUR does this automatically by scaling all test statistics as explained for the union test in (5), applying the same scaling to every time series.
The goal is to find an appropriate cut-off point, that is, a value such that the null of a unit root is rejected for all statistics less than or equal to it, while it is not rejected for all larger statistics. How this threshold is determined depends on how one controls for multiple testing. bootUR implements two ways to do this: the sequential testing procedure of Smeekes (2015), which also encompasses the Step-M method of Romano and Wolf (2005) to control the family-wise error rate (FWE), and the false discovery rate (FDR) controlling approach of Romano et al. (2008b) and Moon and Perron (2012).
Sequential Quantile Test
Smeekes (2015) proposes a straightforward and fast-to-implement Bootstrap Sequential Quantile Test (BSQT) for multiple unit root testing, which acts as an intermediate between panel unit root testing and full multiple testing control. The method proceeds by sequentially testing groups of time series for unit roots, where the user decides the group sizes. At step 1, we test whether the first $k_1$ series are stationary. Here ‘first’ does not refer to the order in the dataset (which is arbitrary), but to the most significant tests as found via (6). If the null hypothesis that all units have a unit root cannot be rejected, the test stops. If we do observe a rejection, we move on to the second group, where we test whether the first $k_2$ series are stationary. However, as we already concluded that the first $k_1$ units are stationary, in this second step the actual test is whether the next $k_2 - k_1$ units are stationary as well. We continue this testing procedure until no rejection is observed anymore or all series in the dataset have been tested. The BSQT can be performed using the function BSQTtest().
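More formally, let $k_j$ be the number of series to be tested as stationary in each of the steps $j = 1, \dots, K$, with $k_0 < k_1 < \dots < k_K$. In the sequential step $j$ we then test

$$H_0^{(j)}: \text{the first } k_{j-1} \text{ series are I(0) and the remaining } N - k_{j-1} \text{ series are I(1)}$$

against

$$H_1^{(j)}: \text{the first } k_j \text{ series are I(0)}.$$

As the first test should have as null hypothesis that all units are I(1), $k_0 = 0$ by default. Furthermore, $k_K = N$ to complete the testing procedure. The number of steps $K$ and the intermediate numbers $k_1, \dots, k_{K-1}$ can be chosen by the practitioner. Instead of thinking in terms of series, it may be easier to think in terms of quantiles $q_j$, and set $k_j = q_j N$. A practitioner may for instance think “I want to split my series in 10 equally-sized groups.” In that case the practitioner simply sets $q_j = j/10$ for $j = 0, \dots, 10$.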
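The translation from quantiles to group boundaries can be sketched as follows. The helper name is hypothetical, and rounding $q_j N$ to the nearest integer is our assumption, not necessarily what BSQTtest() does internally; the helper also re-adds a missing $k_0 = 0$ or $k_K = N$.

```r
# Hypothetical helper (not part of bootUR): turn quantiles q_j into group
# boundaries k_j = q_j * N. Rounding to the nearest integer is an assumption.
quantiles_to_units <- function(q, N) {
  k <- round(q * N)
  sort(unique(c(0, k, N)))   # ensure k_0 = 0 and k_K = N are included
}

k <- quantiles_to_units(0:10 / 10, N = 20)  # 10 equally sized groups of 20 series
group_sizes <- diff(k)                      # series added to H1 in each step
```

With $N = 20$ and q = 0:10 / 10, the boundaries are 0, 2, 4, ..., 20, so each of the ten steps adds two series to the set tested as stationary.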
We acknowledge that the choice of the group boundaries $k_j$ (the number of series tested as stationary up to step $j$) does require input and consideration from the user, but unlike ‘obscure’ statistical arguments related to de-trending, for instance, these choices can be made simply based on the nature of the dataset and the desired level of precision of the practitioner. Smeekes (2015) shows that if $k_j$ units are found to be I(0), the probability that the true number of stationary series lies outside the interval $[k_j, k_{j+1}]$ is at most the chosen significance level of the test. Finding that $k_j$ series are I(0) should therefore be interpreted as finding that the number of I(0) series lies in this interval. In the end, if the $k_j$ are chosen sensibly and not spaced too far apart, the series that lie in the ‘uncertain interval’ are likely those series which are ‘just about’ significant, and correspond to time series with an autoregressive parameter very close to 1. The practical consequences of incorrect classification of these series are typically small, as their behavior makes them fit reasonably well in both classes of I(0) and I(1) series.
One special case worth mentioning, set as the default in BSQTtest(), is when every group contains a single series, such that each series gets tested sequentially. Not only does this remove uncertainty about the interpretation of the result, but Smeekes (2015) also shows that in this case the BSQT method coincides with the popular Step-M method of Romano and Wolf (2005) to control the FWE. The FWE is defined as the probability of making at least one false rejection, and is typically controlled via the Bonferroni or Holm (1979) approach. Romano and Wolf (2005) show that the Step-M method is considerably more powerful than these approaches, as the bootstrap method it is based on can capture the true dependence between the series, and therefore does not have to guard against worst-case dependence scenarios. However, one should still realize that FWE control is very strict and overly conservative if the number of series $N$ is large, and this particular implementation of BSQT is mainly suitable for relatively small datasets.
The false discovery rate (FDR), originally proposed by Benjamini and Hochberg (1995), is defined as

$$\mathrm{FDR} = \mathbb{E}\left[\frac{F}{\max(R, 1)}\right],$$

where $R$ denotes the total number of rejections, and $F$ the number of false rejections. It is more appropriate than the FWE for a large number of series $N$, as it aims to control the proportion of false rejections among all rejections, rather than the probability of a single false rejection. Romano et al. (2008b) develop a bootstrap method to control the FDR, and show that, unlike the classical way to control the FDR, the bootstrap is appropriate under very general forms of dependence between series. Moon and Perron (2012) applied this method to unit root testing, and it is their method that is implemented in the bFDRtest() function of the bootUR package.
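To make the difference between the two error rates concrete, the following sketch computes the false discovery proportion $F/\max(R, 1)$ for a set of decisions and then averages it over a few made-up experiments, next to the frequency of at least one false rejection (the FWE). The helper names and the toy decision matrix are purely illustrative.

```r
# Hypothetical helpers (not part of bootUR) contrasting FDR and FWE.
# rejected: logical N-vector, TRUE if the unit root null is rejected for series i
# has_unit_root: logical N-vector, TRUE if series i truly has a unit root
false_discovery_proportion <- function(rejected, has_unit_root) {
  n_rej <- sum(rejected)                     # R: total number of rejections
  n_false <- sum(rejected & has_unit_root)   # F: number of false rejections
  n_false / max(n_rej, 1)
}

# Over repeated experiments (rows of `rej`), the FDR is the average proportion
# and the FWE the frequency of at least one false rejection.
error_rates <- function(rej, has_unit_root) {
  fdp <- apply(rej, 1, false_discovery_proportion, has_unit_root = has_unit_root)
  n_false <- apply(rej, 1, function(r) sum(r & has_unit_root))
  c(FDR = mean(fdp), FWE = mean(n_false >= 1))
}

truth <- c(FALSE, FALSE, TRUE, TRUE)          # two stationary, two unit root series
rej <- rbind(c(TRUE, TRUE, TRUE, FALSE),      # one false rejection out of three
             c(TRUE, TRUE, FALSE, FALSE),     # no false rejections
             c(FALSE, FALSE, FALSE, FALSE))   # no rejections at all
rates <- error_rates(rej, truth)
```

Here the FWE is 1/3 while the FDR is only 1/9: a single false rejection among several correct ones weighs much less on the FDR, which is why it scales better to many series.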
A downside of this method, however, is the complicated and time-consuming nature of the algorithm, which is likely why, to our knowledge, it is not available in R outside of bootUR. The algorithm proceeds sequentially, in a step-down way, starting with the ‘most’ significant series (i.e. the one with the smallest unit root test statistic). This statistic is then compared to an appropriate bootstrap-based critical value, where the bootstrap evaluates all possible scenarios in terms of false and true rejections given the current stage of the algorithm. If the null can be rejected for the current series, the algorithm proceeds to the next ‘most’ significant series and the procedure is repeated. The algorithm stops as soon as the null cannot be rejected. Full details can be found in Romano et al. (2008a). While the algorithm is hard to understand intuitively, the practitioner using the bFDRtest() function does not have to worry about this, as our fast C++ implementation does all the heavy lifting, such that this FDR-controlling test becomes a method like any other. As for the other multiple time series methods, FDR control can be combined with any unit root test specification considered in Section 2.2, although we recommend the default union test for the reasons described there.
To decide between BSQT and FDR control, relative sample sizes can be considered. The Monte Carlo comparison of Smeekes (2015) reveals that the FDR-controlling test is somewhat more accurate when the sample size $T$ is at least of the same order of magnitude as the number of time series $N$, whereas the BSQT method is clearly preferable when $T$ is much smaller than $N$, since the FDR-controlling test then suffers from a lack of power.
3 Bootstrap-based Inference
We rely on bootstrap methods to obtain critical values and/or p-values for all of the unit root tests discussed in Section 2. In the bootUR package, six bootstrap methods are implemented: the sieve bootstrap (SB), moving block bootstrap (MBB), sieve wild bootstrap (SWB), dependent wild bootstrap (DWB), block wild bootstrap (BWB) and autoregressive wild bootstrap (AWB). Their properties are summarized in Table 2, and discussed more extensively below. As is immediately apparent from Table 2, any ‘off-the-shelf’ time series bootstrap method may be used to counteract size distortions arising from neglected serial correlation (Schwert, 1989), whereas a wild bootstrap method is needed to deal with general forms of heteroskedasticity (Cavaliere and Taylor, 2008, 2009a). General forms of cross-sectional dependence can be captured by any bootstrap method apart from the sieve ones.
Next to correcting the size of unit root tests, bootstrap methods have other advantages. First, the bootstrap offers an automatic p-value. This means no additional steps have to be taken to obtain p-values, as is done, for example, in the packages CADFtest or fUnitRoots. Second, the bootstrap directly allows for implementation of multiple testing techniques such as those discussed above. Moreover, as already mentioned, as the bootstrap captures the dependence between series, it allows for less conservative, and hence more powerful, tests than methods which use worst-case scenarios to ensure validity. Third, it guards against misspecification and uncertainty regarding the lag length selection in the ADF regression. As bootUR re-selects the lag lengths within the bootstrap replications, it automatically takes effects of lag selection into account. This, coupled with the fact that the bootstrap captures any dependence missed by the lagged differences in the ADF regression, adds another layer of protection to the tests.
3.1 Sieve bootstrap
The sieve bootstrap (SB) has been extensively considered in the context of unit root testing; see among others Psaradakis (2001), Chang and Park (2003), Paparoditis and Politis (2005), Palm et al. (2008) and Smeekes (2013). It estimates the dependence as an autoregressive (AR) process, resamples the residuals of the AR fit, and then re-applies the AR model recursively to place the dependence back into the bootstrap sample. This simple and intuitive setup has made it historically popular among practitioners. bootUR determines the required order of the AR model by the order of the ADF model, combining these in a single step as they should conceptually coincide.
While it is able to capture general forms of serial dependence (Kreiss et al., 2011), it is mostly suited for tests on single time series. Smeekes and Urbain (2014b) show that it is not suited to capture general forms of cross-sectional dependence, making it invalid for joint or multiple testing. The bootUR package therefore advises using it only for unit root testing of a single series, or of multiple series without multiple testing control, and throws a warning to alert the user otherwise. When it is still applied to multiple series (against better judgment, perhaps), users should also realize that each time series is required to be observed over the same periods, which we refer to as a balanced dataset. This often forces practitioners to delete observations for series that have been observed for a longer period, which is wasteful. The reason for this limitation is that the resampling step of the sieve bootstrap would reshuffle the missing values, creating bootstrap samples with ‘holes’ in them.
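A bare-bones version of the sieve bootstrap for one series is sketched below. It is a simplification under stated assumptions: no deterministic components, a user-fixed AR order p (bootUR instead ties it to the ADF lag order), demeaned first differences to impose the unit root null, and zero starting values in the AR recursion. The function name is hypothetical.

```r
# Hypothetical helper (not part of bootUR): one sieve bootstrap sample for a
# single series under the unit root null, with a fixed AR order p >= 1.
sieve_bootstrap_sample <- function(y, p) {
  stopifnot(p >= 1)
  dy <- diff(y)
  dy <- dy - mean(dy)                       # demeaned first differences
  n <- length(dy)
  # OLS fit of an AR(p) to the differences: the 'sieve' approximation
  X <- embed(dy, p + 1)                     # column 1: dy_t, others: its p lags
  fit <- lm.fit(X[, -1, drop = FALSE], X[, 1])
  phi <- as.numeric(fit$coefficients)
  e <- fit$residuals - mean(fit$residuals)  # centred residuals
  # Resample residuals iid, then put the AR dependence back recursively
  # (stats::filter with method = "recursive" starts from zero initial values)
  e_star <- sample(e, n, replace = TRUE)
  du_star <- as.numeric(stats::filter(e_star, phi, method = "recursive"))
  # Cumulate the bootstrap differences to impose the unit root
  c(y[1], y[1] + cumsum(du_star))
}

set.seed(123)
y <- cumsum(rnorm(200))
y_star <- sieve_bootstrap_sample(y, p = 2)
```

The unit root test statistic would then be recomputed on many such samples. Because `sample()` shuffles residuals across time, any `NA` in the input would also be moved around, which is the ‘holes’ problem described above.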
3.2 Moving block bootstrap
The moving block bootstrap (MBB) is another traditional bootstrap method that has not only been used for univariate unit root testing in Paparoditis and Politis (2003), but also for multivariate unit root testing in Moon and Perron (2012) and Smeekes (2015), as well as for panel unit root testing in Palm et al. (2011). It works by dividing the data into overlapping blocks and resampling those blocks to create bootstrap series by laying them end-to-end. The blocks are taken in the time dimension and encompass all series. This way the MBB can accommodate any form of serial dependence as long as it ‘fits’ into an adequately sized block, which is a wide class. Unlike the SB, the MBB can also handle general forms of dependence between series, including but not limited to common factor structures. From a practical point of view, one of its attractive features is that it can be applied without requiring one to model the serial and/or cross-sectional dependence. Palm et al. (2011) show its validity for mixed panel datasets under such general forms of dependence.
The block length is set automatically by bootUR as a function of the sample size, following a rule proposed by Palm et al. (2011) that they showed to perform well in many different circumstances. However, it is easily adjusted by the user to experiment with different lengths and assess the sensitivity of the results to varying block lengths.
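The resampling scheme itself takes only a few lines of base R. The helper names and the block length in the example are illustrative assumptions (bootUR chooses the block length automatically). In an actual unit root bootstrap, the blocks would be drawn from residuals or first differences under the null rather than from the raw data; only the index resampling is shown here.

```r
# Hypothetical helpers (not part of bootUR): moving block bootstrap indices.
# Overlapping blocks of length l are drawn with replacement and laid end-to-end.
mbb_indices <- function(n, l) {
  stopifnot(l >= 1, l <= n)
  n_blocks <- ceiling(n / l)
  starts <- sample.int(n - l + 1, n_blocks, replace = TRUE)  # block start points
  idx <- as.vector(outer(0:(l - 1), starts, "+"))            # one block per column
  idx[seq_len(n)]                                            # trim to length n
}

# The same time indices are used for every column, so each bootstrap row is a
# complete cross-section and the dependence between series is preserved.
mbb_sample <- function(x, l) {
  x <- as.matrix(x)
  x[mbb_indices(nrow(x), l), , drop = FALSE]
}

set.seed(42)
x <- cbind(a = rnorm(10), b = rnorm(10))
x_star <- mbb_sample(x, l = 3)
```

Drawing whole rows is what lets the MBB capture cross-sectional dependence without modelling it. It also shows why unbalanced data are a problem: leading or trailing `NA`s would be copied into the middle of the bootstrap sample.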
The MBB still has, however, two disadvantages: it cannot handle unbalanced datasets and it is sensitive to unconditional heteroskedasticity. The latter makes its use problematic in various application domains, such as macroeconomics or finance. To handle both issues, practitioners should switch to one of the wild bootstrap methods, as recommended in bootUR.
3.3 Sieve wild bootstrap
The wild bootstrap is known to be robust against general forms of heteroskedasticity; however, it cannot handle serial dependence. Nonetheless, if combined with a sieve bootstrap, we get the best of both worlds. That is, by replacing the resampling step applied to the residuals of the AR model with a multiplication by independent and identically distributed (iid) random variables with mean zero and variance one, we obtain the sieve wild bootstrap (SWB). Cavaliere and Taylor (2009a, 2009b) and Smeekes and Taylor (2012), among others, apply this sieve wild bootstrap for bootstrap unit root testing. The method is perfectly suited to individual unit root testing, but due to the AR estimation, it suffers from the same inability to capture complex dependence across series as explained by Smeekes and Urbain (2014b) for the SB. Hence, the bootUR package warns against its use in multivariate settings. For the generation of the iid random variables, bootUR uses the normal distribution, which is the same choice as in the unit root papers cited above.
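The only change from the SB to the SWB is the innovation step, which the following sketch isolates. It assumes AR residuals `e` and AR coefficients `phi` have already been estimated, as in a sieve fit; all names are hypothetical.

```r
# Hypothetical helpers (not part of bootUR) contrasting the innovation step of
# the SB and the SWB, given AR residuals e.
sb_innovations <- function(e) sample(e, length(e), replace = TRUE)  # iid resampling
swb_innovations <- function(e) e * rnorm(length(e))                 # wild: keeps time position

# Both are then 'recoloured' with the fitted AR filter (zero initial values).
recolour <- function(innov, phi) {
  as.numeric(stats::filter(innov, phi, method = "recursive"))
}

set.seed(3)
e <- c(0, 1, 0, 2, 0, 3)   # residuals whose size changes over time
u_swb <- recolour(swb_innovations(e), phi = 0.5)
```

Each residual keeps its time position and is only rescaled by an N(0, 1) draw, so periods with large residuals still produce large bootstrap innovations. Resampling would instead scatter them over time and destroy the heteroskedasticity pattern.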
3.4 Dependent, Block and Autoregressive wild bootstrap
The three remaining bootstrap methods implemented in the package are all wild bootstrap methods adjusted to deal with dependence. However, unlike in the SWB, here the multiplicative random variables themselves are made dependent over time. This setup allows these bootstrap methods to capture complex serial and cross-series dependence structures as well as (unconditional) heteroskedasticity. In addition, no resampling takes place in these methods, such that missing values ‘stay in their place’, which makes them applicable to unbalanced datasets. These bootstrap methods therefore tick all the boxes in Table 2, making them very suitable for unit root testing.
The three wild bootstrap methods only differ in how the multiplier variables are made time-dependent. The dependent wild bootstrap (DWB), originally introduced by Shao (2010), draws the $T$ multiplier variables jointly from a $T$-dimensional distribution in which the correlation between two multipliers decreases with their distance in time. Shao (2010) proposes to use a kernel function to achieve this, along with a bandwidth $l$ which ensures that multipliers more than $l$ time points apart are independent. This way, $l$ has a similar interpretation as the block length in the MBB. Rho and Shao (2019) and Smeekes and Urbain (2014a) study the DWB for unit root testing, the latter focusing on multivariate settings.
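A minimal sketch of DWB multipliers is given below, assuming the Bartlett kernel (the kernel used by bootUR may differ). With this kernel the multipliers can be generated exactly as scaled moving sums of iid standard normals, which avoids building and factorizing a $T \times T$ covariance matrix. The function name and toy data are hypothetical.

```r
# Hypothetical helper (not part of bootUR): DWB multipliers with the Bartlett
# kernel. xi_t = (e_t + ... + e_{t+l-1}) / sqrt(l) with e iid N(0, 1) gives
# Var(xi_t) = 1 and Corr(xi_t, xi_s) = max(0, 1 - |t - s| / l).
dwb_multipliers <- function(n, l) {
  e <- rnorm(n + l - 1)
  cs <- c(0, cumsum(e))
  (cs[seq_len(n) + l] - cs[seq_len(n)]) / sqrt(l)   # moving sums of length l
}

# Apply the same multiplier to all series at time t (rows of an n x N matrix).
# Nothing is resampled, so NAs stay where they are in unbalanced data.
set.seed(10)
u <- cbind(s1 = c(NA, 0.3, -0.2, 0.5, 0.1),
           s2 = c(0.4, -0.1, 0.2, 0.6, NA))
u_star <- u * dwb_multipliers(nrow(u), l = 2)
```

Multiplying an $n \times N$ matrix by a length-$n$ vector in R recycles the vector down each column, so row $t$ of every series is scaled by the same multiplier. This shared multiplier is what preserves the dependence across series.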
We consider two more variations. The block wild bootstrap (BWB) (Shao, 2011; Zhang and Cheng, 2014) is a direct alternative to the MBB: for each block of length $l$, the same multiplier variable is used, and the multipliers are independent between blocks. The autoregressive wild bootstrap (AWB) (Smeekes and Urbain, 2014a; Friedrich et al., 2020) generates the multiplier variables as a first-order autoregressive process. Unlike the BWB and DWB, which have a block length tuning parameter, the AWB has the first-order AR parameter as its tuning parameter. To be able to use the same tuning parameter $l$, we use the conversion formula proposed by Smeekes and Urbain (2014a) and Friedrich et al. (2020) that writes the AR parameter as a function of $l$, though bootUR also allows setting the AR parameter directly. The default setting for $l$ in bootUR uses the same rule as for the MBB, which was also tested for the three wild bootstrap methods by Smeekes and Urbain (2014a). They also provide theoretical results on the validity of these methods under general forms of dependence and heteroskedasticity.
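The BWB and AWB multipliers can be sketched in the same way. The AR parameter `gamma` is supplied directly here; the conversion from a block length that bootUR applies is deliberately not reproduced. The function names are hypothetical, and N(0, 1) draws are assumed for the underlying innovations.

```r
# Hypothetical helpers (not part of bootUR).
# BWB: one N(0, 1) multiplier per block of length l, independent across blocks.
bwb_multipliers <- function(n, l) {
  rep(rnorm(ceiling(n / l)), each = l)[seq_len(n)]
}

# AWB: multipliers follow a stationary AR(1) with parameter gamma, scaled so
# that every multiplier has variance one.
awb_multipliers <- function(n, gamma) {
  stopifnot(abs(gamma) < 1)
  nu <- rnorm(n)
  xi <- numeric(n)
  xi[1] <- nu[1]
  for (i in seq_len(n)[-1]) {
    xi[i] <- gamma * xi[i - 1] + sqrt(1 - gamma^2) * nu[i]
  }
  xi
}

set.seed(5)
xi_bwb <- bwb_multipliers(10, l = 3)
xi_awb <- awb_multipliers(10, gamma = 0.8)
```

The BWB multipliers jump at block boundaries, whereas the AWB multipliers decay smoothly with a lag-one correlation of `gamma`. In both cases the multipliers would be applied to the data exactly as in the DWB.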
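For completeness, in Algorithm 1 we present the six bootstrap methods and their role in the general bootstrap algorithm. Note that the outcome of the bootstrap algorithm is a collection of bootstrap unit root test statistics $t_{i,b}^*$ for the series $i = 1, \dots, N$ and bootstrap replications $b = 1, \dots, B$. How these are then used depends on the multiple testing approach taken. For instance, if we ignore multiple testing, we simply calculate the bootstrap p-values

$$p_i^* = \frac{1}{B} \sum_{b=1}^{B} \mathbb{1}\left(t_{i,b}^* < t_i\right), \qquad i = 1, \dots, N,$$

where $t_i$ is the statistic computed on the original data for series $i$ and $\mathbb{1}(\cdot)$ denotes the indicator function.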
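Given the original statistics and the $N \times B$ matrix of bootstrap statistics, these left-tailed p-values are row-wise proportions. The sketch below assumes that matrix is already available; the helper name is hypothetical.

```r
# Hypothetical helper (not part of bootUR): left-tailed bootstrap p-values,
# one per series, ignoring multiple testing.
# t_stat: length-N vector of statistics on the original data
# t_star: N x B matrix of bootstrap statistics (row i belongs to series i)
bootstrap_pvalues <- function(t_stat, t_star) {
  t_star <- as.matrix(t_star)
  stopifnot(nrow(t_star) == length(t_stat))
  # Comparing an N x B matrix with a length-N vector recycles t_stat down the
  # columns, so element [i, b] is compared with t_stat[i]
  rowMeans(t_star < t_stat)
}

t_stat <- c(-2, 0)
t_star <- rbind(c(-3, -1, 0, 1),
                c(-3, -1, 0, 1))
p <- bootstrap_pvalues(t_stat, t_star)   # c(0.25, 0.50)
```

In the toy input, only one of the four bootstrap values lies below $-2$, giving 0.25 for the first series, and two lie below 0, giving 0.5 for the second.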
4 An introduction to the bootUR package
bootUR provides a library for all unit root tests discussed in Section 2, thereby relying on the bootstrap methods from Section 3 to obtain p-values and/or critical values. The package has a simple structure with twelve user-accessible functions. Section 4.1 presents three functions to check if the data are suitable to be bootstrapped. Sections 4.2 and 4.3 introduce the six core functions for unit root testing on individual and multiple time series, respectively. Section 4.4 presents three useful functions for determining the order of integration of each series in a particular dataset.
The package’s functions will now be presented together with examples of their specific use. To this end, we make use of the dataset MacroTS, which contains a collection of 20 macroeconomic time series taken from Eurostat and comes with the package. A complete description of the data can be obtained by simply typing ?MacroTS in R. The following examples assume that both the required package bootUR and the data have been loaded:

R> library(bootUR)
R> data("MacroTS")
4.1 Checking data suitability
To check if a particular dataset is suitable to be bootstrapped, three simple functions can be used, namely check_missing_insample_values(), find_nonmissing_subsample() and plot_missing_values(). While the bootstrap tests do not work with missing values inside the sample, unbalanced datasets are allowed (for most bootstrap methods; see Table 2). The function check_missing_insample_values() checks if a particular dataset contains missing values within the sample. Its usage is extremely simple, as it only requires the data as input:

R> check_missing_insample_values(MacroTS)

The data can be a vector, a matrix, or an object in time series format (e.g. ts, zoo or xts). The function returns a logical vector with one entry per series, indicating whether missing values are present (TRUE) or not (FALSE).
If a dataset contains series with different starting and end points, the bootstrap methods SWB, DWB, BWB and AWB can still be used. The function find_nonmissing_subsample() lets users check the start and end points of each series as follows:

R> sample_check <- find_nonmissing_subsample(MacroTS)
R> sample_check$all_equal
[1] FALSE
The output slot range contains a matrix displaying the first and last non-missing observation for each series, while the logical slot all_equal provides a quick check of whether all time series have the same non-missing indices (TRUE) or not (FALSE).
Finally, to display missingness in the dataset, we can use

R> plot_missing_values(MacroTS, show_names = TRUE)

which displays present values in green, missing values at the start or end (‘Unbalanced NAs’) in purple and internal missing values in red (see Figure 1). Only the latter are problematic for the wild bootstrap methods, while the purple values also need to be avoided for the resampling-based bootstraps.
4.2 Individual Unit Root Tests
bootUR has two functions to perform a bootstrap unit root test on a single series: boot_df() for a standard ADF test and boot_union() for a union test. Below, we start by discussing the many options users can tweak in boot_df(). As bootUR shares its syntax across functions, most function arguments are identical, which facilitates usability and control by the end-user. In the remainder, we therefore only highlight the differences compared to the boot_df() function.
To perform a standard ADF bootstrap unit root test on a single series, the boot_df() function can be used. The function is structured as follows:

boot_df(y, level = 0.05, boot = "MBB", B = 1999, l = NULL, ar_AWB = NULL,
  p_min = 0, p_max = NULL, ic = "MAIC", dc = 1, detr = "OLS", ic_scale = TRUE,
  verbose = FALSE, show_progress = FALSE, do_parallel = FALSE, nc = NULL)

The minimum required input is boot_df(y), where the time series y can be either a vector or a ts, zoo or xts object. All other arguments are set to sensible default values for reliable, accurate and generally applicable unit root testing. Yet, users are able to easily tweak all arguments to their desired settings.
The remaining arguments in the first line relate to the bootstrap specifications, including the desired significance level of the test (level), bootstrap method (boot) and number of bootstrap replications (B). If a user chooses the bootstrap method "MBB", "DWB", "BWB" or "AWB", the desired block length can be controlled via the argument l.
By default, l is set as a function of the sample size, following the rule recommended in Palm et al. (2011). While for the first three methods this argument is a genuine block length, for the AWB the block length is transformed into an autoregressive parameter ar_AWB via the formula of Smeekes and Urbain (2014a); this can be overwritten by setting ar_AWB directly.
The set of arguments on the second line relates to the ADF regression. The deterministic components can be tweaked via the argument dc, the type of de-trending via detr. The remaining arguments concern the lag length selection: p_min and p_max respectively control the minimum and maximum lag length, the information criterion can be selected via the argument ic and the option ic_scale lets practitioners choose to use the rescaled information criteria of Cavaliere et al. (2015). To overwrite data-driven lag selection with a pre-specified lag length, users can simply set both p_min and p_max to the desired lag length.
The arguments verbose and show_progress allow additional information to be printed: the option verbose = TRUE prints easy-to-read output on the unit root test to the console, and the option show_progress = TRUE provides live progress updates on the bootstrap. The latter is particularly useful for large values of the argument B. Finally, the option do_parallel = TRUE executes the bootstrap in parallel on systems where OpenMP is supported; the argument nc allows users to specify how many cores should be used for the parallel loops. By default, all cores but one are used. If the parallel option is selected on a system where OpenMP is not supported, evaluation will simply be serial.
We illustrate the bootstrap ADF test on Dutch GDP, with the sieve bootstrap (boot = "SB") as in the specification used by Palm et al. (2008) and Smeekes (2013). An intercept and linear time trend are added as deterministic components and de-trending is done via both OLS and QD. As random number generation is required to draw bootstrap samples, we first set the seed of the random number generator to obtain reproducible results.

R> GDP_NL <- MacroTS[, 4]
R> set.seed(155776)
R> adf_out <- boot_df(GDP_NL, boot = "SB", dc = 2, detr = c("OLS", "QD"),
+ verbose = TRUE, do_parallel = TRUE)
Since verbose = TRUE, the outcome of the unit root test (test statistic and p-value) can be easily read from the console. Both tests indicate that the unit root null cannot be rejected:

Bootstrap DF Test with SB bootstrap method.
----------------------------------------
Type of unit root test performed: detr = OLS, dc = intercept and trend
 test statistic    p-value
     -2.5152854  0.1310655
----------------------------------------
Type of unit root test performed: detr = QD, dc = intercept and trend
 test statistic    p-value
     -1.5965001  0.4187094
Union of rejections test
To perform a bootstrap union unit root test on a single series, the boot_union() function can be used. It shares all its arguments with boot_df() except for dc and detr, which are omitted since boot_union() implicitly uses dc = c(1, 2) and detr = c("OLS", "QD") and then combines the outcomes of the four resulting unit root tests, as in equation (5), to produce a single p-value. We recommend its usage for quick or automatic unit root testing where careful manual specifications are not viable.
The bootstrap union test for Dutch GDP with the sieve wild bootstrap as proposed by Smeekes and Taylor (2012) can be obtained via

R> union_out <- boot_union(GDP_NL, boot = "SWB", verbose = TRUE,
+ do_parallel = TRUE)
Bootstrap Test with SWB bootstrap method.
Bootstrap Union Test: The null hypothesis of a unit root is not rejected
at a significance level of 0.05.
 test statistic    p-value
     -0.6701345  0.6433217
4.3 Multiple Unit Root Tests
Below, we discuss the various approaches bootUR offers to tackle the testing problem with multiple series.
Separate Unit Root Tests
To perform individual ADF tests on multiple time series simultaneously without multiple testing control, the function iADFtest() can be used:

iADFtest(y, level = 0.05, boot = "MBB", B = 1999, l = NULL, ar_AWB = NULL, union = TRUE,
  p_min = 0, p_max = NULL, ic = "MAIC", dc = NULL, detr = NULL, ic_scale = TRUE,
  verbose = FALSE, show_progress = FALSE, do_parallel = FALSE, nc = NULL)

Compared to the syntax of boot_df(), it has one additional argument, namely union, which controls whether a bootstrap union test is used (TRUE) or not (FALSE). If union = TRUE (default), the arguments dc and detr are ignored, and a warning message is returned if the user provides specifications for these anyway. If set to FALSE, the deterministic components and de-trending methods can be specified as for the boot_df() function. Furthermore, since the bootstrap is performed for all series simultaneously, the bootstrap methods "SB" and "MBB", which cannot handle unbalanced datasets, should not be used on such data. If the user specifies these anyway, the function reverts to splitting the bootstrap up and performing it separately for each time series. A warning message is then returned to alert the user.
We illustrate the function’s usage by performing individual ADF tests with the "MBB" bootstrap on the first five series of the unbalanced dataset MacroTS, which correspond to the real Gross Domestic Product in Belgium, Germany, France, the Netherlands and the United Kingdom, respectively.
R> iADF_out <- iADFtest(MacroTS[, 1:5], boot = "MBB", verbose = TRUE,
+ do_parallel = TRUE)
There are 0 stationary time series.
test statistic p-value
GDP_BE -0.8135022 0.36618309
GDP_DE -1.1076021 0.08804402
GDP_FR -0.6301366 0.76188094
GDP_NL -0.8210610 0.41370685
GDP_UK -0.7207147 0.53876938
In check_inputs(y = y, BSQT_test = BSQT_test, iADF_test = iADF_test, :
Missing values cause resampling bootstrap to be executed for each time
None of the time series is stationary, as printed to the console together with detailed information on the value of the test statistic and p-value for each time series (since verbose = TRUE). The warning message alerts the user that the resampling "MBB" bootstrap method is unable to handle unbalanced datasets, and to the corrective action that is taken to this end.
The user can easily access all information through the list with two components that is returned:

R> iADF_out
$ADF_tests
       test statistic    p-value
GDP_BE     -0.8135022 0.36618309
GDP_DE     -1.1076021 0.08804402
GDP_FR     -0.6301366 0.76188094
GDP_NL     -0.8210610 0.41370685
GDP_UK     -0.7207147 0.53876938
The slot rej_H0 contains a logical vector with one entry per series, indicating whether the unit root null is rejected (TRUE) or not (FALSE). The slot ADF_tests contains the values of the test statistics and p-values. For the union test, the output is arranged per time series. If no union test is performed, the output is arranged per time series, type of deterministic component and de-trending method.
Panel Unit Root Test
To perform a panel unit root test, the function paneltest() can be used. It shares its syntax with iADFtest(). Unlike for the latter, usage of the "MBB" or "SB" bootstrap methods for a panel unit root test on unbalanced datasets results in an error, not a warning, since the unbalancedness cannot be accommodated by splitting up the bootstrap. Therefore, users should switch to one of the wild bootstrap methods. Sieve bootstrap methods can also be used, but they are not suited to capture general forms of dependence across units (see Table 2). The code therefore warns users against their usage.
We illustrate the usage of the panel unit root test on the five GDP time series with the "DWB" bootstrap of Shao (2010) and Rho and Shao (2019):
R> panel_out <- paneltest(MacroTS[, 1:5], boot = "DWB", verbose = TRUE,
+ do_parallel = TRUE)
Panel Bootstrap Group-Mean Union Test
The null hypothesis that all series have a unit root, is not
rejected at a significance level of 0.05.
test statistic p-value
[1,] -0.8371329 0.2956478
The outcome of the test is printed on the console (since verbose = TRUE). Since the null is not rejected, treating all five GDP series as I(1) is reasonable.
Sequential Quantile Test
To perform the BSQT for multiple unit root testing, the function BSQTtest() should be used.
It has one additional argument compared to the paneltest() function, namely
q, which sets the group sizes. These can be set either in units or in quantiles. To split the series into, for instance, $K$ equally sized groups, use q = 0:K / K.
By the convention of Smeekes (2015), the first entry of the vector should be equal to zero, while the second entry indicates the end of the first group, and so on.
If the initial zero value or the final value (the number of series for units, or 1 for quantiles) is accidentally omitted, the function automatically adds it back. The default q = 0:NCOL(y) corresponds to the Step-M method of Romano and Wolf (2005).
Regarding the bootstrap methods, the same warning and error messaging as for the paneltest() apply.
We illustrate the BSQT on the five GDP series with the "AWB" (default) bootstrap method of Smeekes and Urbain (2014a) and Friedrich et al. (2020):
R> BSQT_out <- BSQTtest(MacroTS[, 1:5], verbose = TRUE, do_parallel = TRUE)
There are 0 stationary time series.
Details of the BSQT sequential tests:
Unit H0 Unit H1 Test statistic p-value
Step 1 0 1 -1.045657 0.3346673
The number of stationary time series is printed to the console (since verbose = TRUE), as well as details on the test statistic and p-value for each of the sequential steps until no rejection occurs. The latter information is also accessible through the output slot BSQT_sequence, while details on the (non-)rejection of the unit root null for each of the series separately can be accessed via the slot rej_H0, in a similar way as for the function iADFtest().
To perform a multiple unit root test by controlling the FDR, the function bFDRtest() should be used.
Its arguments are the same as for the other multivariate unit root tests, though the meaning of the argument level changes from the regular significance level to the FDR level.
We illustrate it here with the "BWB" bootstrap method of Shao (2011) and Smeekes and Urbain (2014a):
R> bFDR_out <- bFDRtest(MacroTS[, 1:5], boot = "BWB", verbose = TRUE,
+ do_parallel = TRUE)
There are 0 stationary time series
Details of the FDR sequential tests:
test statistic critical value
GDP_DE -0.9813749 -1.346138
Note that for the FDR-controlling test, critical values are returned instead of p-values. All information can be accessed via the output slots rej_H0 and FDR_sequence, the latter reporting the test results until no rejection occurs.
4.4 Determining series’ order of integration
Finally, bootUR offers three useful functions for determining the order of integration of each series in a dataset: order_integration(), diff_mult() and plot_order_integration().
The main function is order_integration(), which applies the ‘Pantula principle’ (Pantula, 1989) to determine the order of integration of each series:

order_integration(y, max_order = 2, test = "iADFtest", plot_orders = FALSE, ...)
As minimum required input, only the dataset needs to be provided: order_integration(y). The argument max_order sets the maximum order of integration considered for each series.
The default is two; we advise against values larger than three, in which case the function returns an error.
Furthermore, the unit root test is chosen through the argument test, depending on whether a single ("boot_union") or a multiple time series ("bFDRtest") approach is considered. Further arguments of the corresponding test functions can conveniently be passed on to fine-tune them.
The Pantula principle then works as follows. It starts by setting $d$ equal to max_order and testing for a unit root on the $(d-1)$-th differences of the series. The series for which the unit root null cannot be rejected are classified as $I(d)$ and subsequently removed from the dataset. In the next step, $d$ is set to $d-1$ and the remaining series are tested and classified accordingly. Under the default max_order = 2, this second round involves testing the series in levels and classifying them as either $I(1)$ (if the unit root null is not rejected) or $I(0)$ (if the null is rejected).
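The loop below is a minimal base-R sketch of the Pantula principle as described above, not the order_integration() implementation. The argument rejects_unit_root is a hypothetical placeholder for any function that returns TRUE when the unit root null is rejected for a single series; in bootUR this role is played by the test chosen through the test argument. To make the sketch runnable, the usage example plugs in a deliberately crude toy rule (comparing the variance of a series with that of its first differences), which is not a valid unit root test.

```r
# Sketch of the Pantula principle: test from the highest order downwards.
# rejects_unit_root: hypothetical function(x) returning TRUE if the unit
# root null is rejected for the single series x.
pantula_sketch <- function(y, max_order = 2, rejects_unit_root) {
  y <- as.matrix(y)
  d <- rep(NA_integer_, ncol(y))
  remaining <- seq_len(ncol(y))
  for (step in max_order:1) {
    # Test the (step - 1)-th differences of the series not yet classified
    rej <- vapply(remaining, function(i) {
      x <- as.numeric(na.omit(y[, i]))
      if (step > 1) x <- diff(x, differences = step - 1)
      rejects_unit_root(x)
    }, logical(1))
    d[remaining[!rej]] <- step    # null not rejected: classify as I(step)
    remaining <- remaining[rej]   # null rejected: move on to the next step
    if (length(remaining) == 0) break
  }
  d[remaining] <- 0L              # rejected in every step: classify as I(0)
  stats::setNames(d, colnames(y))
}

# Toy decision rule (NOT a unit root test): for a stationary series the
# first differences have a larger variance than the levels.
toy_rule <- function(x) var(diff(x)) > var(x)

set.seed(123)
n <- 200
y <- cbind(i0 = rnorm(n), i1 = cumsum(rnorm(n)), i2 = cumsum(cumsum(rnorm(n))))
d_hat <- pantula_sketch(y, max_order = 2, rejects_unit_root = toy_rule)
d_hat
```

Removing the series classified in earlier steps mirrors the description above: only series for which the null was rejected are carried over to the next, lower order.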
The function returns a list with two elements. The first slot is a matrix whose $i$-th column is $\Delta^{d_i} y_i$, with $d_i$ indicating the order of integration of series $i$.
This matrix is generated by the user-accessible function diff_mult(y, d), where y is the original dataset and d is an $N$-dimensional vector indicating each series’ order of integration.
The matrix has the same number of rows as the original dataset (because the default setting keep_NAs = TRUE in diff_mult() is used), so that observations lost through differencing are marked as missing. This setting can be changed by practitioners who use diff_mult() directly.
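A base-R sketch of this differencing step, mimicking the keep_NAs = TRUE behaviour described above: column $i$ is differenced $d_i$ times and the lost initial observations are padded with NA, so the number of rows is unchanged. The function name diff_mult_sketch() is hypothetical and only meant to illustrate the idea; in practice diff_mult() from bootUR should be used.

```r
# Difference column i of y d[i] times, keeping the original number of rows
# by padding the observations lost to differencing with NA.
diff_mult_sketch <- function(y, d) {
  y <- as.matrix(y)
  stopifnot(length(d) == ncol(y))
  out <- y
  for (i in seq_len(ncol(y))) {
    if (d[i] > 0) {
      dx <- diff(y[, i], differences = d[i])
      out[, i] <- c(rep(NA, d[i]), dx)
    }
  }
  out
}

y <- cbind(a = c(1, 3, 6, 10, 15), b = c(2, 4, 3, 5, 4))
y_diff <- diff_mult_sketch(y, d = c(2, 0))
y_diff
```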
The second output slot, order_int, makes the vector d explicitly available to the end user.
Finally, if the argument plot_orders of order_integration() is set to TRUE, a plot is produced that displays each series’ order of integration. To this end, order_integration() uses the function plot_order_integration(d), whose minimal required input is the same vector d. This function is also directly accessible to end users who wish to further adjust the display of the variable names, legend and colours through its optional arguments show_names, show_legend, names_size, legend_size and cols.
We illustrate the methods on two datasets: the MacroTS dataset, which comes with the package, and the FRED-QD dataset, which is widely used for macro-economic analysis.
The MacroTS dataset contains macro-economic time series collected from Eurostat (https://ec.europa.eu/eurostat/data/database) and is included in the package. Quarterly observations from 1992 to 2019 are available on GDP, consumption, inflation and unemployment for Belgium, Germany, France, the Netherlands and the United Kingdom. The dataset is unbalanced; see Figure 1.
The FRED-QD dataset (McCracken and Ng, 2020) is a quarterly version of the monthly Federal Reserve Economic Data database introduced in McCracken and Ng (2016). It contains macro-economic time series and was imported into R using the commands
R> FRED_url <- url("https://s3.amazonaws.com/files.fred.stlouisfed.org/…/2020-06.csv")
R> FRED_MD <- read.csv(FRED_url)
This paper uses the data from 1959 Quarter 2 to 2019 Quarter 4 to avoid possible structural breaks due to the COVID-19 pandemic in 2020.
To import the most recent version of the dataset instead, replace 2020-06.csv in the URL by current.csv.
As can be seen from Figure 2, the dataset contains one internal NA: the third observation of variable 188 (UMCSENTx: Consumer Expectations) is missing while the second observation is not. bootUR cannot handle internal missing values, but this is easily fixed by also setting the second observation to NA, so that the first three observations of this variable become ‘unbalanced NAs’, which bootUR can handle.
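The manual fix above can be automated. The helper fix_internal_na() below is hypothetical and not part of bootUR: for a single series it sets every observation up to the last internal NA (a missing value lying between the first and last observed values) to NA, so that the gap turns into missing values at the start of the sample. Like the manual fix for UMCSENTx, this discards the observed values before the gap; missing values outside the observed span are left untouched.

```r
# Replace internal NAs (missing values between the first and last observed
# value) by leading NAs, discarding the observations before the gap.
fix_internal_na <- function(x) {
  obs <- which(!is.na(x))
  if (length(obs) == 0) return(x)
  span <- min(obs):max(obs)
  internal <- span[is.na(x[span])]
  if (length(internal) > 0) x[seq_len(max(internal))] <- NA
  x
}

# Toy example mimicking UMCSENTx: first and third observations missing
x <- c(NA, 2.5, NA, 4.1, 3.9, 4.4)
x_fixed <- fix_internal_na(x)
x_fixed
# For a data frame Y of numeric series: Y[] <- lapply(Y, fix_internal_na)
```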
The resulting dataset then contains 38 macro-economic indicators with missing values at the start of the sample.
Finally, note that all FRED-QD series have been classified into orders of integration by the transformation codes provided in McCracken and Ng (2020). However, the authors themselves point out several discrepancies between these codes and the outcomes of unit root tests. We therefore use the transformation codes as a benchmark for the classifications obtained through the unit root tests, but do not necessarily consider the classification closest to theirs to be the best.
Since some of the macro-economic series are likely to be integrated of order two, $I(2)$, we use the order_integration() function (with its defaults) to implement the Pantula principle.
All unit root tests in bootUR are performed with their default settings, which means that union tests are performed with the AWB bootstrap method and lag length selection is done via the re-scaled MAIC. Throughout this section, a significance level of 5% is used.
For BSQTtest(), we report the default (i.e. the Step-M method) as well as results for evenly spaced 0.1 quantiles (q = 0:10/10, for MacroTS and FRED-QD) and 0.05 quantiles (q = 0:20/20, for FRED-QD).
We compare bootUR’s unit root tests to those of the R packages reported in Table 1, using the following specifications:
For the function CADFtest() (package CADFtest), we perform ADF regressions with intercept and trend (type = "trend") and lag length selection with MAIC (criterion = "MAIC"), with the maximum number of lags set via the argument max.lag.y. These lag length specifications correspond to the defaults used in bootUR. Among the existing unit root packages, we find CADFtest to have the most appealing API: it allows full user control over important model specifications and provides easy-to-read results. We therefore consider this package our main reference point, but we also include results from the other packages to evaluate the sensitivity of the test outcomes to the chosen package.
For unitrootTest() (package fUnitRoots), we perform ADF regressions with intercept and trend (type = "ct"). By default, one lagged difference is included.
For adf.test() (package tseries), we use its default settings, which imply ADF regressions with intercept and trend and a number of lags fixed at $\lfloor (T-1)^{1/3} \rfloor$, a deterministic function of the sample size $T$.
For ur.df() (package urca), we use ADF regressions with intercept and trend (type = "trend") and lag length selection via AIC (selectlags = "AIC"), with the maximum number of lags set via the corresponding function argument.
For ur.ers() (package urca), we use an intercept and trend for de-trending (model = "trend"). By default, four lagged differences are included in the ADF regression.
Unlike the other packages, urca only provides critical values to judge the significance of the unit root test; a p-value for the test itself is not reported (see Table 1). As discussed in Lupi (2009), the p-value shown in the regression output of summary() is computed using the t-distribution, which is incorrect under the unit root null.
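A small Monte Carlo experiment in base R makes this point concrete. It simulates random walks, so that the unit root null holds, runs the Dickey-Fuller regression with intercept and trend through lm(), and compares the 5% quantile of the resulting t-statistic with the 5% quantile of the t-distribution that a standard regression summary implicitly uses. The sample size, the number of replications and the absence of lagged differences are arbitrary choices for this illustration.

```r
set.seed(2020)
n <- 200      # sample size (arbitrary)
reps <- 1000  # Monte Carlo replications (arbitrary)

# t-statistic on y_{t-1} in the DF regression with intercept and trend,
# computed on a pure random walk (unit root null is true)
df_tstat <- function(n) {
  y <- cumsum(rnorm(n))
  dy <- diff(y)
  ylag <- y[-n]
  trend <- seq_along(dy)
  fit <- lm(dy ~ ylag + trend)
  coef(summary(fit))["ylag", "t value"]
}

tstats <- replicate(reps, df_tstat(n))
q_sim <- unname(quantile(tstats, 0.05))  # simulated 5% critical value
q_t <- qt(0.05, df = n - 1 - 3)          # 5% quantile of the t-distribution
size_t <- mean(tstats < q_t)             # rejection rate with t critical value
c(simulated = q_sim, t_dist = q_t, size_with_t = size_t)
```

The simulated critical value lies far to the left of the t-quantile, so using the t-distribution rejects a true unit root null far more often than the nominal 5%.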
Finally, only the packages CADFtest and fUnitRoots can handle missing values; for the other packages, we removed missing values prior to performing the unit root tests.
Before applying the various unit root tests to the two datasets, we apply paneltest() (with default settings) to all series in first differences and to all series in levels. Table 3 reports the p-values of the panel unit root tests. For both datasets, the panel unit root tests on the series in first differences reject the unit root null, indicating that a ‘significant proportion’ of the series is stationary in first differences (hence not $I(2)$). The panel unit root test on the series in levels does not reject the unit root null.
Table 3: p-values of the panel unit root tests (paneltest()) on the series in first differences and in levels.
To shed further light on the order of integration of each individual series, the bootstrap unit root tests are applied and compared to the implementations from other R packages. Figure 3 presents the obtained orders of integration on the MacroTS dataset, Figures 4 and 5 those on the FRED-QD dataset.
Globally speaking, most unit root tests agree on each series’ order of integration, which is reassuring. Still, several interesting remarks can be made.
First, the results of iADFtest are fairly similar to those of bFDRtest, but on the FRED-QD dataset iADFtest assigns a considerable number of series to a different order of integration. This illustrates that ignoring multiple testing can quickly lead to a considerable number of misclassifications on such large datasets.
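A back-of-the-envelope calculation shows how quickly errors accumulate without a multiple testing correction. Assuming, purely for illustration, $N$ independent tests that are each performed at the 5% level and for which the unit root null is true, the expected number of false rejections is $0.05N$ and the probability of at least one false rejection is $1 - 0.95^N$. The chosen values of $N$ are the five GDP series, the 20 MacroTS series (four variables for five countries) and 188, a lower bound for the number of FRED-QD series since variable 188 is mentioned above.

```r
# Number of series (illustrative values, see text)
N <- c(5, 20, 188)
alpha <- 0.05
# Assumes independent tests and a true unit root null for every series
expected_false <- alpha * N     # expected number of false rejections
fwer <- 1 - (1 - alpha)^N       # probability of at least one false rejection
round(rbind(N, expected_false, fwer), 3)
```

Already with 20 series, at least one false rejection is more likely than not.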
Second, among the BSQTtest procedures, the default Step-M method tends to deviate in its classifications from the other two procedures. As discussed in Section 2.3, we only recommend its use for small datasets. On the smaller MacroTS dataset, for instance, the two versions of BSQTtest agree more closely than on the larger FRED-QD dataset.
Third, the classifications of bFDRtest tend to deviate from those of the other tests.
For a more elaborate discussion of this tendency, we refer the interested reader to Smeekes and Wijler (2020).
Fourth, among the unit root tests from the other R packages, CADFtest() produces the results most similar to those of bootUR, whereas the classifications of unitrootTest() deviate most markedly.
While different implementations of these unit root tests do produce different results, so that the choice of test matters in practice, we find that the unit root tests in the bootUR package tend to produce more stable results regarding the series’ orders of integration.
This paper presents the R package bootUR, which provides a unified framework for bootstrap unit root testing on single and multiple time series. To this end, the package builds upon the popular augmented Dickey-Fuller (ADF) test. Unlike existing packages for unit root testing, bootUR (i) provides a large collection of easy-to-use, fully controllable and reliable unit root tests, including the union of rejections test, which is set as default to enable quick, automatic unit root testing, (ii) ensures accurate inference through bootstrap methods with easy-to-read output (including p-values), and (iii) allows for testing the presence of unit roots in datasets containing many time series by relying on fast C++ implementations.
The first author was financially supported by the Netherlands Organization for Scientific Research (NWO) under grant number 452-17-010, the second author by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 832671. We gratefully acknowledge the comments and checks provided by Robert Adamek, Rui Jorge Almeida, Nalan Baştürk, Caterina Schiavoni and Etiënne Wijler on earlier versions of the package. All remaining errors are our own.
- Bai and Ng (2004) Bai J, Ng S (2004). “A PANIC attack on unit roots and cointegration.” Econometrica, 72(4), 1127–1177.
- Bai and Ng (2010) Bai J, Ng S (2010). “Panel unit root tests with cross-section dependence: A further investigation.” Econometric Theory, 26(4), 1088–1114.
- Benjamini and Hochberg (1995) Benjamini Y, Hochberg Y (1995). “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” Journal of the Royal Statistical Society: Series B, 57(1), 289–300.
- Breitung and Pesaran (2008) Breitung J, Pesaran MH (2008). “Unit roots and cointegration in panels.” In The Econometrics of Panel Data, pp. 279–322. Springer.
- Bronder (2016) Bronder S (2016). PANICr: PANIC Tests of Nonstationarity. R package version 1.0.0, URL https://CRAN.R-project.org/package=PANICr.
- Campbell and Perron (1991) Campbell JY, Perron P (1991). “Pitfalls and opportunities: What macroeconomists should know about unit roots.” NBER Macroeconomics Annual, 6, 141–201.
- Cavaliere (2005) Cavaliere G (2005). “Unit root tests under time-varying variances.” Econometric Reviews, 23(3), 259–292.
- Cavaliere et al. (2015) Cavaliere G, Phillips PCB, Smeekes S, Taylor AMR (2015). “Lag length selection for unit root tests in the presence of nonstationary volatility.” Econometric Reviews, 34(4), 512–536.
- Cavaliere and Taylor (2008) Cavaliere G, Taylor AMR (2008). “Bootstrap unit root tests for time series with nonstationary volatility.” Econometric Theory, 24(1), 43–71.
- Cavaliere and Taylor (2009a) Cavaliere G, Taylor AMR (2009a). “Bootstrap unit root tests.” Econometric Reviews, 28(5), 393–421.
- Cavaliere and Taylor (2009b) Cavaliere G, Taylor AMR (2009b). “Heteroskedastic time series with a unit root.” Econometric Theory, pp. 1228–1276.
- Chang (2002) Chang Y (2002). “Nonlinear IV unit root tests in panels with cross-sectional dependency.” Journal of Econometrics, 110(2), 261–292.
- Chang and Park (2003) Chang Y, Park JY (2003). “A sieve bootstrap for the test of a unit root.” Journal of Time Series Analysis, 24(4), 379–400.
- Choi (2001) Choi I (2001). “Unit root tests for panel data.” Journal of International Money and Finance, 20(2), 249–272.
- Choi (2015) Choi I (2015). Almost All About Unit Roots: Foundations, Developments, and Applications. Cambridge University Press.
- Croissant and Millo (2008) Croissant Y, Millo G (2008). “Panel Data Econometrics in R: The plm Package.” Journal of Statistical Software, 27(2), 1–43. doi:10.18637/jss.v027.i02.
- Dagum and Menon (1998) Dagum L, Menon R (1998). “OpenMP: an industry standard API for shared-memory programming.” Computational Science & Engineering, IEEE, 5(1), 46–55.
- Davidson and Flachaire (2008) Davidson R, Flachaire E (2008). “The wild bootstrap, tamed at last.” Journal of Econometrics, 146, 162–169.
- Dickey and Fuller (1979) Dickey DA, Fuller WA (1979). “Distribution of estimators for autoregressive time series with a unit root.” Journal of the American Statistical Association, 74(366a), 427–431.
- Dickey and Fuller (1981) Dickey DA, Fuller WA (1981). “Likelihood ratio statistics for autoregressive time series with a unit root.” Econometrica, pp. 1057–1072.
- Eddelbuettel (2013) Eddelbuettel D (2013). Seamless R and C++ Integration with Rcpp. Springer, New York. doi:10.1007/978-1-4614-6868-4. ISBN 978-1-4614-6867-7.
- Eddelbuettel and Balamuta (2017) Eddelbuettel D, Balamuta JJ (2017). “Extending R with C++: A Brief Introduction to Rcpp.” PeerJ Preprints, 5, e3188v1. ISSN 2167-9843. doi:10.7287/peerj.preprints.3188v1. URL https://doi.org/10.7287/peerj.preprints.3188v1.
- Eddelbuettel and François (2011) Eddelbuettel D, François R (2011). “Rcpp: Seamless R and C++ integration.” Journal of Statistical Software, 40(8), 1–18.
- Eddelbuettel and Sanderson (2014) Eddelbuettel D, Sanderson C (2014). “RcppArmadillo: Accelerating R with high-performance C++ linear algebra.” Computational Statistics & Data Analysis, 71, 1054–1063.
- Elliott et al. (1996) Elliott G, Rothenberg TJ, Stock JH (1996). “Efficient tests for an autoregressive unit root.” Econometrica, 64(4), 813–836.
- Enders (2008) Enders W (2008). Applied Econometric Time Series. 4th edition. John Wiley & Sons.
- Friedrich et al. (2020) Friedrich M, Smeekes S, Urbain JP (2020). “Autoregressive wild bootstrap inference for nonparametric trends.” Journal of Econometrics, 214(1), 81–109.
- Hansen (1995) Hansen BE (1995). “Rethinking the univariate approach to unit root testing: Using covariates to increase power.” Econometric Theory, pp. 1148–1171.
- Harvey et al. (2009) Harvey DI, Leybourne SJ, Taylor AMR (2009). “Unit root testing in practice: dealing with uncertainty over the trend and initial condition.” Econometric Theory, 25(3), 587–636.
- Harvey et al. (2012) Harvey DI, Leybourne SJ, Taylor AMR (2012). “Testing for unit roots in the presence of uncertainty over both the trend and initial condition.” Journal of Econometrics, 169(2), 188–195.
- Holm (1979) Holm S (1979). “A simple sequentially rejective multiple test procedure.” Scandinavian Journal of Statistics, 6, 65–70.
- Hylleberg et al. (1990) Hylleberg S, Engle RF, Granger CW, Yoo BS (1990). “Seasonal integration and cointegration.” Journal of Econometrics, 44(1-2), 215–238.
- Im et al. (2003) Im KS, Pesaran MH, Shin Y (2003). “Testing for unit roots in heterogeneous panels.” Journal of Econometrics, 115(1), 53–74.
- Kleiber and Lupi (2011) Kleiber C, Lupi C (2011). “Panel unit root testing with R.” URL https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/inst/doc/panelUnitRootWithR.pdf.
- Kleiber and Lupi (2012) Kleiber C, Lupi C (2012). punitroots: Tests for Unit Roots in Panels of (Economic) Time Series, With and Without Cross-sectional Dependence. R package version 0.0-2, URL https://r-forge.r-project.org/projects/punitroots/.
- Kreiss et al. (2011) Kreiss JP, Paparoditis E, Politis DN (2011). “On the range of validity of the autoregressive sieve bootstrap.” Annals of Statistics, 39, 2103–2130.
- Kwiatkowski et al. (1992) Kwiatkowski D, Phillips PC, Schmidt P, Shin Y (1992). “Testing the null hypothesis of stationarity against the alternative of a unit root.” Journal of Econometrics, 54(1-3), 159–178.
- Levin et al. (2002) Levin A, Lin CF, Chu CSJ (2002). “Unit root tests in panel data: asymptotic and finite-sample properties.” Journal of Econometrics, 108(1), 1–24.
- López-de Lacalle and Boshnakov (2019) López-de Lacalle J, Boshnakov GN (2019). uroot: Unit Root Tests for Seasonal Time Series. R package version 2.1-0, URL https://CRAN.R-project.org/package=uroot.
- Lupi (2009) Lupi C (2009). “Unit root CADF testing with R.” Journal of Statistical Software, 32(2), 1–19.
- MacKinnon et al. (1999) MacKinnon JG, Haug AA, Michelis L (1999). “Numerical distribution functions of likelihood ratio tests for cointegration.” Journal of Applied Econometrics, 14(5), 563–577.
- Maddala and Wu (1999) Maddala GS, Wu S (1999). “A comparative study of unit root tests with panel data and a new simple test.” Oxford Bulletin of Economics and Statistics, 61(S1), 631–652.
- Mallet (2017) Mallet O (2017). URT: Fast Unit Root Tests and OLS regression in C++ with wrappers for R and Python. URL https://github.com/olmallet81/URT.
- Mammen (1993) Mammen E (1993). “Bootstrap and wild bootstrap for high dimensional linear models.” Annals of Statistics, 21, 255–285.
- McCracken and Ng (2020) McCracken M, Ng S (2020). “FRED-QD: A quarterly database for macroeconomic research.” Working Paper 26872, National Bureau of Economic Research.
- McCracken and Ng (2016) McCracken MW, Ng S (2016). “FRED-MD: A monthly database for macroeconomic research.” Journal of Business & Economic Statistics, 34(4), 574–589.
- Moon and Perron (2012) Moon HR, Perron B (2012). “Beyond panel unit root tests: Using multiple testing to determine the non stationarity properties of individual series in a panel.” Journal of Econometrics, 169(1), 29–33.
- Müller and Elliott (2003) Müller UK, Elliott G (2003). “Tests for unit roots and the initial condition.” Econometrica, 71(4), 1269–1286.
- Nelson and Plosser (1982) Nelson CR, Plosser CR (1982). “Trends and random walks in macroeconomic time series: Some evidence and implications.” Journal of Monetary Economics, 10(2), 139–162.
- Ng and Perron (2001) Ng S, Perron P (2001). “Lag length selection and the construction of unit root tests with good size and power.” Econometrica, 69(6), 1519–1554.
- Ng and Perron (2005) Ng S, Perron P (2005). “A note on the selection of time series models.” Oxford Bulletin of Economics and Statistics, 67, 115–134.
- Palm et al. (2008) Palm FC, Smeekes S, Urbain JP (2008). “Bootstrap unit root tests: comparison and extensions.” Journal of Time Series Analysis, 29(1), 371–401.
- Palm et al. (2011) Palm FC, Smeekes S, Urbain JP (2011). “Cross-sectional dependence robust block bootstrap panel unit root tests.” Journal of Econometrics, 163(1), 85–104.
- Pantula (1989) Pantula SG (1989). “Testing for unit roots in time series data.” Econometric Theory, 5(2), 256–271.
- Paparoditis and Politis (2003) Paparoditis E, Politis DN (2003). “Residual-based block bootstrap for unit root testing.” Econometrica, 71(3), 813–855.
- Paparoditis and Politis (2005) Paparoditis E, Politis DN (2005). “Bootstrapping unit root tests for autoregressive time series.” Journal of the American Statistical Association, 100, 545–553.
- Paparoditis and Politis (2018) Paparoditis E, Politis DN (2018). “The asymptotic size and power of the augmented Dickey–Fuller test for a unit root.” Econometric Reviews, 37(9), 955–973.
- Perron and Qu (2007) Perron P, Qu Z (2007). “A simple modification to improve the finite sample properties of Ng and Perron’s unit root tests.” Economics Letters, 94(1), 12–19.
- Pesaran (2012) Pesaran MH (2012). “On the interpretation of panel unit root tests.” Economics Letters, 116(3), 545–546.
- Pfaff (2008) Pfaff B (2008). Analysis of Integrated and Cointegrated Time Series with R. Second edition. Springer, New York. ISBN 0-387-27960-1, URL http://www.pfaffikus.de.
- Phillips and Perron (1988) Phillips PC, Perron P (1988). “Testing for a unit root in time series regression.” Biometrika, 75(2), 335–346.
- Psaradakis (2001) Psaradakis Z (2001). “Bootstrap tests for an autoregressive unit root in the presence of weakly dependent errors.” Journal of Time Series Analysis, 22, 577–594.
- R Core Team (2017) R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
- Rho and Shao (2019) Rho Y, Shao X (2019). “Bootstrap-assisted unit root testing With piecewise locally stationary errors.” Econometric Theory, 35(1), 142–166.
- Romano et al. (2008a) Romano JP, Shaikh AM, Wolf M (2008a). “Control of the false discovery rate under dependence using the bootstrap and subsampling.” Test, 17(3), 417–442.
- Romano et al. (2008b) Romano JP, Shaikh AM, Wolf M (2008b). “Formalized data snooping based on generalized error rates.” Econometric Theory, 24(2), 404–447.
- Romano and Wolf (2005) Romano JP, Wolf M (2005). “Stepwise multiple testing as formalized data snooping.” Econometrica, 73(4), 1237–1282.
- Schwert (1989) Schwert GW (1989). “Tests for unit roots: a Monte Carlo investigation.” Journal of Business and Economic Statistics, 7(1), 147–159.
- Shao (2010) Shao X (2010). “The dependent wild bootstrap.” Journal of the American Statistical Association, 105(489), 218–235.
- Shao (2011) Shao X (2011). “A bootstrap-assisted spectral test of white noise under unknown dependence.” Journal of Econometrics, 162(2), 213–224.
- Smeekes (2013) Smeekes S (2013). “Detrending bootstrap unit root tests.” Econometric Reviews, 32(8), 869–891.
- Smeekes (2015) Smeekes S (2015). “Bootstrap sequential tests to determine the order of integration of individual units in a time series panel.” Journal of Time Series Analysis, 36(3), 398–415.
- Smeekes and Taylor (2012) Smeekes S, Taylor AMR (2012). “Bootstrap union tests for unit roots in the presence of nonstationary volatility.” Econometric Theory, 28(2), 422–456.
- Smeekes and Urbain (2014a) Smeekes S, Urbain JP (2014a). “A multivariate invariance principle for modified wild bootstrap methods with an application to unit root testing.” GSBE Research Memorandum RM/14/008, Maastricht University.
- Smeekes and Urbain (2014b) Smeekes S, Urbain JP (2014b). “On the applicability of the sieve bootstrap in time series panels.” Oxford Bulletin of Economics and Statistics, 76(1), 139–151.
- Smeekes and Wijler (2020) Smeekes S, Wijler E (2020). “Unit roots and cointegration.” In P Fuleky (ed.), Macroeconomic Forecasting in the Era of Big Data, volume 52 of Advanced Studies in Theoretical and Applied Econometrics, chapter 17, pp. 541–584. Springer.
- Smeekes and Wilms (2020) Smeekes S, Wilms I (2020). bootUR: Bootstrap Unit Root Tests. R package version 0.1.0, URL https://CRAN.R-project.org/package=bootUR.
- Trapletti et al. (2019) Trapletti A, Hornik K, LeBaron B (2019). tseries: Time Series Analysis and Computational Finance. R package version 0.10-47, URL https://CRAN.R-project.org/package=tseries.
- Tsung-wu (2019) Tsung-wu H (2019). pdR: Threshold Model and Unit Root Tests in Cross-Section and Time Series Data. R package version 1.7, URL https://CRAN.R-project.org/package=pdR.
- Wuertz et al. (2017) Wuertz D, Setz T, Chalabi Y (2017). fUnitRoots: Rmetrics - Modelling Trends and Unit Roots. R package version 3042.79, URL https://CRAN.R-project.org/package=fUnitRoots.
- Zhang and Cheng (2014) Zhang X, Cheng G (2014). “Bootstrapping high dimensional time series.” ArXiv e-print 1406.1037.
- Zhang et al. (2011) Zhang Y, Yu H, McLeod AI (2011). “Maximum likelihood unit root test.” Working Paper.