1 Introduction
The notion of stationarity plays a very important role in statistical modeling. In its weakest form of first or second order stationarity it implies that the mean or second moment, respectively, are time invariant; see for instance
Xiao and Lima (2007), Dwivedi and Subba Rao (2011), and Jentsch and Subba Rao (2015). A more general related notion is that of order (weak) stationarity which requires that all joint product moments of order (up to) are time invariant. Most studies of stationarity are restricted to some form of weak stationarity, which of course is the most suitable concept for linear time series. On the other hand, the property of strict stationarity states that not only moments, but the entire probabilistic structure of a given series is time invariant. This property is of great relevance with nonlinear time series in which low order moments are not sufficient for the dynamics of the series, not to mention the case of heavytailed time series lacking higher moments (Loretan and Phillips, 1994; Andrews et al., 2009). Another divide is between parametric and nonparametric tests for stationarity, with the first class containing the majority of procedures in earlier literature. There is also the methodological approach that categorizes methods which operate either in the time or in the frequency domain. As the existing literature is vast, we provide below only a selected set of references which is by no means exhaustive.In econometrics the majority of earlier tests for weak stationarity are either tests for stationarity against unit root, or alternatively, tests of a unit root against stationarity, with the KPSS and the DickeyFuller tests being the by far most popular ones and having enjoyed a number of generalizations; see for instance GilAlana (2003), Giraitis et al. (2003), Müller (2005), and Horváth et al. (2014). In fact these tests, although developed with a particular alternative in mind, they have often been employed more generally to explore structural change. When it comes to testing for strict stationarity in parametric time series there exist many tests. These tests typically reduce to testing for a given portion of the parameter space and may often readily be extended to monitoring procedures. We indicatively mention here the works for ARMA, GARCH, and DAR models by Bai (1994), Francq and Zakoïan (2012) and Guo et al. (2019), respectively. On the other hand testing methods for strict stationarity are scarce when performed within a purely nonparametric framework. It appears that Kapetanios (2009)
was the first to address the issue of testing strict stationarity of the marginal distribution of an arbitrary time series. His work employs smoothing techniques in order to estimate the density and a bootstrap implementation of the procedure. There is also the method of
Hong et al. (2017) which is based on the joint characteristic function and the papers of Busetti and Harvey (2010) and Lima and Neri (2013)which test for constancy (of a discretized version) of the marginal quantile process. The interest in testing for stationarity rests on the fact that modelling, predictions and other inferential procedures are invalid if this assumption is violated. However, although strict stationarity is widely assumed in the literature, it is not truly a realistic assumption when one observes a given time series over a long period of time. On the contrary it is expected that institutional changes cause structural breaks in the stochastic properties of certain variables, particularly in the macroeconomic and financial world. In this connection, monitoring the stationarity of a stochastic process seems to be of an even greater importance than testing. In this paper we propose a sequential procedure for strict stationarity. Our approach uses the characteristic function (CF) as the main tool. The advantage of using this function is that the CF can be estimated purely nonparametrically without the use of smoothing techniques. Moreover, and unlike the case of estimating the joint distribution function, the estimate of the joint CF is easy to obtain and when viewed as a process it is continuous in its argument. These features offer specific theoretical and computational simplifications which are particularly important in the multivariate context, and are clearly reflected in the competitiveness of the resulting methods over classical ones; see for instance
Pinske (1998), Su and White (2007), Hlávka et al. (2012), and Matteson and James (2014). The asymptotics of the detector statistic involves some weak convergence theorems of stochastic integrals to achieve its limiting distribution. Here, we establish the weak convergence theorem in a more general framework for potential usage in various online monitoring procedures and retrospective change point tests. For more details on the monitoring procedure for the autocovariance function of a linear process, we refer to Na et al. (2011).The remainder of the paper is as follows. In Section 2 we introduce the basic idea behind the proposed procedures. Section 3 presents the corresponding detector statistics and in Section 4 we study the large sample behavior of the new methods. As the limit null distribution of our detector is complicated, we propose in Section 5 a resampling procedure in order to actually carry out the suggested monitoring method. The results of a Monte Carlo study for the finitesample properties of the methods are presented in Section 6. Some realworld empirical applications to financial market data are presented and discussed in Section 7. Finally, we end in Section 8 with conclusions and discussion. The proofs of the lemmas and theorems in Section 4 are all provided in the Appendix.
2 ECF statistics
Let be an arbitrary time series, and write for the corresponding distribution function (DF) of the dimensional variable , . We are interested in the behavior of the distribution of over time, i.e. to the monitoring the joint distribution of the observations of a given dimension
. The null hypothesis is stated as,
(1) 
against the alternative
(2) 
for some , where , , as well as the threshold are considered unknown. Clearly corresponds to monitoring the marginal distribution of , corresponds to the joint bivariate distribution of , and so on. As it is typical in monitoring studies we assume that there exist a set of observations (often termed training data) which involve no change, and monitoring starts after time .
To motivate our procedure let
, be the characteristic function (CF) of an arbitrary random vector
. We will compare a nonparametric estimator of the joint CF , of , with the the same estimator obtained from observations beyond time . Then, the quantity of interest is(3) 
where
(4) 
is the empirical characteristic function (ECF) computed from the observations , and is a weight function which will be discussed below.
Our motivation for considering (3) as our main tool is that the null hypothesis (1) may equivalently be stated as
(5) 
and therefore we expect to be ‘small’ under the null hypothesis (5). Moreover, and unlike equivalent approaches based on the empirical DF, the ECF approach enjoys the important feature of computational simplicity. Specifically, by straightforward algebra we have from (3),
(6) 
where with
(7) 
and a large (positive) value of the test criterion indicates violation of the null hypothesis. A standard choice is to set , which leads to
(8) 
and hence renders our statistic in closedform.
Another interesting choice results by considering the statistic (in which case of course large negative values of are significant). Then we may write
(9) 
where with
(10) 
If we let in (10) , then
where is a known constant depending only on and , and hence in this case too our statistic comes in a closedform expression suitable for computer implementation. Note that this weight function was first used by Székely and Rizzo (2005), and later employed by Matteson and James (2014) in changepoint analysis.
The choice for the weight function is usually based upon computational considerations. In fact if
integrates to one (probably following some scaling) and satisfies
then the integral figuring in (7) can be interpreted as the CF of a symmetric around zero random variable having density
. In this connection can be chosen as the density of any such distribution. Hence the choice corresponds to the multivariate normal density but for computational purposes any density with a simple CF will do. Other examples for are, for instance, the multivariate spherical stable density, the Laplace density, mixtures of normals or Laplace distributions, and combinations thereof. In fact, one might wonder whether there is a weight function which is optimal in some sense. The issue is still open but based on earlier finitesample results it appears that the issue of the choice ofis similar to the corresponding problem of choosing a kernel and a bandwidth in nonparametric estimation: Most weight functions (kernels) render similar behavior of the test statistic, but there is some sensitivity with respect to the “bandwidth” parameter
figuring in (8). This is a highly technical problem that has been tackled only under the restrictive scenario of testing goodnessoffit for a given parametric distribution, and even then a good choice of depends on the direction away from the null hypothesis; see Tenreiro (2009). Thus in our context the approach to the weight function is in some sense pragmatic: We use the Gaussian weight function which has become something of a standard, and investigate the behavior of the criterion over a grid of values of the weight parameter . However, the alternative weight function below (10) has also been tried, giving similar results.Another important userspecified parameter of our procedure is the order that determines the dimension of the joint distribution which is monitored for stationarity. Of course having a sample size of training data already imposes the obvious restriction , but if is only slightly smaller than , then we do not expect the ECF to be a reliable estimator of its population counterpart. The situation is similar to the problem of orderchoice when estimating correlations from available data; see Brockwell and Davis (1991), p. 221, and Box and Jenkins (1970), p. 33, for some general guidelines. In our Monte Carlo results we only consider cases where (very small compared to ), but it is reasonable to assume that we could let the dimension grow as increases; refer to Section 6.
We close this section by noting that the statistic in (3) compares the ECF computed from data up to time , i.e. the ECF of the training data, with the ECF of all data available at the current monitoring time , i.e. the data made available both before and after time . Another option is to consider the statistic
(11) 
where
(12) 
in which case we compare the ECF of the training data to the ECF computed from the observations after time , i.e. after monitoring has begun. Below we will often write as our statistic, but the methods also apply to the case of the other statistics considered in this section. We finally close the section by noting that throughout we have assumed that is univariate. However extension to the multivariate context does not seem to present us with any undue complications.
3 Detector statistics and stopping rule
As already mentioned we consider online procedures whereby the test is applied sequentially on a dynamic data set which is steadily updated over time with the arrival of new observations. In this context, the null hypothesis is rejected when the value of a suitable detector statistic exceeds an appropriately chosen constant for the first time
. Otherwise we continue monitoring. These statistics are commonly defined by a corresponding stopping rule. In order to define this stopping rule, and based on asymptotic considerations, we need to introduce an extra weight function in order to control the largesample probability of typeI error. In particular we employ the detector statistic
(13) 
where is defined by (6) and,
(14) 
Here is an extra weight function needed to control (asymptotically) the probability of typeI error for the sequential test procedure. The parameter figuring in (14) gives some flexibility to the resulting procedure. Specifically, if early violations are expected then the value of should be close to 1/2, while values closer to zero are appropriate for detecting changes occurring at later stages; see Aue et al. (2006).
As already mentioned, it is clear that since the training data are assumed to involve no change, the monitoring period begins with time . Typically this monitoring continues until time , where denotes a fixed integer, and if we call the corresponding procedure closedend. Otherwise (i.e. if ), we have an openend procedure. The corresponding stopping rule is specified as
(15) 
where is a constant that guarantees that the test has size equal to , asymptotically.
The main problem is then to find an approximation for the critical value and to investigate consistency of the test procedures. Particularly, we require that under and for a prechosen value of ,
(16) 
while under alternatives we want
(17) 
We close this section by noting that along the same lines one may suggest retrospective tests, i.e. tests in which all data have already arrived at the time of testing, and one is interested to see whether a structural change has occured over the time period leading to that time. To this end let , denote not the training data (which involve no change), but data during the acquisition of which a structural change might have occurred. Then the appropriate statistic is that of (11) with the ECF computed at each time , with data , while the ECF is computed in an analogous manner but with data .
4 Asymptotics
In this section we study the limit behavior of the test procedure. Particularly, we have to study the limit behavior of
(18) 
with defined in (13) under the null hypothesis as well as under alternatives. The limit is always for and fixed. The proofs of the lemmas and theorems in this Section are deferred to the Appendix.
Define
(19) 
and
where is a two dimensional realvalued function on , defined by
(20) 
we can express
(21) 
where is replaced with when investigating the asymptotic behavior of .
To obtain the limiting distribution of (18), we consider this problem within a more general framework because the techniques used here would be applicable to other situations, for example, the monitoring procedure that uses the probability generating function approach as in Hudecová et al. (2015a). In what follows, is assumed to be a dimensional strongly mixing strictly stationary sequence with mixing order , that is,
where denotes the sigma field generated by . Suppose that is any realvalued function on , . Later, a real vector function will be considered.
In view of (19), for any integer , define
(22) 
where is any number between 0 and . Without loss of generality, we assume because can be replaced with in .
Below, we impose the conditions:

(A1) The is nonnegative, continuous and bounded with .

(A2) for some for some .

(A3) , and for some , .

(A4) is continuous in , and .
Condition (A2) is quite mild since a broad class of time series sequences, including ARMA and GARCH processes, are strongly mixing with order decaying to 0 geometrically. Owing to Lemma 2.1 of Kuelbs and Philipp (1980), we have that for all ,
(23) 
so that in (A4) converges absolutely. Moreover, due to Theorem A and Theorem 2.1 of Yang (2007), Conditions (A1)(A4) particularly guarantee the maximal moment inequality for , namely, for any , there exists depending only on and such that
(24) 
Then, we have the following:
Lemma 1. Under (A1)(A3), as
,
where when and denotes a standard Brownian motion. Further,
for any and , .
Based on this, we have the following weak convergence results (cf. Billingsley, 1968):
Theorem 1. Let
Assume that is any real number that can take the value of 0 only when . Then, under (A1)(A4), as ,
namely,
(25) 
and thus,
(26) 
Remark 1. (a) The result shows that when , . However, the range of for is not allowed to be but for . Although the result on can be easily claimed, it is not easy to verify. The difficulty lies in checking the tightness (equicontinuity) condition such as that in Theorem 15.6 of Billingsley (1968). The approaches taken by Aue et al. (2006) and Hudecová et al. (2015a, b) are not directly applicable to solving this problem. In practice, though, the can be chosen arbitrarily small, which can be justified from a practical viewpoint that a change is assumed to occur at for some .
(b) The can be estimated by
(27) 
where and is a sequence such that as , for example, .
(c) If (A3) and (A4) hold with replaced by in a compact subset of , then the results in (25) and (26) hold when the integral over is replaced with the integral over .
Remark 2. In our study the strong mixing condition is assumed and the strong approximation theorems are used to achieve the C.L.T. of the relevant partial sum process. However, one can view as an element of by resetting and using
(28) 
Our test statistic can be redefined with , which is a subprocess of , and thereby, the strong approximation theorems may not be required. In our simulation study, the stationary bootstrap method is used to calculate critical values. A proof for the consistency of the bootstrap method is provided in Supplementary material.
The above result is useful to test the hypotheses:
: No change of over
vs. : not .
In
order to conduct a test, one can employ the test: for some nonnegative integer (=0 when and when ),
Based on Theorem 1, we reject at the significance level if , where is the number such that
In practice, the
can be obtained through Monte Carlo simulations using an
estimate of .
Next, we extend the above results to dimensional functions
, . Let denote the
positive definite matrix whose th entry
is . Below, we
assume that

(A3) All enjoy the properties in (A3).

(A4) is continuous and .
Then, we have the following:
Theorem 2. Suppose that (A1), (A2), (A3) and (A4) hold. Let be the one in Theorem 1. Then, as ,
(29) 
where is a standard dimensional Brownian motion, and thus,
(30) 
The testing procedure based on the above result is similar to the univariate case. That is, we reject the null of no change if
(31) 
where satisfies
(32) 
The can be obtained through Monte Carlo simulations
using a suitable estimate of .
Now, we are ready to apply Theorem 2 with and to in . Assume that
is a strongly mixing process with mixing
order satisfying and satisfies
(A1). Note that under in ,
(33) 
where is a matrix whose th entries are , , and (see ) and satisfies . We reject in (, if where is the number in .
Meanwhile, to examine the consistency of the monitoring procedure, we consider the alternative hypothesis:
for some and . Then, in view of , we have that under ,
in probability, so that in probability, which asserts the consistency of our monitoring procedure as in .
The matrix should be estimated properly as done in (. Also, as mentioned earlier in and , can be expressed in closed form for particular weight functions. By way of example in the empirical study below, we will illustrate the performance of the monitoring procedure with the Gaussian weight function.
5 The resampling procedure
As already seen in the previous section, the asymptotic distribution of the detector statistic in (18) under the null hypothesis depends on factors that are unknown in our entirely nonparametric context. For this reason we employ the stationary bootstrap procedure (see Politis and Romano, 1994) in order to estimate the critical value of the test. To be concrete, the estimator for is given by the relation
where denotes the statistic in (13) applied to the resampled observations obtained using the resampling scheme below. Notice that is estimated by making use of only the training data, i.e. all data available at time .
First, wrap the observations around a circle so that for , is defined to be with . Given a block size , define the overlapping blocks , . Let and proceed as follows:

Independently of the , generate independent observations
from the geometric distribution with parameter
. Let . 
Independently of the and the , generate independent observations
from the discrete uniform distribution on
. Define the sequence of bootstrap blocks by . 
Concatenate the and select the first observations as the bootstrap sample .

Treat the first bootstrap observations as a training sample and calculate for each .

Calculate .
Repeat steps 1–5 a large number of times, say , to obtain the ordered statistics . An approximate value for is then given by , where denotes the floor of .
6 Monte Carlo results
We investigate the performance of the monitoring procedure defined by (13) and (18) with criterion given by (6), and weight function . The results corresponding to criterion (9) are similar and therefore are not reported. We consider the following data generating processes (DGPs):
where , and , with uniformly distributed on . The sequence is an independent sequence of iid stable random variables with characteristic function .
The DGPs consist of a range of serial dependence structures, such as AR, ARCH and GARCH structures. The first seven processes, DGP.S1–DGP.S7, satisfy the null hypothesis of strict stationarity and are introduced to study the size of our monitoring procedure. DGP.S1 and DGP.S7 consists of i.i.d. observations, whereas DGP.S2–DGP.S6 introduce some dependence structure without violating the null hypothesis. To examine the power of the procedure in different settings, DGP.P1–DGP.P5 remain stationary throughout the training period, after which a break in stationarity is introduced during monitoring. DGP.P1 and DGP.P2 introduce breaks in the first and second moments (respectively) of the process, whereas DGP.P3 introduces a change in the distribution without affecting the first two moments of the DGP (i.e. the process remains weakly stationary throughout the monitoring period).
We compare the performance of our test against that of several related tests proposed recently in the literature. We consider the following tests:

a kernelbased nonparametric test for changes in the marginal density function proposed by Lee and Na (2004), denoted by ;

a related kernelbased test of Kapetanios (2009), denoted by ;

a Cramér–von Mises type test for changes in distribution based on the empirical distribution function, proposed by Ross and Adams (2012), denoted by .
Although the tests and were originally proposed as offline (retrospective) tests for stationarity, they have been adapted to online monitoring procedures. A similar adaptation of the retrospective test by Hong et al. (2017) to an online procedure was also considered, but the obtained simulation results (not shown) suggest that the adapted procedure does not respect the desired nominal size. We therefore exclude this procedure from our study.
The values in Tables 1–6 represent the percentage of times that was rejected out of 1 000 independent Monte Carlo repetitions for the various DGPs. For our test, denoted by in the tables, we report the results for various choices of the parameters and appearing in the test. To estimate the critical value of each test we employed the bootstrap scheme described in Section 5 and, since the calculations are very time consuming, made use of the warpspeed method of Giacomini et al. (2013). Throughout, we took .
An overall assessment of the figures in Tables 1–6 brings forward certain desirable features of the method: a reasonable degree of approximation of the nominal level, as well as satisfactory power against alternatives. Although our test exhibits some degree of moderate size distortion with the autoregressive process DGP.S2, this overrejection certainly diminishes with increasing sample sizes. Note however that overall the nominal level is respected reasonably well, and also in some cases where some of the other considered tests are substantially oversized.
As expected, the power of the tests increases with the sample size, and also with the extent of violation of the null hypothesis of strict stationarity. Overall, in terms of power, our test performs reasonably well when compared to the other existing procedures and even outperforms these tests for many of the DGPs considered. A further factor influencing the percentage of rejection is the value of the weight parameter , and in this respect it seems that an intermediate value is preferable in terms of level accuracy as well as power, at least in most cases.
DGP.S1  DGP.S2  DGP.S3  DGP.S4  

1  100  1  3.6  3.8  3.8  4.0  4.1  8.4  9.6  9.6  9.7  9.9  7.8  5.3  6.0  5.4  4.3  3.8  4.3  3.4  2.9  3.6 
2  4.2  4.6  5.0  4.8  5.2  8.7  8.1  9.1  9.2  9.0  7.1  7.1  6.1  5.2  4.8  4.5  6.4  6.3  6.1  5.7  
3  6.1  4.1  4.5  4.2  3.9  7.4  8.5  8.4  8.4  8.4  7.9  7.8  6.6 