In univariate time series analysis, long memory was brought to prominence by Hurst (1951) and Mandelbrot (1963), and it has subsequently received extensive attention in the literature (see, e.g., Beran, 1994; Embrechts & Maejima, 2002; Doukhan et al., 2003; Robinson, 2003; Palma, 2007). Of central importance in analyzing and modeling long-memory univariate time series is estimating the strength of the long-memory dependence. Two measures are commonly used: the parameter H, known as the Hurst exponent or self-similarity parameter (Mandelbrot & van Ness, 1968), and the fractional integration parameter d, which arises from the generalization of autoregressive fractionally integrated moving average (ARFIMA) models from integer to non-integer values of the integration parameter. The two parameters are closely related through the simple formula H = d + 1/2.
In univariate time series analysis, a number of Hurst exponent estimators have been developed, and theoretical results on the asymptotic properties of various estimators have been obtained. Because the finite-sample properties of these estimators can be quite different from their asymptotic properties, several authors have considered empirical comparisons of estimators of H and d. Nine estimators were discussed in some detail by Taqqu et al. (1995), who performed an empirical investigation of these estimators for a single series length of 10,000 data points, five parameter values and 50 replications. Teverovsky & Taqqu (1997) showed in a simulation study that the differenced variance estimator was unbiased for five values of H (0.5, 0.6, 0.7, 0.8 and 0.9) for series with 10,000 observations, whereas the aggregated variance estimator was downwards biased. Jensen (1999) presented a comparison of two wavelet-based estimators and a Geweke-Porter-Hudak (GPH) estimator for four series lengths, five parameter values and 1,000 replications. Jeong et al. (2007) performed a comparison of six estimators on simulated fractional Gaussian noise for five values of H and 100 replications.
Long-memory functional time series analysis was recently studied by Li et al. (2019), who proposed an R/S estimation method for determining the long-memory parameter in a functional ARFIMA model, where observations are temporally dependent continuous functions, for example, age-specific fertility rate improvement observed over the years (e.g., Hyndman & Ullah, 2007; Chiou & Müller, 2009). The functional ARFIMA model can be viewed as a generalization of many parametric models. For example, Bosq (2000) and Bosq & Blanke (2007)
introduced the functional autoregressive process of order 1 (FAR(1)) and derived one-step-ahead forecasts based on a regularized form of the Yule-Walker equations. Later, FAR(1) was extended to FAR(p), where the order p can be determined via a sequential hypothesis testing procedure (Kokoszka & Reimherr, 2013). Aue et al. (2015)
proposed a forecasting method based on vector autoregressive (VAR) forecasts of principal component scores. The method of Aue et al. (2015) can also be viewed as an extension of Hyndman & Shang (2009), where principal component scores are forecast via a univariate time series forecasting method. Klepsch & Klüppelberg (2017) considered the functional moving average (FMA) process and introduced an innovation algorithm to obtain the best linear predictor. Klepsch et al. (2017) extended the VAR model to the vector autoregressive moving average (VARMA) model for modeling and forecasting principal component scores, which can be viewed as a simpler estimation approach for the functional autoregressive moving average model. Aue & Klepsch (2017) showed the equivalence between FMA and vector moving average processes.
A central issue in functional time series analysis is to model the temporal dependence of the functional observations accurately. Following the early work of Li et al. (2019), we compare the finite-sample estimation accuracy of several Hurst exponent estimators in functional ARFIMA models. Our method constructs an estimate of the long-run covariance function, which we use, via dynamic functional principal component analysis, in estimating the orthonormal functions spanning the dominant sub-space of the curves. Based on the first set of principal component scores, we apply several univariate time series Hurst exponent estimators, and compare their estimation accuracy in terms of bias, variance and mean square error. Our goal is to provide some practical guidance on the method that provides the best estimation accuracy of the Hurst exponent.
The remainder of the paper is outlined as follows. In Section 2, we present two methods for estimating the long-run covariance function, from which the dominant set of principal component scores can be obtained. In Section 3, we revisit some estimators of the Hurst exponent from the long-memory univariate time series literature. In Section 4, we compare the estimation accuracy of the various estimators and make our recommendation.
2 Dynamic functional principal component analysis
2.1 Estimation of the long-run covariance function
A time series of functions can be denoted as (X_t(u) : t ∈ Z), where each X_t is a random function of a stochastic process and u represents a continuum bounded within a finite interval I of the real line. Further, let (X_t) be a stationary and ergodic functional time series. For a stationary functional time series, the long-run covariance function is defined as

C(u, v) = Σ_{ℓ = −∞}^{∞} γ_ℓ(u, v), with γ_ℓ(u, v) = cov(X_0(u), X_ℓ(v)),

and is a well-defined element of L²(I²) for a compact support interval I, under mild weak dependence and moment conditions. By assuming C(u, v) is a continuous and square-integrable function, the function C induces a kernel operator. Through right integration, C defines a Hilbert-Schmidt integral operator on L²(I) given by

(c(f))(u) = ∫_I C(u, v) f(v) dv.
In practice, we need to estimate C(u, v) from a finite sample X_1, …, X_n. Given its definition as a bi-infinite sum, a natural estimator of C is

Ĉ(u, v) = c Σ_{ℓ = −(q−1)}^{q−1} (1 − |ℓ|/q) γ̂_ℓ(u, v),   (1)

where q is the so-called memory parameter, ℓ denotes a lag variable, c is a normalization constant, and

γ̂_ℓ(u, v) = (1/n) Σ_t (X_t(u) − X̄(u)) (X_{t+ℓ}(v) − X̄(v))

is an estimator of γ_ℓ(u, v). In the case of stationary short-memory functional time series, this estimator is known to be consistent. From (1), the estimated long-run covariance is obtained by summing all autocovariance functions with linearly decreasing weights. Let p denote the number of grid points in a curve. Li et al. (2019) consider particular choices of q; for instance, when q = n, all finite-order lags are utilized. To estimate the value of the Hurst exponent H, Li et al. (2019) applied the rescaled range (R/S) estimator of Hurst (1951) to the first set of dynamic principal component scores obtained from the eigendecomposition of Ĉ, since the constant c in (1) does not affect the estimation of the orthonormal functions spanning the dominant subspace of the functional time series.
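As an illustration, the weighted-sum estimator above can be sketched for curves observed on a common grid. This is a minimal sketch, assuming Bartlett-type linearly decreasing weights and a user-supplied truncation lag; the function name, its arguments, and the omission of the normalization constant are our own illustrative choices.

```python
import numpy as np

def long_run_covariance(curves, q):
    """Weighted sum of empirical autocovariance functions.

    curves: (n, p) array of n curves observed on p grid points.
    Bartlett weights (1 - lag/(q+1)) are an illustrative choice of
    linearly decreasing weights; any normalization constant is omitted.
    """
    n, p = curves.shape
    centered = curves - curves.mean(axis=0)
    lrc = centered.T @ centered / n                     # lag-0 autocovariance
    for lag in range(1, q + 1):
        gamma = centered[lag:].T @ centered[:-lag] / n  # lag-l autocovariance
        weight = 1.0 - lag / (q + 1.0)                  # linearly decreasing weight
        lrc += weight * (gamma + gamma.T)               # add lags l and -l
    return lrc

rng = np.random.default_rng(11)
X = rng.standard_normal((300, 21))
C_hat = long_run_covariance(X, q=5)
```

The returned matrix is the discretized long-run covariance surface; by construction it is symmetric in its two arguments.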
2.1.1 Kernel sandwich estimator
The kernel sandwich estimator takes the form

Ĉ(u, v) = Σ_ℓ W(ℓ/h) γ̂_ℓ(u, v),   (2)

where h is called the bandwidth parameter and W is a symmetric weight function with bounded support of a given order. The kernel sandwich estimator in (2) was introduced by Panaretos & Tavakoli (2012), Horváth et al. (2013), Rice & Shang (2017) and Kokoszka & Reimherr (2017, Chapter 8.5), among others. As with any kernel estimator, the crucial part is the selection of the bandwidth parameter h. It can be selected through a data-driven approach, such as the plug-in algorithm of Rice & Shang (2017). The plug-in bandwidth selection method can be summarized as follows:
Compute pilot estimates of the required quantities using an initial bandwidth choice and a weight function of a given initial order of the kernel function.
As established in Berkes et al. (2016), estimate the optimal bandwidth constant by
where the estimate depends on the final order of the kernel function, a constant determined by that final order, and a weight determined by the initial order of the kernel function. A list of these constants is presented in Table 1.
Kernel function: constants
Bartlett: 1, 2/3
Parzen: 6, 0.539285
Tukey-Hanning: 3/4
Quadratic Spectral: 1
Flat-top: 4/3

Table 1: A list of kernel constants.
Use the bandwidth obtained from the previous step in the definition of the kernel sandwich estimator in (2).
2.2 Dynamic functional principal component decomposition
From the long-run covariance function, we apply functional principal component decomposition to extract the functional principal components and their associated scores. With the Karhunen-Loève expansion, a stochastic process can be expressed as

X_t(u) = μ(u) + Σ_{k=1}^{∞} λ_k^{1/2} β_{t,k} φ_k(u),

where β_{t,k} is an uncorrelated random variable with zero mean and unit variance. The principal component score ξ_{t,k} = λ_k^{1/2} β_{t,k} is given by the projection of X_t − μ in the direction of the kth eigenfunction φ_k, i.e., ξ_{t,k} = ⟨X_t − μ, φ_k⟩. The scores constitute an uncorrelated sequence of random variables with zero mean and variance λ_k, which is the kth eigenvalue. They can be interpreted as the weights of the contribution of the functional principal components φ_k to X_t.
Since the long-run covariance is unknown, the population eigenvalues and eigenfunctions can only be approximated through realizations of the process. The sample mean and sample covariance are given by their usual empirical counterparts, with sample eigenvalues λ̂_k of the estimated covariance and corresponding orthogonal sample eigenfunctions φ̂_k. The realizations of the stochastic process can then be written as

X_t(u) = X̄(u) + Σ_{k=1}^{K} ξ̂_{t,k} φ̂_k(u) + ê_t(u),

where K denotes the number of retained components, ê_t(u) is the residual term, and ξ̂_{t,k} = ⟨X_t − X̄, φ̂_k⟩ is the kth estimated principal component score for the tth time period.
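For curves observed on a common grid, the decomposition above can be sketched via an eigendecomposition of the discretized covariance matrix. This is a minimal sketch on simulated curves; for brevity it uses the lag-0 sample covariance rather than the long-run covariance, and all names and the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 200, 51
grid = np.linspace(0.0, 1.0, p)

# simulated curves: one dominant mode sin(pi * u) plus white noise
true_scores = rng.standard_normal(n)
curves = np.outer(true_scores, np.sin(np.pi * grid)) + 0.1 * rng.standard_normal((n, p))

mean_curve = curves.mean(axis=0)
centered = curves - mean_curve
cov = centered.T @ centered / n            # discretized sample covariance function
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]          # eigen-pairs in decreasing order
phi1 = eigvecs[:, order[0]]                # first (discretized) sample eigenfunction
scores1 = centered @ phi1                  # first set of principal component scores
```

With a single dominant mode, the first set of scores recovers the simulated scores up to sign and scale.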
3 Hurst exponent estimators
Let the first set of estimated dynamic principal component scores be (ξ̂_{t,1} : t = 1, …, n). Since we consider only the first set of scores, we write ξ̂_t for ξ̂_{t,1} hereafter. Li et al. (2019) also consider a norm of multiple sets of scores and find that the estimation results remain similar. Due to space constraints, we present our results based on the first set of principal component scores. With the univariate time series of scores, we evaluate and compare several Hurst exponent estimators from the long-memory univariate time series literature.
The Hurst exponent can be estimated via either time-domain or frequency-domain based estimators. These estimators can be divided into parametric and semi-parametric ones. The theory of parametric estimators was developed by Fox & Taqqu (1986) and Dahlhaus (1989). Semi-parametric estimators of the memory parameter have become popular since they do not require knowledge of the specific form of the short-memory structure. They are based on the periodogram of the series, and can be categorized into two types: the log-periodogram estimator, first proposed by Geweke & Porter-Hudak (1983), and the local Whittle estimator, which is credited to Künsch (1987) and further developed by Robinson (1995a). The log-periodogram estimator is akin to ordinary least squares, while the local Whittle estimator is akin to the maximum likelihood estimator in the frequency domain.
3.1 Time-domain based estimators
In Sections 3.1.1 to 3.1.5, we present five methods based on a simple linear regression model. In Sections 3.1.6 and 3.1.7, we present two methods based on the R/S estimator.
3.1.1 Aggregated variance estimator
The aggregated variance estimator is based on the property of self-similar processes that the variances of the aggregated processes decrease at the rate m^{2H−2} as the block size m increases (e.g., Taqqu et al., 1995; Teverovsky & Taqqu, 1997; Beran, 1994, Section 4.4). Recall that for a long-range dependent linear process,

Var(X̄^{(m)}) ≈ c m^{2H−2},

where c is a constant and X̄^{(m)} denotes the series aggregated over blocks of size m. Consequently,

log Var(X̄^{(m)}) ≈ log c + (2H − 2) log m.

With log m as the predictor variable and log Var(X̄^{(m)}) as the response variable, we apply a simple linear regression to obtain an estimate of the slope parameter. The procedure is as follows:
Divide the time series into non-overlapping blocks of size m and average within each block, giving the aggregated series

X̄^{(m)}(k) = (1/m) Σ_{t=(k−1)m+1}^{km} X_t,

where k denotes a block index and K = ⌊n/m⌋ denotes the number of blocks.
Compute the overall mean
For a given block size m, compute the sample variance of the aggregated series as
Heuristically, as the block size m grows, the sample variance of the aggregated series decays approximately at the rate m^{2H−2}. For different values of m, compute (3) and (4) to obtain the aggregated variances; Taqqu et al. (1995) and Teverovsky & Taqqu (1997) recommend choosing values of m that are equispaced on a logarithmic scale. Then, regress the logarithm of the aggregated variance against log m to obtain the regression coefficient β̂. The estimated value of H is given by
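The steps above can be sketched as follows. This is a minimal implementation; the log-spaced grid of block sizes and the function name are our own choices. On white noise the estimate should be near H = 0.5.

```python
import numpy as np

def aggregated_variance_H(x, n_sizes=20):
    """Aggregated variance estimator: the slope of log Var(block means)
    on log(block size) is 2H - 2, so H = 1 + slope / 2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # block sizes equispaced on a logarithmic scale
    sizes = np.unique(np.logspace(np.log10(2), np.log10(n // 10), n_sizes).astype(int))
    log_m, log_v = [], []
    for m in sizes:
        k = n // m                                    # number of blocks
        means = x[:k * m].reshape(k, m).mean(axis=1)  # aggregated series
        v = means.var()                               # sample variance of block means
        if v > 0:
            log_m.append(np.log(m))
            log_v.append(np.log(v))
    slope = np.polyfit(log_m, log_v, 1)[0]
    return 1.0 + slope / 2.0

rng = np.random.default_rng(0)
h_iid = aggregated_variance_H(rng.standard_normal(10000))  # white noise: H near 0.5
```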
3.1.2 Differencing variance estimator
To distinguish non-stationarity from long-range dependence, we can difference the variance (see, e.g., Teverovsky & Taqqu, 1997). For a given block size m, we compute the difference of the sample variances at successive block sizes,
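A sketch of the idea, reusing the aggregated variances at successive block sizes; the dyadic block-size grid and names are our own illustrative choices, not prescriptions from the source.

```python
import numpy as np

def differenced_variance_H(x):
    """Differenced variance estimator: |V(m_{i+1}) - V(m_i)| still scales
    like m^(2H - 2), while shifts in the mean cancel in the difference."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sizes = 2 ** np.arange(1, 9)                  # dyadic block sizes 2, 4, ..., 256
    variances = []
    for m in sizes:
        k = n // m
        means = x[:k * m].reshape(k, m).mean(axis=1)
        variances.append(means.var())
    dv = np.abs(np.diff(variances))               # differenced aggregated variances
    mid = np.sqrt(sizes[1:] * sizes[:-1])         # geometric midpoints of the sizes
    keep = dv > 0
    slope = np.polyfit(np.log(mid[keep]), np.log(dv[keep]), 1)[0]
    return 1.0 + slope / 2.0

rng = np.random.default_rng(6)
h = differenced_variance_H(rng.standard_normal(20000))
```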
3.1.3 Absolute values of the aggregated series
Similar to the aggregated variance estimator, the data are split in the same fashion, and the aggregated mean is computed from (3). Instead of computing the sample variance, one computes the sum of the absolute values of the aggregated series, namely
For different values of m, compute (6) to obtain the absolute moments. Then, regress their logarithm against log m to obtain the regression coefficient β̂. The estimated value of H is given by
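A sketch under the scaling relation just described: the mean absolute deviation of the block means behaves like m^{H−1}, so H = slope + 1. Names and the block-size grid are illustrative.

```python
import numpy as np

def abs_aggregated_H(x, n_sizes=20):
    """Absolute moments of the aggregated series: the mean absolute
    deviation of the block means scales like m^(H - 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sizes = np.unique(np.logspace(np.log10(2), np.log10(n // 10), n_sizes).astype(int))
    log_m, log_a = [], []
    for m in sizes:
        k = n // m
        means = x[:k * m].reshape(k, m).mean(axis=1)   # aggregated series
        a = np.mean(np.abs(means - x[:k * m].mean()))  # first absolute moment
        if a > 0:
            log_m.append(np.log(m))
            log_a.append(np.log(a))
    slope = np.polyfit(log_m, log_a, 1)[0]
    return slope + 1.0

rng = np.random.default_rng(7)
h = abs_aggregated_H(rng.standard_normal(10000))
```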
3.1.4 Higuchi’s method
Similar to the absolute values of the aggregated series, the method of Higuchi (1988) calculates the partial sums of the time series and then finds the normalized length of the curve, namely
where n is the sample size of the time series, m is a block size and ⌊·⌋ denotes the greatest integer function. The expected curve length decays at the rate m^{H−2}. For different values of m, compute (7) to obtain the curve lengths. Then, regress their logarithm against log m to obtain the regression coefficient β̂. The estimated value of H is given by
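Higuchi's construction can be sketched as follows, applied to the partial sums and averaging the curve length over all starting offsets; the normalization follows Higuchi (1988), while the grid of block sizes is our own choice.

```python
import numpy as np

def higuchi_H(x, n_sizes=15):
    """Higuchi's method: the normalized curve length of the partial-sum
    process decays like m^(H - 2), so H = slope + 2."""
    y = np.cumsum(np.asarray(x, dtype=float))   # partial sums of the series
    n = len(y)
    sizes = np.unique(np.logspace(np.log10(2), np.log10(n // 10), n_sizes).astype(int))
    log_m, log_len = [], []
    for m in sizes:
        lengths = []
        for start in range(m):                  # every starting offset
            idx = np.arange(start, n, m)
            if len(idx) < 2:
                continue
            total = np.abs(np.diff(y[idx])).sum()
            norm = (n - 1) / ((len(idx) - 1) * m)  # Higuchi's normalization
            lengths.append(total * norm / m)
        log_m.append(np.log(m))
        log_len.append(np.log(np.mean(lengths)))
    slope = np.polyfit(log_m, log_len, 1)[0]
    return slope + 2.0

rng = np.random.default_rng(1)
h = higuchi_H(rng.standard_normal(5000))
```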
3.1.5 Detrended fluctuation analysis (DFA)
Also known as the variance of residuals method or Peng's method, DFA was introduced by Peng et al. (1994) to provide evidence of long memory in deoxyribonucleic acid (DNA) sequences. It consists of the following steps:
Divide the data series into non-overlapping blocks, each of size m.
Within each of the blocks, regress the partial sums of the series against the time index and estimate the variance of the residuals by
where the intercept and slope are the least-squares regression estimates based on the kth block.
Compute the average of the variance of the residuals
Heuristically, the average residual variance grows at the rate m^{2H}. For different values of m, compute (8) to obtain the fluctuation measures. Then, regress their logarithm against log m to obtain the regression coefficient β̂. The estimated value of H is given by
The DFA bears a strong resemblance to the variance plot but, instead of assuming stationarity, a fitted linear trend is subtracted from each block (Beran et al., 2013). Therefore, the DFA is less sensitive to trends exhibited in the data.
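The steps above can be sketched as follows: integrate the centered series, remove a fitted linear trend per block, and regress the log root-mean-square fluctuation on the log block size. Function and variable names are illustrative.

```python
import numpy as np

def dfa_H(x, n_sizes=15):
    """Detrended fluctuation analysis: the RMS residual fluctuation of
    the detrended integrated series scales like m^H."""
    y = np.cumsum(np.asarray(x, dtype=float) - np.mean(x))  # integrated series
    n = len(y)
    sizes = np.unique(np.logspace(np.log10(4), np.log10(n // 4), n_sizes).astype(int))
    log_m, log_f = [], []
    for m in sizes:
        k = n // m
        t = np.arange(m)
        resid_var = []
        for b in range(k):
            seg = y[b * m:(b + 1) * m]
            coef = np.polyfit(t, seg, 1)                # per-block linear trend
            resid = seg - np.polyval(coef, t)
            resid_var.append(np.mean(resid ** 2))       # residual variance in block b
        log_m.append(np.log(m))
        log_f.append(0.5 * np.log(np.mean(resid_var)))  # log RMS fluctuation
    slope = np.polyfit(log_m, log_f, 1)[0]
    return slope

rng = np.random.default_rng(2)
h = dfa_H(rng.standard_normal(4000))
```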
3.1.6 Rescaled Range (R/S) estimator
The R/S estimator was introduced by Hurst (1951) for estimating the minimum capacity of a dam, and it is one of the earliest methods for estimating the Hurst exponent. Although many Hurst exponent estimators have better statistical properties than the R/S estimator (which, for example, is inefficient in the case of Gaussian innovations), it is simple and fast to compute (see, e.g., Li et al., 2019). Given a time series of scores, the calculation of the R/S statistic involves the following steps:
Calculate the range
The R/S estimator may be defined by
The plot of log(R/S) against log m is also known as a "pox plot".
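A pox-plot style sketch of the steps above: compute R/S over non-overlapping windows of log-spaced sizes and regress log(R/S) on the log window size. The window-size grid and names are our own choices.

```python
import numpy as np

def rs_H(x, n_sizes=15):
    """Rescaled adjusted range: E[R/S] grows like m^H, so H is the slope
    of log(R/S) on log m."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sizes = np.unique(np.logspace(np.log10(10), np.log10(n // 2), n_sizes).astype(int))
    log_m, log_rs = [], []
    for m in sizes:
        ratios = []
        for start in range(0, n - m + 1, m):   # non-overlapping windows
            seg = x[start:start + m]
            dev = np.cumsum(seg - seg.mean())  # partial sums of deviations
            r = dev.max() - dev.min()          # adjusted range
            s = seg.std()                      # sample standard deviation
            if s > 0:
                ratios.append(r / s)
        log_m.append(np.log(m))
        log_rs.append(np.log(np.mean(ratios)))
    slope = np.polyfit(log_m, log_rs, 1)[0]
    return slope

rng = np.random.default_rng(3)
h = rs_H(rng.standard_normal(5000))
```

Note that in small samples the R/S statistic is known to be biased upwards for short-memory data, which the simulation studies cited in Section 1 also report.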
3.1.7 Rescaled adjusted range estimator
While the R/S estimator is applied to the original time series, the rescaled adjusted range estimator is applied to the partial sums of the original time series (see, e.g., Mandelbrot & Wallis, 1969; Mandelbrot, 1975; Mandelbrot & Taqqu, 1979). For a univariate time series of principal component scores with the partial sum
and sample variance
the rescaled adjusted range estimator is given by
Choosing logarithmically equidistant values of m, regress the logarithm of the rescaled adjusted range against log m to obtain the regression coefficient β̂. The estimated value of H is given by
3.2 Frequency-domain based estimators
3.2.1 (Smoothed) periodogram estimator
With a univariate time series of scores, the periodogram can be defined as
where w_j = 2πj/n denotes the set of harmonic frequencies, j is a positive integer, and n is the sample size. Since the periodogram is the Fourier transform of the sample autocovariances, it can also be expressed as
where γ̂(·) denotes the sample autocovariance function, i.e.,
where x̄ denotes the sample mean of the time series of scores.
Because the periodogram is an estimator of the spectral density, a time series with long-range dependence should have a periodogram that is proportional to |w|^{1−2H} close to the origin (Taqqu et al., 1995). Thus, we regress the logarithm of the periodogram for different frequencies against the logarithm of the frequency to obtain the regression coefficient β̂. The estimated value of H is given by
As advocated by Taqqu et al. (1995), we use only the lowest 10% of the frequencies for the regression, since the proportionality only holds for frequencies close to the origin.
For the smoothed periodogram, the frequency axis is divided into logarithmically equidistant boxes, and the periodogram values corresponding to the frequencies inside each box are averaged. The periodogram values at very low frequencies are retained, while the rest are divided into 60 boxes (see, e.g., Taqqu et al., 1995). By regressing the logarithm of the smoothed periodogram against the frequencies, we obtain the regression coefficient β̂. To achieve robustness in the least-squares fitting, we use a robust linear model. The estimated value of H is given by
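The raw-periodogram regression just described can be sketched as follows, using the lowest 10% of the Fourier frequencies; the smoothing and robust-regression refinements are omitted for brevity.

```python
import numpy as np

def periodogram_H(x, fraction=0.1):
    """Periodogram regression: near the origin, log I(w) is linear in
    log w with slope 1 - 2H, so H = (1 - slope) / 2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = max(2, int(fraction * (n // 2)))      # lowest 10% of the frequencies
    j = np.arange(1, m + 1)
    w = 2.0 * np.pi * j / n                   # Fourier frequencies
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    I = np.abs(dft) ** 2 / (2.0 * np.pi * n)  # periodogram ordinates
    slope = np.polyfit(np.log(w), np.log(I), 1)[0]
    return (1.0 - slope) / 2.0

rng = np.random.default_rng(9)
h = periodogram_H(rng.standard_normal(8192))
```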
3.2.2 (Smoothed) Geweke-Porter-Hudak estimator
In univariate ARFIMA models, Geweke & Porter-Hudak (1983) proposed a semi-parametric estimator of d based on the first m periodogram ordinates given in (9). Let the scores follow a stationary time series with spectral density
as the frequency w tends to zero, where d denotes the fractional integration parameter. Recall that the empirical estimate of the spectral density is the periodogram given in (10),
In practice, we replace the spectral density by its empirical analogue, the periodogram; thus
By a simple linear regression, Geweke & Porter-Hudak (1983) suggested the least-squares estimator
where the regression uses the m smallest Fourier frequencies. The number m acts as a bandwidth parameter. Following Geweke & Porter-Hudak (1983), we choose m = n^{1/2}.
Further, Robinson (1995b) showed that this estimator is consistent and satisfies a central limit theorem of the form
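A minimal sketch of the log-periodogram regression with bandwidth m = n^{1/2}; since H = d + 1/2, the Hurst exponent follows directly. The demeaning step and the exact form of the regressor are common implementation choices rather than prescriptions from the source.

```python
import numpy as np

def gph_d(x, power=0.5):
    """Geweke-Porter-Hudak estimator: regress log I(w_j) on
    log(4 sin^2(w_j / 2)) over the first m = n^power frequencies;
    minus the slope estimates d, and H = d + 1/2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(n ** power)                       # bandwidth parameter
    j = np.arange(1, m + 1)
    w = 2.0 * np.pi * j / n
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    I = np.abs(dft) ** 2 / (2.0 * np.pi * n)  # periodogram ordinates
    regressor = np.log(4.0 * np.sin(w / 2.0) ** 2)
    d = -np.polyfit(regressor, np.log(I), 1)[0]
    return d

rng = np.random.default_rng(4)
d_iid = gph_d(rng.standard_normal(8192))      # white noise: d near 0
```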
Reisen (1994) considered a smoothed periodogram using the Parzen lag window for estimating the parameter d. Let the smoothed periodogram take the form
where the lag window generator is a fixed, continuous, even function equal to one at the origin and vanishing outside a bounded range. The bandwidth parameter is a function of the sample size n, and it is customarily chosen as a power of n. The Parzen lag window generator has the following form:
The smoothed periodogram estimator can be written as
3.2.3 Wavelet estimator
This estimator computes the discrete wavelet transform of a mean-zero long-memory process and obtains the wavelet coefficients. The wavelet coefficients are approximately normally distributed with zero mean and a variance involving a finite constant (Jensen, 1999). This variance depends on the scaling parameter but is independent of the translation parameter. We define
as the wavelet coefficients' variance at a given scale. Taking the logarithmic transformation, we obtain
where the slope can be estimated via ordinary least squares. Since the wavelet coefficients' variance is a population quantity, we estimate it by the sample variance of the wavelet coefficients as
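A sketch with a Haar pyramid in place of a general discrete wavelet transform: for fractional Gaussian noise the detail-coefficient variance at octave j scales like 2^{j(2H−1)}, so a regression of log2 variance on the octave recovers H. The Haar choice and the scaling convention are our own assumptions.

```python
import numpy as np

def haar_wavelet_H(x, max_level=8):
    """Wavelet variance estimator: regress log2 of the Haar detail
    variance on the octave j; H = (slope + 1) / 2."""
    a = np.asarray(x, dtype=float)
    octaves, log2_var = [], []
    for j in range(1, max_level + 1):
        n = len(a) - len(a) % 2
        even, odd = a[:n:2], a[1:n:2]
        d = (even - odd) / np.sqrt(2.0)   # detail coefficients at octave j
        a = (even + odd) / np.sqrt(2.0)   # approximation passed to the next octave
        if len(d) < 8:                    # too few coefficients to estimate a variance
            break
        octaves.append(j)
        log2_var.append(np.log2(np.mean(d ** 2)))
    slope = np.polyfit(octaves, log2_var, 1)[0]
    return (slope + 1.0) / 2.0

rng = np.random.default_rng(8)
h = haar_wavelet_H(rng.standard_normal(2 ** 13))
```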
3.2.4 Local Whittle estimator
The local Whittle estimator is a Gaussian semi-parametric method for estimating the Hurst exponent based on the periodogram. It was first introduced by Künsch (1987) and later developed by Robinson (1995a), Velasco (1999) and subsequent authors. The local Whittle method does not require the specification of a parametric model for the data; it relies only on the specification of the shape of the spectral density of the time series.
Note that the spectral density of a stationary time series is usually assumed to satisfy

f(w) ~ G w^{1−2H} as w → 0+,

where G is a finite positive constant and H ∈ (0, 1).
Define the objective function
where the minimization is carried out over a closed interval of admissible estimates containing the true value of the self-similarity parameter, and the number of frequencies m is picked such that 1/m + m/n → 0 as n → ∞, as defined in Robinson (1995a). By concentrating out the scale parameter, we may obtain
Further, Robinson (1995a) showed that the local Whittle estimator is a consistent estimator of H, with √m (Ĥ − H) converging in distribution to N(0, 1/4) as n → ∞.
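The concentrated objective can be minimized numerically; here is a sketch with a simple grid-plus-refinement search over the admissible interval, using bandwidth m = n^{0.65} as an illustrative choice.

```python
import numpy as np

def local_whittle_d(x, power=0.65):
    """Local Whittle estimator: minimize the concentrated objective
    R(d) = log( mean_j w_j^(2d) I(w_j) ) - 2 d mean_j log w_j
    over the first m Fourier frequencies; H = d + 1/2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(n ** power)                       # bandwidth parameter
    w = 2.0 * np.pi * np.arange(1, m + 1) / n
    dft = np.fft.fft(x - x.mean())[1:m + 1]
    I = np.abs(dft) ** 2 / (2.0 * np.pi * n)  # periodogram ordinates

    def objective(d):
        return np.log(np.mean(w ** (2.0 * d) * I)) - 2.0 * d * np.mean(np.log(w))

    coarse = np.linspace(-0.49, 0.99, 297)    # admissible interval for d
    d0 = coarse[np.argmin([objective(d) for d in coarse])]
    fine = np.linspace(d0 - 0.01, d0 + 0.01, 201)
    return fine[np.argmin([objective(d) for d in fine])]

rng = np.random.default_rng(5)
d_iid = local_whittle_d(rng.standard_normal(8192))
```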
3.2.5 Local Whittle estimator with tapering
Velasco (1999) showed that it is possible to consistently estimate the Hurst exponent of non-stationary processes by applying the local Whittle estimator to tapered observations. Let the tapered periodogram be computed from the tapered series, and define the objective function
where all the summations run over the relevant subset of Fourier frequencies, assuming the corresponding ratio is an integer. The closed interval of admissible estimates of H is picked such that its upper endpoint, the maximum value of H that can be estimated with tapers of a given order, may lie in a region where the process is non-stationary. When the taper order is one (no tapering), (12) reduces to (11).
3.2.6 Modified local Whittle estimator
Hou & Perron (2014) proposed a modified local Whittle estimator that has good properties under local contamination. Such contaminations include processes whose spectral density functions dominate at low frequencies, such as random level shifts, deterministic level shifts and deterministic trends (Hou & Perron, 2014). The data generating process is given by
where the first term is a process with memory parameter d, scaled by a constant, and the second term is the low-frequency contamination; when d = 0, the first term is a short-memory process. For a given sample size n, the periodograms of the component processes are defined accordingly. Since the periodogram of the contamination dominates at low frequencies, a term is added to the spectral density function to govern the low-frequency contamination. Letting the added term be parameterized by the noise-to-signal ratio, the modified spectral density function is
The modified local Whittle estimator is
3.2.7 Exact local Whittle estimator
The local Whittle estimator is based on an approximation involving the periodogram of the observed series and that of the underlying noise process. Shimotsu & Phillips (2005) proposed an exact local Whittle estimator that uses a corrected discrete Fourier transform of the series to approximate the periodogram of the noise. They consider the fractional process generated by the model
The discrete Fourier transform of a time series evaluated at a frequency is defined as
Define the objective function
where the periodogram is that of the fractionally differenced series
As in Robinson (1995a), we define the estimates
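A sketch of the exact variant: fractionally difference the series using the binomial expansion of (1 − L)^d, then minimize the local Whittle objective computed from the periodogram of the differenced series. The coarse grid search and bandwidth choice are illustrative simplifications of the original optimization.

```python
import numpy as np

def frac_diff(x, d):
    """Apply (1 - L)^d using the binomial expansion coefficients."""
    n = len(x)
    pi = np.ones(n)
    for k in range(1, n):
        pi[k] = pi[k - 1] * (k - 1.0 - d) / k   # recursion for the coefficients
    return np.array([np.dot(pi[:t + 1], x[t::-1]) for t in range(n)])

def exact_local_whittle_d(x, power=0.65):
    """Exact local Whittle: the objective uses the periodogram of the
    fractionally differenced series rather than an approximation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    m = int(n ** power)                         # bandwidth parameter
    w = 2.0 * np.pi * np.arange(1, m + 1) / n

    def objective(d):
        u = frac_diff(x, d)
        I = np.abs(np.fft.fft(u)[1:m + 1]) ** 2 / (2.0 * np.pi * n)
        return np.log(np.mean(I)) - 2.0 * d * np.mean(np.log(w))

    grid = np.linspace(-0.4, 0.9, 27)           # coarse admissible grid for d
    return grid[np.argmin([objective(d) for d in grid])]

rng = np.random.default_rng(12)
d_hat = exact_local_whittle_d(rng.standard_normal(1024))
```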