A comparison of Hurst exponent estimators in long-range dependent curve time series

by   Han Lin Shang, et al.
Australian National University

The Hurst exponent is the simplest numerical summary of self-similar long-range dependent stochastic processes. We consider the estimation of Hurst exponent in long-range dependent curve time series. Our estimation method begins by constructing an estimate of the long-run covariance function, which we use, via dynamic functional principal component analysis, in estimating the orthonormal functions spanning the dominant sub-space of functional time series. Within the context of functional autoregressive fractionally integrated moving average models, we compare finite-sample bias, variance and mean square error among some time- and frequency-domain Hurst exponent estimators and make our recommendations.



There are no comments yet.


page 1

page 2

page 3

page 4


Hilbert valued fractionally integrated autoregressive moving average processes with long memory operators

Fractionally integrated autoregressive moving average processes have bee...

The Hurst roughness exponent and its model-free estimation

We say that a continuous real-valued function x admits the Hurst roughne...

On trend and its derivatives estimation in repeated time series with subordinated long-range dependent errors

For temporal regularly spaced datasets, a lot of methods are available a...

Multiscale Information Storage of Linear Long-Range Correlated Stochastic Processes

Information storage, reflecting the capability of a dynamical system to ...

Spectral estimation for non-linear long range dependent discrete time trawl processes

Discrete time trawl processes constitute a large class of time series pa...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In univariate time series analysis, long memory was brought to prominence by Hurst (1951) and Mandelbrot (1963), and it has subsequently received extensive attention in the literature (see, e.g., Beran, 1994; Embrechts & Maejima, 2002; Doukhan et al., 2003; Robinson, 2003; Palma, 2007). Of importance in analyzing and modeling long-memory univariate time series is estimating the strength of the long-memory dependence. There are two measures commonly used: The parameter , known as the Hurst exponent or self-similarity parameter (Mandelbrot & van Ness, 1968) and the fractional integration parameter, , arises from the generalization of autoregressive fractionally integrated moving average (ARFIMA) models from integer to non-integer values of the integration parameter . The two parameters are closely related through the simple formula .

In univariate time series analysis, a number of Hurst exponent estimators have been developed, and theoretical results on the asymptotic properties of various estimators have been obtained. Because the finite-sample properties of these estimators can be quite different from their asymptotic properties, several authors considered an empirical comparison of estimators of and . Nine estimators were discussed in some detail by Taqqu et al. (1995) who performed an empirical investigation of these estimators for a single series length of 10,000 data points, five values of both and , and 50 replications. Teverovsky & Taqqu (1997) showed in a simulation study that the differenced variance estimator was unbiased for five values of (0.5, 0.6, 0.7, 0.8 and 0.9) for series with 10,000 observations whereas the aggregated variance estimator was downwards biased. Jensen (1999) presented a comparison of two estimators based on wavelets, and a Geweke-Porter-Hudak (GPH) estimator for four series lengths ( observations), five values of and 1,000 replications. Jeong et al. (2007) performed a comparison of six estimators on simulated fractional Gaussian noise with observations, five values of and 100 replications.

Long-memory functional time series analysis was recently studied by Li et al. (2019), who proposed an R/S estimation method for determining long-memory parameter in a functional ARFIMA model, where observations are temporally dependent continuous functions, for example, age-specific fertility rate improvement observed over the years (e.g., Hyndman & Ullah, 2007; Chiou & Müller, 2009)

. The functional ARFIMA model can be viewed as a generalization of many parametric models. For example,

Bosq (2000) and Bosq & Blanke (2007)

provided the functional autoregressive of order 1 (FAR(1)) and derived one-step-ahead forecasts that are based on a regularized form of the Yule-Walker equations. Later, FAR(1) was extended to FAR(

), where the order can be determined via a sequential hypothesis testing procedure (Kokoszka & Reimherr, 2013). Aue et al. (2015)

proposed a forecasting method based on vector autoregressive (VAR) forecasts of principal component scores. The method of

Aue et al. (2015) can also be viewed as an extension of Hyndman & Shang (2009), where principal component scores are forecast via a univariate time series forecasting method. Klepsch & Klüppelberg (2017) considered the functional moving average (FMA) process and introduced an innovation algorithm to obtain the best linear predictor. Klepsch et al. (2017) extended the VAR model to vector autoregressive moving average model for modeling and forecasting principal component scores, which can be viewed as a simpler estimation approach of the functional autoregressive moving average. Aue & Klepsch (2017) showed the equivalent relationship between FMA and vector moving average.

A central issue in functional time series analysis is to model the temporal dependence of the functional observations accurately. Following the early work of Li et al. (2019), we compare the finite-sample estimation accuracy of several Hurst exponent estimators in functional ARFIMA models. Our method constructs an estimate of the long-run covariance function, which we use, via dynamic functional principal component analysis, in estimating the orthonormal functions spanning the dominant sub-space of the curves. Based on the first set of principal component scores, we apply several univariate time series Hurst exponent estimators, and compare their estimation accuracy in terms of bias, variance and mean square error. Our goal is to provide some practical guidance on the method that provides the best estimation accuracy of the Hurst exponent.

The remainder of the paper is outlined as follows. In Section 2, we present two methods for estimating long-run covariance function, from which the dominant set of principal component scores can be obtained. In Section 3, we revisit some long-memory univariate time series estimators for estimating the Hurst exponent. In Section 4, we compare the estimation accuracy of various estimators and make our recommendation.

2 Dynamic functional principal component analysis

2.1 Estimation of the long-run covariance function

A time series of functions can be denoted as , where and each is a random function of a stochastic process where represents a continuum bounded within a finite interval of the real line. Further, let be a stationary and ergodic functional time series. For a stationary functional time series, the long-run covariance operator is defined as

and is a well-defined element of for a compact support interval

, under mild weak dependence and moment conditions. By assuming

is a continuous and square-integrable function, the function induces the kernel operator . Through right integration, defines a Hilbert-Schmidt integral operator on given by

whose eigenvalues and eigenfunctions are related to the dynamic functional principal components defined in

Hörmann et al. (2015).

In practice, we need estimate from a finite sample . Given its definition as a bi-infinite sum, a natural estimator of is


where is the so-called memory parameter, denotes a lag variable, and

is an estimator of . In the case of stationary short-memory functional time series, it is known that . From (1), the estimated long-run covariance is obtained by summing all autocovariance functions with linearly decreasing weights. Let denote the number of grid points in a curve. In Li et al. (2019), they consider . For instance, when , all finite-order lags are utilized. To estimate the value of , Li et al. (2019) applied the rescaled range (R/S) estimator of Hurst (1951) to the first set of dynamic principal component scores obtained from eigendecomposition of

since in (1) is a constant and it does not affect the estimation of the orthonormal functions spanning the dominant sub-space of functional time series.

2.1.1 Kernel sandwich estimator

Another long-run covariance estimator is the kernel sandwich estimator inspired by Andrews (1991) and Andrews & Monahan (1992). It is given by


where is called the bandwidth parameter and is a symmetric weight function with bounded support of order . The kernel sandwich estimator in (2) was introduced in Panaretos & Tavakoli (2012), Horváth et al. (2013), Rice & Shang (2017), Kokoszka & Reimherr (2017, Chapter 8.5), among others. As with any kernel estimator, the crucial part is on the estimation of bandwidth parameter . It can be selected through a data-driven approach, such as the plug-in algorithm of Rice & Shang (2017). The plug-in bandwidth selection method can be summarized as:

  1. Compute pilot estimates of , for and initial order of kernel function :

    that utilize an initial bandwidth choice , and weight function of order .

  2. As established in Berkes et al. (2016), estimate by

    where denotes the final order of kernel function, is a constant depending on the final order of kernel function, and is a weight depending on the initial order of kernel function. A list of and values is presented in Table 1.

    Kernel function
    Bartlett 1 2/3
    Parzen 6 0.539285
    Tukey-Hanning 3/4
    Quadratic Spectral 1
    Flat-top 4/3
    Table 1: A list of and values

    For the initial kernel function, Rice & Shang (2017) recommend to use flat-top kernel function, i.e., . For the final kernel function, Rice & Shang (2017) recommend to use Bartlett kernel function, i.e., . Further, there exists satisfying .

  3. Use the bandwidth

    in the definition of in (2).

2.2 Dynamic functional principal component decomposition

From the long-run covariance , we apply functional principal decomposition to extract the functional principal components and their associated scores. With Karhunen-Loève expansion, a stochastic process can be expressed as

where and

is an uncorrelated random variable with zero mean and unit variance. The principal component score

is given by the projection of in the direction of the th eigenfunction , i.e., . The scores constitute an uncorrelated sequence of random variables with zero mean and variance which is the th eigenvalue. They can be interpreted as the weights of the contribution of the functional principal components to .

Since the long-run covariance is unknown, the population eigenvalues and eigenfunctions can only be approximated through realizations of . The sample mean and sample covariance are given by

where are the sample eigenvalues of , and are the corresponding orthogonal sample eigenfunctions. The realizations of the stochastic process can be written as

where , and is the th estimated principal component score for the th time period.

Hörmann et al. (2015) showed that kernel sandwich estimator in (2) is a consistent estimator of the true and unknown long-run covariance, and estimated functional principal components and principal component scores extracted from the estimated long-run covariance are also consistent.

3 Hurst exponent estimators

Let the first set of estimated dynamic principal component scores be . Since we consider the first set of scores, we shall replace by hereafter. In Li et al. (2019), they also consider a norm of multiple sets of scores and find the estimation results remain similar. Due to space constraints, we present our results based on the first set of principal component scores. With the univariate time series of scores , we evaluate and compare some Hurst exponent estimators from long-memory univariate time-series literature.

The Hurst exponent can be estimated either via time- or frequency-domain based estimators. These estimators can be divided into parametric and semi-parametric ones. The theory of parametric estimators was developed by Fox & Taqqu (1986) and Dahlhaus (1989). Semiparametric estimators of the memory parameter have become popular since they do not require knowing the specific form of the short-memory structure. They are based on the periodograms of the series, and can be categorized into two types: the log-periodogram estimator first proposed by Geweke & Porter-Hudak (1983) and the local-Whittle estimator which is credited to Künsch (1987) and further developed by Robinson (1995a)

. The log-periodogram estimator is akin to the ordinary least squares and the local-Whittle estimator to the maximum likelihood estimator in the frequency domain.

3.1 Time-domain based estimators

In Sections 3.1.1 to 3.1.5

, we present five methods based on a simple linear regression model. In Sections 

3.1.6 and 3.1.7, we present two methods based on the R/S estimator.

3.1.1 Aggregated variance estimator

The aggregated variance estimator is based on the property of self-similar processes that variances of the aggregated processes decrease at the rate as the block size increases (e.g., Taqqu et al., 1995; Teverovsky & Taqqu, 1997; Beran, 1994, Section 4.4). Recall that for a long-range dependent linear process,

where is a constant. Consequently,

With the predictor variable of

and the response variable of

, we apply a simple linear regression to obtain an estimate of the slope parameter. For instance, one may define the following procedure:

  1. Divide the time series into non-overlapping blocks with block size and then average within each block, that is considered the aggregated series


    where denotes a block index and denotes the number of blocks.

  2. Compute the overall mean

  3. For a given , compute the sample variance of as

  4. Heuristically, when grows, . Thus, grows approximately at the rate . For different values of , compute (3) and (4) to obtain . It is recommended by Taqqu et al. (1995) and Teverovsky & Taqqu (1997) to choose values of that are equispaced on a logarithmic scale. Then, regress against to obtain regression coefficient . The estimated value of is given by

3.1.2 Differencing variance estimator

To distinguish non-stationarity from long-range dependence, we can difference the variance (see, e.g., Teverovsky & Taqqu, 1997). For a given , we compute the difference of the sample variance


For different values of , compute (3), (4) and (5) to obtain . Then, regress against to obtain regression coefficient . The estimated value of is given by

3.1.3 Absolute values of the aggregated series

Similar to the aggregated variance, the data are split in the same fashion, and the aggregated mean is computed from (3). Instead of computing the sample variance, one finds the sum of the absolute values of the aggregated series, namely


For different values of , compute (6) to obtain . Then, regress against to obtain regression coefficient . The estimated value of is given by

3.1.4 Higuchi’s method

Similar to the absolute values of the aggregated series, the method of Higuchi (1988) calculates the partial sums of the time series , and then finding the normalized length of the curve, namely


where is the sample size of the time series, is a block size and denotes the greatest integer function. Since where . For different values of , compute (7) to obtain . Then, regress against to obtain regression coefficient . The estimated value of is given by

3.1.5 Detrended fluctuation analysis (DFA)

Also known as a variance of residuals or Peng’s method, DFA was introduced by Peng et al. (1994) to provide evidence of long memory in deoxyribonucleic acid (DNA) sequences. It consists of the following steps:

  1. The data series is divided into nonoverlapping blocks and each block with size such that .

  2. Within each of the blocks, we regress against and estimate the variance of the residuals by

    where and are least squares regression estimates based on the th block.

  3. Compute the average of the variance of the residuals

  4. Heuristically, grows at the rate . For different values of , compute (8) to obtain . Then, regress against to obtain regression coefficient . The estimated value of is given by

The DFA bears a strong resemblance to the variance plot, but instead of assuming stationarity, a fitted linear trend is subtracted from each block (Beran et al., 2013). Therefore, the DFA is less sensitive to the trend exhibited in the data.

3.1.6 Rescaled Range (R/S) estimator

The R/S estimator was introduced by Hurst (1951) for estimating the minimum capacity of a dam. The R/S estimator is one of the first methods for estimating Hurst exponent. Although many Hurst exponent estimators have better statistical properties than the R/S estimator (which, for example, is inefficient in the case of Gaussian innovations), it is a simple method that computes fast (see, e.g., Li et al., 2019). Given a time series of scores , calculation of the R/S statistic has the following steps:

  1. Calculate the range

  2. Calculate the scale

    If is second-order stationary, then

    converges in probability to

    (Beran et al., 2013, p.410).

  3. The R/S estimator may be defined by

    The plot of against is also known as “pox plots”.

3.1.7 Rescaled adjusted range estimator

While the R/S estimator is applied to the original time series, the rescaled adjusted range estimator is implemented to the partial sum of the original time series (see, e.g., Mandelbrot & Wallis, 1969; Mandelbrot, 1975; Mandelbrot & Taqqu, 1979). For a univariate time series of principal component scores with the partial sum

and sample variance

the rescaled adjusted range estimator is given by

Choosing logarithmically equidistant values of , regress against to obtain regression coefficient . The estimated value of is given by

3.2 Frequency-domain based estimators

3.2.1 (Smoothed) periodogram estimator

With a univariate time series of scores , the periodogram can be defined as


where denotes the set of harmonic frequencies, where is a positive integer, and . Since the periodogram is a measure of autocovariance, it can also be expressed as

where denotes the sample autocovariance function, i.e.,

where is the sample mean of the time series of scores.

Because is an estimator of the spectral density, a time series with long-range dependence should have a periodogram which is proportional to close to the origin (Taqqu et al., 1995). Thus, regress the logarithm of the periodogram for different values of against to obtain regression coefficient . The estimated value of is given by

As advocated by Taqqu et al. (1995), we use only the lowest 10% of the frequencies for the regression, since the proportionality above 10% only holds for close to the origin.

The frequency axis is divided into logarithmically equidistant boxes, and the periodogram values corresponding to the frequencies inside the box are averaged, to obtain smoothed periodogram. The periodogram values at very low frequencies are remained, while the rest are divided into 60 boxes (see, e.g., Taqqu et al., 1995). By regressing the logarithm of the smoothed periodogram against frequencies, we obtain regression coefficient . To achieve the robustness in the least square fitting, we use a robust linear model. The estimated value of is given by

3.2.2 (Smoothed) Geweke-Porter-Hudak estimator

In the univariate ARFIMA models, Geweke & Porter-Hudak (1983) proposed a semiparametric estimator of based on the first periodogram ordinates given in (9). Let be a stationary time series with spectral density


as , where . Recall that the empirical estimate to the spectral density is the periodogram given in (10),

where . In practice, we replace by its empirical analogy , thus

By a simple linear regression, Geweke & Porter-Hudak (1983) suggested the least-square estimator

where , and . Note that for are the smallest Fourier frequencies. The number acts as a bandwidth parameter. Following Geweke & Porter-Hudak (1983), we choose .

Further, Robinson (1995b)

showed that this estimator is consistent and has a central limit theorem of the form

Reisen (1994) considered a smoothed periodogram using the Parzen lag window, for estimating the parameter . Let denote a smoothed periodogram of the form

where is called the lag window generator, a fixed continuous even function in the range , with and . The bandwidth parameter is a function of , and it is customarily chosen as . The Parzen lag window generator has the following form:

The smoothed periodogram estimator can be written as

3.2.3 Wavelet estimator

This estimator computes the discrete wavelet transform, and obtains the wavelet coefficient associated with a mean zero process with . The wavelet coefficient as are distributed , where is a finite constant (Jensen, 1999). The variance depends on the scaling parameter but is independent of the translation parameter . We define

be the wavelet coefficient’s variance at scale . Taking the logarithm transformation of , we obtain

where can be estimated via ordinary least squares. Since is a population quantity, we estimate it by the sample variance of the wavelet coefficients as

3.2.4 Local Whittle estimator

The local Whittle estimator is a Gaussian semiparametric estimation method to estimate the Hurst exponent based on the periodogram. It is first introduced by Künsch (1987) and later developed by Robinson (1995a), Velasco (1999) and subsequent authors. The local Whittle method does not require the specification of a parametric model for the data. It only relies on the specification of the shape of the spectral density of the time series .

Note that the spectral density of a stationary time series is usually assumed to satisfy that

where , and .

Define as the objective function


where , and is a positive integer satisfying , and (see, e.g., Robinson, 1995a). As in Robinson (1995a), we define the estimates

where the closed interval of admissible estimates of true value of the self-similarity measure , , and are numbers picked such that as defined in Robinson (1995a). Alternatively, we may obtain


Further, Robinson (1995a) showed that is a consistent estimator of , and as .

3.2.5 Local Whittle estimator with tapering

Velasco (1999) showed that it is possible to estimate consistently the Hurst exponent of non-stationary processes using the local Whittle estimator by tapering the observations. Let the tapered periodogram of be , and define as the objective function


where all the summations run for , assuming is integer. Define the closed interval of admissible estimate of , , and are numbers picked such that and where is the maximum value of we can estimate with tapers of order , and may lie in a region where is non-stationary. When , (12) reduces to (11).

As in Velasco (1999), we define the estimates

Alternatively, we may obtain


The tapered periodogram includes only frequencies for . The periodogram for non-stationary processes is equivalent to the periodogram for stationary processes evaluated at these frequencies (Velasco, 1999).

3.2.6 Modified local Whittle estimator

Hou & Perron (2014) proposed a modified local Whittle estimator that has good properties under local contamination. These contaminations include processes whose spectral density functions dominate at low frequencies, such as random level shifts, deterministic level shifts and deterministic trends (Hou & Perron, 2014). The data generating process is given

where is a process with memory parameter and is a constant. When , is a short-memory process. The process is the low frequency contamination. For a given sample size , we define the periodogram of process to be and . Since the periodogram of is of order , we add a term to the spectral density function of to govern the low frequency contamination. The modified spectral density function is . Let be the noise-to-signal ratio, the modified spectral density function is


The modified local Whittle estimator is


3.2.7 Exact local Whittle estimator

The local Whittle estimator is based on an approximation of , where denotes the original time series and denotes the noise process. Shimotsu & Phillips (2005)

proposed an exact local Whittle estimator that uses a corrected discrete Fourier transform of

to approximate periodogram . They consider the fractional process generated by the model

The discrete Fourier transform of a time series evaluated at frequency as

Define as the objective function

where is the periodogram of

As in Robinson (1995a), we define the estimates