## 1 Introduction

High-frequency data, observed sequentially at small time intervals, arise in various settings and applications. For example, in social and behavioural investigations, such data are often collected in so-called intensive longitudinal studies; see

Bolger & Laurenceau (2013). In functional data analysis, the observations are often considered to be of one of two types: the dense setting, which corresponds to high-frequency data, and the sparse setting, where the data are of low frequency; see, among others, Wang et al. (2016) and Müller et al. (2011). In finance, analyzing high-frequency intra-day transaction data has received increasing attention; see Hautsch (2012), Aït-Sahalia & Jacod (2014) and the references therein.High-frequency observations are often contaminated by measurement errors. For example, in the dense functional data setting, it is common to assume that the observed discrete data are a noisy version of an underlying unknown smooth curve. In finance too, high-frequency data are often regarded as a noisy version of a latent continuous-time stochastic process, observed at consecutive discrete time points. The latent process is considered to have a continuous path, and the measurement error represents the market microstructure noise; see Aït-Sahalia & Yu (2009). Since increasing the sampling frequency implies smaller sampling errors caused by the discretization of the underlying continuous-time process, we could expect that high-frequency data enable more accurate inference. However, as the sampling frequency increases, the difference between nearby observations is dominated by random noise, which makes standard methods of inference inconsistent; see, for example, Zhang et al. (2005). Therefore, a main concern in high-frequency financial data analysis has been to find ways of recovering the signal of quantities of interest from noisy high-frequency observations; see Aït-Sahalia & Jacod (2014).

Revealing the distributional properties of the measurement errors is crucial for recovering the signal of the continuous-time process from noisy high-frequency data. First, many estimation procedures require distributional assumptions on the measurement errors. Second, statistical inference including hypothesis testing and confidence set estimation inevitably involves unknown nuisance parameters determined by the distribution of the measurement errors; see, for example, Zhang (2006), Xiu (2010), Aït-Sahalia et al. (2010), and Liu & Tang (2014).

The measurement errors themselves sometimes contain useful information, both theoretical and practical, so that successfully recovering the measurement error distribution can improve our understanding of data structures. For example, Dufrenot et al. (2014) argue that the microstructure noise can help understand financial crises, and Aït-Sahalia & Yu (2009) make a connection between the microstructure noise and the liquidity of stocks. In longitudinal studies, the measurement errors can help reveal interesting characteristics of a population, such as the way in which individuals misreport their diet. Jacod et al. (2017) recently highlighted the importance of statistical properties of the measurement errors, and studied the estimation of moments. Despite these important developments, to our knowledge, no method has been proposed for estimating the entire distribution of the errors.

In this paper, we consider frequency domain analyses of high-frequency data, focusing on the measurement errors contaminating the continuous-time processes. In high-frequency observations, the relative magnitude of the changes in values of the underlying continuous-time process is small. Compared with the measurement errors, it becomes negligible, locally in a small neighborhood of a given time point. As a result, estimating the error distribution shares some common features with nonparametric measurement error problems with repeated measurements studied in Delaigle et al. (2008), and with nonparametric estimation from aggregated data studied by Meister (2007) and Delaigle & Zhou (2015), where the estimation techniques require working in the Fourier domain.

Motivated by this, we propose to estimate the characteristic function of the measurement errors by locally averaging the empirical characteristic functions of the changes in values of the high-frequency data. We obtain a nonparametric estimator of the probability density function of the measurement errors and show that it is consistent and minimax rate optimal. We propose a simple method for consistently estimating the moments of the measurement errors. Using our estimator of the characteristic function of the errors, we develop a new rate-optimal multiscale frequency domain estimator of the integrated volatility of the stochastic process, a key quantity of interest in high-frequency financial data analysis.

In the sequel, for two sequences of real numbers and , we write if there exist positive constants and such that for all . We denote by the convolution of two functions and defined by .

## 2 Methodology

### 2.1 Model and data

We are interested in a continuous-time process observed at high frequency with . We assume that follows a diffusion process

(1) |

where the drift is a locally bounded and progressively measurable process, is a positive and locally bounded Itô semimartingale, and is a standard Brownian motion. The process represents the volatility of the process at time , and is often investigated in its integrated form, , called integrated volatility.

###### Remark 1

The model at (1) is often used when , where denotes the price process of an equity; see for example Zhang et al. (2005). It is also used to model applications in biology, physics and many other fields; see Olhede et al. (2009). All theoretical properties of our work are derived under this model, but the methods derived in Sections 2.2 and 2.4 could be applied to other types of processes , such as the smooth ones typically encountered in the functional data literature. The key property that makes our methods consistent is the continuity of the underlying process , but the convergence rates of our estimators depend on more specific assumptions, such as those implied by the model at (1).

Our data are observed at a generic discrete grid of time points where, without loss of generality, we let and . The observed data are contaminated by additive measurement errors, so that what we observe is a sample , where

(2) |

A conventional assumption when analyzing noisy high-frequency data is that the random measurement error is independent of ; see Xiu (2010), Aït-Sahalia et al. (2010), and Liu & Tang (2014). This is also a standard assumption in the measurement error and the functional data literature; see Carroll & Hall (1988), Stefanski & Carroll (1990) and Wang et al. (2016). Likewise, we make the following assumption: The errors are independently and identically distributed, with unknown density , and are independent of the process .

We are interested in deriving statistical properties of the noise term when is fixed and the frequency of observations increases, that is, when as , where for . Here, the time points ’s do not need to be equispaced. Formally, we make the following assumption: As , is uniformly bounded away from zero and infinity.

Throughout we use to denote the Fourier transform of a function and we make the following assumption on the characteristic function of the errors, which is almost always assumed in the related nonparametric measurement error literature: is real-valued and does not vanish at any point on the real line.

### 2.2 Estimating the error density

Motivated by our discussion in Section 1, we wish to estimate the error density . At a given , if we had access to repeated noisy measurements of , say , for , where the ’s are independent and each , then for we would have . Under Assumption 2.1, using the approach in Delaigle et al. (2008) we could estimate by the square root of the empirical characteristic function of the ’s; then by Fourier inversion we could deduce an estimator of .

However, for high-frequency data, at each given we have access to only one contaminated measurement . Therefore, the above technique cannot be applied. But since is a continuous-time and continuous-path stochastic process, almost surely as . Thus, the collection of observations , where lies in a small neighborhood of , can be approximately viewed as repeated measurements of contaminated by independently and identically distributed errors . As the sampling frequency increases, we have multiple observations in increasingly smaller neighborhoods , which suggests that the density of , for , gets increasingly closer to . Therefore, we can expect that as the sample frequency increases, the approach suggested by Delaigle et al. (2008), applied to the ’s, for and sufficiently close, can provide an increasingly accurate estimator of .

We shall prove in Section 2.3

that this heuristic is correct as long as the

’s and ’s are carefully chosen, which we characterize through a distance . For defined in Section 2.1 and , we define(3) |

and denote by the number of points in . For a fixed , Assumption 2.1 implies that , so that .

Following the discussion above and recalling Assumption 3, for a given we define our estimator of by the square root of the real part of the empirical characteristic function of the difference of nearby ’s:

(4) |

where . Here

can be viewed as a parameter controlling the trade-off between the bias and the variance of

: a smaller results in a smaller bias, but also results in a smaller so that the variance is higher. On the other hand, a larger induces a lower variance, but comes at the price of a larger bias due to the contribution from the dynamics in . The choice of in practice will be discussed in Section 3.1.It follows from the Fourier inversion theorem that , where . We can obtain an estimator of by replacing with in this integral. However, since is an empirical characteristic function, it is unreliable when is large. In order for the integral to exist, needs to be multiplied by a regularizing factor that puts less weight on large . As the sample size increases, becomes more reliable, and this should be reflected by letting the weight depend on the sample size. Using standard kernel smoothing techniques, this can be implemented by taking

where is the Fourier transform of a kernel function , and is a bandwidth parameter that satisfies as . Then, we define our estimator of by

(5) |

For the consistency of a kernel density estimator, the kernel function

needs to be symmetric and integrate to unity. In practice, the estimator is often not very sensitive to the choice of the kernel compared to that of the bandwidth. Popular choices are the Gaussian kernel, i.e. the standard normal density, and the sinc kernel, whose Fourier transform is , where denotes the indicator function. In practice, an advantage of the Gaussian kernel is that it can produce visually attractive smoother estimators than the sinc kernel. The sinc kernel is more advantageous analytically; see Section 2.3.### 2.3 Properties of the density estimator

Assume that the continuous-time process in (1) belongs to the class for some :

and that belongs to the class:

for some constants and . This class is rich; for example it contains the functions that have at least square integrable derivatives. Characterizing error distributions through their Fourier transforms is standard in nonparametric measurement error problems because it is the key to deriving precise asymptotic properties of the estimators.

We consider the sinc kernel introduced below (5). Using this kernel simplifies our presentation of the theoretical derivations from two aspects: its Fourier transform simplifies calculations, and it is a so-called infinite order kernel, which implies that the bias of the resulting nonparametric curve estimators depends only on the smoothness of the target curve. Thus the sinc kernel has adaptive properties and automatically ensures optimal convergence rates. In contrast, the bias of estimators based on finite order kernels, such as the Gaussian kernel, depends on both the order of the kernel and the smoothness of the target curve, which implies that various smoothness subcases need to be considered when deriving properties.

For any two square integrable functions and , let . Proposition 1 gives the convergence rate of , defined at (5), to the true density function .

###### Proposition 1

Proposition 1 shows that the convergence rate of to is affected by , the length of each block , and the bandwidth . The next theorem shows that for appropriate choices of and , the convergence rate achieves .

###### Theorem 1

If the conditions of Proposition 1 hold, and we take and , where and are such that , and , then for some uniform constant only depending on , and ,

We learn from Theorem 1 that as long as , the convergence rate of does not depend on . This is strikingly different from standard nonparametric density estimation problems, where convergence rates typically depend on the smoothness of the target density: the smoother the density, the faster the rates. For example, if we had access to the ’s directly instead of just , then we could apply the technique suggested by Meister (2007) and the convergence rate would increase with . However, in our case, the nuisance contribution by the ’s makes it impossible to reach rates faster than , even if is very large. This is demonstrated in the next theorem, which proves that the rate derived in Theorem 1 is minimax optimal.

###### Theorem 2

Denote by the class of all measurable functionals of the data. Under the conditions in Proposition 1, for some uniform constant only depending on , and ,

### 2.4 Estimating the moments of the microstructure noise

We can deduce estimators of the moments of the microstructure noise from the density estimator derived in Section 2.2, but proceeding that way is unnecessarily complex. Recall from Section 2.2 that, when are close, behaves approximately like , where and

denote generic random variables with the same distribution as, respectively,

and . This suggests that we could estimate the moments of by the empirical moments of , and, from these, deduce estimators of the moments of .For each integer , let and denote the th moment of and , respectively. Since is symmetric, and are equal to zero for all , and we only need to estimate even order moments. For each , we start by estimating by

This is directly connected to our frequency domain analysis: it is easily proved that , where is an estimator of , with at (4).

Exploiting the fact that , we can write , where . It can be deduced from there that

Therefore, we can use an iterative procedure to estimate the ’s. First, for , we take . Then, for , given we take

(6) |

###### Remark 2

The next theorem establishes the convergence rate of . Its proof follows from the convergence rates of the ’s.

###### Theorem 3

Next we derive the asymptotic joint distribution of the proposed moment estimators. Let

be a random vector with a

distribution, where the th element, , of is equal towith . Recalling that and noting that , let

(7) |

The next theorem establishes the asymptotic joint distribution of our moment estimators. It can be used to derive confidence regions for .

### 2.5 Efficient integrated volatility estimation

We have demonstrated that our frequency domain analysis can be used to estimate the error density, which is difficult to estimate in the time domain. Since frequency domain approaches are unusual in high-frequency financial data analysis, a natural question is whether they can lead to an efficient estimator of the integrated volatility , with as in (1). The integrated volatility is a key quantity of interest in high-frequency financial data analysis; it represents the variability of a process over time. It is well known that in cases like ours where the data are observed with microstructure noise, the integrated volatility cannot be estimated using standard procedures, which are dominated by contributions from the noise. There, one way of removing the bias caused by the noise is through multiscale techniques; see Olhede et al. (2009) for a very nice description. Zhang (2006) and Tao et al. (2013) have successfully applied those methods to correct the bias of estimators in the time domain, and Olhede et al. (2009) have proposed a consistent discrete frequency domain estimator in the case where the data are observed at equispaced times. Next we show that these techniques can be applied in our continuous frequency domain context too, even if the observation times are not restricted to be equispaced.

The real part of the empirical characteristic function is such that

(9) |

The second term on the right hand side of (9) contains the integrated volatility, but the first term dominates because its mean is . This suggests that the integrated volatility could be estimated from , if we could eliminate that first term. This can be done by applying, to the frequency domain, the multiscale technique used by Zhang (2006) and Tao et al. (2013). We define a function which combines the empirical characteristic function calculated at different sampling frequencies, in such a way as to eliminate the first term on the right hand side of (9) while keeping the second. For , we define

where, as in Zhang (2006), , , , and where . We could also select , and as in Tao et al. (2013).

The following proposition shows that the real part of , , can be used to approximate the second term of (9).

###### Proposition 2

Since the function depends only on the data, it is completely known. Moreover we have seen in Section 2.4 that we could estimate by . Finally, although the proposition holds for all , the remainders are smaller when is close to one, especially since is more reliable in that case too. Therefore, for close to one, Proposition 2 can be used to compute an estimator of the integrated volatility, . We propose a regression type approach as follows. For some such that is close to one we consider the fixed design regression problem

where represents the regression error and

. Applying a linear regression of

on , we estimate by , the least squares estimator of .## 3 Numerical study

### 3.1 Practical implementation of the density estimator

To compute the density estimator at (5), we need to choose the bandwidth and the parameter . In our problem, doing this is much more complex than that for standard nonparametric density estimators, since, unlike there, we do not have direct access to data from our target . Therefore, we cannot use existing smoothing parameter selection methods, which all require noise free data. Moreover, unlike in standard nonparametric problems, we do not have access to a formula measuring the distance between and its estimator.

Similar difficulties arise in the classical errors-in-variables problem, where one is interested in a density , but observes only data on , where is independent of with known. Delaigle & Hall (2008) proposed to choose the bandwidth using a method called simulation approximation. Instead of computing for they approximate by extrapolating the bandwidths for estimating two other densities, and , that are related to . The rationale of the extrapolation scheme is to exploit the analogous relationships between , , and .

Our problem is different because our estimator of is completely different from that in Delaigle & Hall (2008), and we do not know . Therefore, we cannot apply their method directly, and here we propose a method with the same spirit. In particular, we consider two density functions and , and remark that the way in which and are connected mimics the way in which and are connected. As shown in the Appendix, data from both and can be made accessible, so that one may perform bandwidth selection for estimating them. We then choose the bandwidth for estimating by an extrapolation with a ratio adjustment from the bandwidths for estimating and .

The algorithms for the bandwidth selection are given in the Appendix, and Algorithm Appendix summarizes the main steps. Specifically, for , our procedure requires the construction of variables and times points , which are defined in, respectively, Algorithms Appendix and Appendix in the Appendix. Step (c) of Algorithm Appendix requires choosing values of , say , for , for estimating by , where denotes our density estimator at (5) applied to the ’s.

The idea is that if we knew , we would choose by minimizing the integrated squared error of , i.e. In practice we do not know , but we can construct a relatively accurate estimator of it, the standard kernel density estimator of applied to the data . This is computed by using the Gaussian kernel with bandwidth selected by the method of Sheather & Jones (1991). Using arguments similar to those in Delaigle (2008), under mild conditions, , whereas, with the best possible choice of , , where the rate cannot be improved. Thus, converges to faster than does. This motivates us to approximate defined above by

(10) |

Paralleling the arguments in Delaigle & Hall (2008), it is more important to extrapolate the bandwidth than . Motivated by their results, we take . To choose , let be the standard kernel density estimator with the Gaussian kernel and bandwidth selected by the method of Sheather & Jones (1991), applied to the data , where is chosen at random from . We choose in this way rather than , to prevent accumulated residual effects. Then we take

(11) |

Since and converge faster than and , they can be computed using less data than the latter. Therefore, when the time points are widely unequispaced, for computing the ’s we suggest using only a fraction, say one quarter, of the ’s corresponding to the smallest ’s. That is, we use less but more accurate data for computing the ’s.

Finally, as described in step (d) of Algorithm Appendix, we obtain our bandwidth for estimating by an extrapolation with a ratio adjustment: , and we take . This method is not guaranteed to give the best possible bandwidth for estimating , but is a sensible approximation for a problem that seems otherwise very hard, if not impossible, to solve. Theorem 1 implies that we have a lot of flexibility for choosing , but it is impossible to know if our bandwidth lies in the optimal range without knowing the exact order of and . However, we cannot determine these orders without deriving complex second-order asymptotic results.

### 3.2 Simulations

-, quartiles of the integrated square errors of estimators computed from data from model (i) with

, colomun 1, and with , column 2, and model (ii) with , column 3, when , row 1, and , row 2. The continuous line depicts the true .We applied our methodology to data simulated from stochastic volatility models. We generated the data as in (2). Taking the convention that a financial year has active days, we took with ; which represents one day of financial activity. We took time points every seconds, where , and . Taking the convention of business hours for a trading day, this means that we took the ’s to be equally spaced by , and that was equal to . We generated the microstructure noise according to a normal or a scaled -distribution, and for the ’s, we used the Heston model

where and , , and are parameters. Like Aït-Sahalia & Yu (2009), we set the drift part of to zero. The impact due to the drift function is asymptotically negligible; see, for example, Xiu (2010). We used two models, with values similar to those used by Aït-Sahalia & Yu (2009), which reflect practical scenarios in finance; see also Xiu (2010) and Liu & Tang (2014): (i) and (ii) . In each case we took and considered