1 Introduction
This paper concerns the inference of integrated volatility functionals of the form
(1.1) |
from high-frequency data modeled by Itô semimartingale observed with noise. Here is positive finite, belongs to the functional space (3.5), each is a positive-definite matrix which is the instantaneous covariance of the continuous part of the Itô semimartingale.
Inferential frameworks of volatility functional estimation, in absence of noise, was established by [1, 2]. Subsequently, specialized methodologies for various applications with novel empirical results blossomed in recent years, for example, [3, 4, 5].
To cope with noise, this paper embeds the pre-averaging method of [6, 7] in the general framework of [1]. In this sense, this work extends the inferential framework to accommodate noisy data, and generalizes the pre-averaging method to nonlinear transformations in the multivariate setting. On the road to a rate-optimal central limit theorem (CLT) with such generality, there are the following technicalities:
-
Stochastic volatility: nonparametric model is used for robustness, yet, it becomes crucial to simultaneously control statistical error (due to noise) and discretization error (attributable to evolving parameters);
-
Jump & Noise: there is an interplay between noise and jump, which necessitates truncating jumps on top of local moving averages, in order to recover volatility from noisy and jumpy observations;
-
Dependence: because of overlapping windows in pre-averaging, the local moving averages are highly correlated to which standard CLTs does not apply, the “big block - small block” technique of [6] is used instead;
-
Bias: generally there is an asymptotic bias due to nonlinearity of in (1.1), in this paper, the bias is explicitly calculated and removed;
-
Exploding derivative
: some important applications, e.g., precision matrix estimation and linear regression, correspond to a
with a singularity in derivatives around the origin, a spatial localization argument by [5]111Remark 3.5 in [1] also gives a discussion. is called upon in conjunction with an uniform convergence result.
It is the author’s sincere hope, by solving these technicalities above, this paper will be able to offer a share of contribution to push the inferential framework to an new frontier of potentials and possibilities, and lend the effort to extend the corresponding applications to adopt noisy high-frequency data where exciting new stories await.
2 Setting
2.1 Model
This paper assumes the data is generated from a process , and for any
there is a probability transition kernel
linking another process to where is a solution to the stochastic differential equation(2.1) |
, with and the volatility , is a -dimensional standard Brownian motion, is purely discontinuous process described by (A.1).
In this model, the noisy observations are samples from , and the underlying process before noise contamination is assumed as an Itô semimartingale.
An example of this model is
(2.2) |
where
is a white noise process. Generally, the noise model induced by
incorporates additive white noise, rounding error, the combination thereof as special cases. Besides the probabilistic structure, the inferential framework also requires additional assumptions:-
the drift is smooth in certain sense;
-
the volatility is a locally spatially restricted Itô semimartingale, e.g., both and is locally bounded;
-
may exhibit infinite activities but has finite variation (or finite-length trajectory);
-
the noise variance is an Itô semimartingale; conditioning on all the information on
, there is no autocorrelation in noise.
These assumptions are necessary for CLT and applicability over functions of statistical interest. For readers interested in the precise description of the model specification and assumptions, please refer to appendix A.
2.2 Observations
This work treats regularly sampled observations and considers in-fill asymptotics222aka high-frequency asymptotics, fixed-domain asymptotics. Specifically, the samples are observed every time unit on a finite time interval , is the sample size.
Throughout this paper, is written for where can be a process or filtration, for example, denotes the value of volatility at time ; represents the increment where is a process.
2.3 Notations
For , denotes the space of -time continuously differentiable functions over the domain ; denotes the space of positive-definite matrices;
denotes a norm on vectors, matrices or tensors, depending on the context;
means both and are bounded for large ; for a multidimensional array, the entry index is written in the superscript, e.g., denotes the entry in the matrix ; (resp. ) denotes stable convergence of processes (resp. variables) in law333See section 2.2.1, 2.2.2 in [8].; denotes uniform convergence on compact sets;denotes a mixed Gaussian distribution.
3 Estimation Methodology
The estimation methodology consists of 5 components:
-
local moving averages of noisy data by a smoothing kernel , which act as proxies for ’s;
-
jump truncation operated on local moving averages;
-
spot volatility estimator ’s for estimating ’s;
-
Riemann sum of ’s for approximating ;
-
bias correction due to the nonlinearity, e.g., in case of and constant volatility, by Taylor expansion, the estimation error of the plug-in estimator can be decomposed as
the bias arises from the quadratic form of estimation error of , provided has a non-zero Hessian. This bias term does not affect the consistency, but one needs to explicitly correct the bias to get a CLT.
The moving-average idea is due to [6, 7]; the truncation is modified from (16.4.4) in [8]; the plug-in and bias correction are inspired by [1]. The specific recipe is given next.
3.1 Building blocks
For the local moving averages, we choose a smoothing kernel such that
(3.1) |
Choose an integer as the number of observations in each smoothing window, define and . Associate the following quantities with a generic process :
(3.2) |
is a local moving average of the noisy data ’s and is a proxy for , serves as noise correction to . Based on these 2 ingredients, choose , define the spot volatility estimator as
(3.3) |
where is a truncation threshold for jumps. The choice of is stated in (3.6). A spot noise variance estimator is also needed:
(3.4) |
where , positive finite.
3.2 The estimator
Definition 1.
3.3 Tunning parameters
Besides , there are 3 tuning parameters.
scale | rate | description | |
---|---|---|---|
length of overlapping window for local moving averages | |||
length of disjoint window for estimating spot volatility | |||
truncation level for jumps |
The choice of these tunning parameters is crucial for achieving consistency, CLT, and optimal convergence rate. For these objectives, one needs
(3.6) |
are positive finite, and is introduced in assumption 1.
The rest of this section offers an intuition for (3.6). The reader can skip this part without affecting understanding of the main result in section 4.
-
dictates bias-correction and the CLT form
Here let’s focus on the case , is continuous, thenBalancing the orders of and by setting will result in the minimum order of total estimation error. However, in the case the bias involves volatility of volatility and volatility jump, which are difficult to estimate and subsequently de-bias in applications. Therefore, it is advisable to choose , in which case the statistical error dominates in the bias, thereby the thorny terms are circumvented. Besides, to achieve successful de-biasing of statistical error and negligibility of higher-order Taylor-expansion terms, we need . Section 3.1, 3.2 of [1] give a similar discussion in absence of noise.
-
disentangles volatility from jump variation
if there is no jump in the sample path over , according to (B.6). By choosing , the truncation level, which is , keeps the diffusion movements and discards jumps in a certain sense. To effectively filter out the jumps, the truncation level should be bounded above and the upper bounds depends on the jump activity index .
4 Asymptotics
4.1 Elements
Before stating the asymptotic result, some elements appear in the limit need to be defined. Associate the following quantities with the smoothing kernel for :
(4.1) |
Define , , as -valued functions, such that for , ,
(4.2) |
and also as a tensor-valued function
(4.3) |
where is introduced in (3.6).
Now we are ready to describe the limit process.
Definition 2.
Given satisfying (3.5), is a process defined on an extension of the probability space specified in (A.4), such that conditioning on is a mean-0 continuous Itô semimartingale with conditional variance
where is the conditional expectation operator on the extended probability space and
(4.4) |
with defined in (A.3).
4.2 The formal result
Proposition.
Assume assumptions 1, 2. Given satisfying (3.5), we control the tunning parameters , , according to (3.6), then we have the following stale convergence in law of discretized process to a conditional continuous Itô semimartingale on compact subset of :
(4.5) |
where is defined in (1.1), is from definition 1, is identified in definition 2.
The asymptotic result is stated with a probabilistic flavor, which is necessary to express the strongest convergence444It is functional stable convergence (or stable convergence of processes) in law. by appendix B. There is an alternative formulation which is more relevant for statistical applications:
(4.6) |
under the same conditions and is a finite constant.
5 Discussions
5.1 Computing confidence intervals
5.2 Semi-efficiency
Asymptotic variance reduction is discussed here in restricted settings where and is -valued. It is conjectured that the efficiency bound is based on [9], [10]
. In the parametric model where
, , , by choosing in (3.6) where and are preliminary estimates of and , is a functional of smoothing kernel and , we have . In the nonparametric model where , apply the adaptive enhancement of [10] to spot volatility estimates, is also feasible.5.3 Positive-definiteness
The spot volatility estimator (3.3) is not guaranteed to be positive definite in finite sample, because of the noise-correction term . Suggested by [11], one can increase to attenuate noise in and dispense with :
where . Doing so sacrifices the convergence rate, which drops from down to . This general inferential framework requires for , hence the convergence rate is less than .
5.4 Examples
As a proof of concept, estimators corresponding to , when are calculated based on simulation of the model
where , , , , , . The results are shown in figure 1.
![]() |
![]() |
Appendix A Assumptions
This section presents details of model specification and assumptions. First is specification of the purely discontinuous process
(A.1) |
where is a -valued predictable function on , is a Polish space, is a Poisson random measure with compensator , is a -finite measure on and has no atom. The volatility process is assumed to be an Itô semimartingale555It is important to accommodate long-memory volatility models, however general volatility functional estimation in long-memory and noisy setting is an open question.
(A.2) |
where is -valued, optional, càdlàg; is -valued, adapted, càdlàg; is a -valued predictable function on .
Let a filtered probability space in which , are -adapted; let be another filtered probability space accommodating ; , , let be a conditional probability measure on , in particular, . The conditional noise variance process is defined as
(A.3) |
All the stochastic dynamics above can be described on the filtered extension , where
(A.4) |
In the sequel, denotes the expectation operator on or ; denotes the conditional expectation operator, with being , , , .
Necessary assumptions are collected below.
Assumption 1 (regularity).
has -Hölder sample path, i.e., ,
is of the form (A.2), there is a sequence of triples , where is a stopping time and ; is convex, compact such that
is a sequence of bounded -integrable functions on , such that
Assumption 2 (noise).
Appendix B Derivation
b.1 Preliminaries
6 useful results will be stated. The constant changes across lines but remains finite, and is a constant depending on .
I. By a localization argument from section 4.4.1 in [8], without loss of generality we can assume a constant K, a bounded -integrable function on , a convex compact subspace and , where denotes the -enlargement of (see (3.5)), such that
(B.1) |
II. Define a continuous Itô semimartingale with corresponding parameters being the same as those in (2.1),
Let . Based on (3.2), define
The spot volatility estimator calculated on continuous sample paths is more tractable. In the upcoming derivation, is tightly bounded with a proper choice of , the focus then will be shifted from to .
III. By estimates of Itô semimartingale increments, for any finite stopping time
(B.2) |
by Lemma 2.1.7, Corollary 2.1.9 in [8]
(B.3) |
where and as .
IV. Let . For a generic process , define
(B.4) |
this quantity is useful in analyzing