Inference for Volatility Functionals of Itô Semimartingales Observed with Noise

by Richard Y. Chen, et al.
The University of Chicago

This paper presents nonparametric inference for nonlinear volatility functionals of general multivariate Itô semimartingales in a high-frequency and noisy setting. The estimator achieves the optimal convergence rate after explicit bias correction, and a stable central limit theorem is attained with an estimable asymptotic covariance matrix.





1 Introduction

This paper concerns the inference of integrated volatility functionals of the form

$S(g)_T = \int_0^T g(c_s)\,ds \qquad (1.1)$

from high-frequency data modeled by an Itô semimartingale observed with noise. Here $T$ is positive finite, $g$ belongs to the functional space (3.5), and each $c_s$ is a positive-definite matrix which is the instantaneous covariance of the continuous part of the Itô semimartingale.

The inferential framework for volatility functional estimation, in the absence of noise, was established by [1, 2]. Subsequently, specialized methodologies for various applications, with novel empirical results, have blossomed in recent years; see, for example, [3, 4, 5].

To cope with noise, this paper embeds the pre-averaging method of [6, 7] in the general framework of [1]. In this sense, this work extends the inferential framework to accommodate noisy data, and generalizes the pre-averaging method to nonlinear transformations in the multivariate setting. On the road to a rate-optimal central limit theorem (CLT) at this level of generality, the following technicalities arise:

  • Stochastic volatility: a nonparametric model is used for robustness; yet it becomes crucial to simultaneously control the statistical error (due to noise) and the discretization error (attributable to evolving parameters);

  • Jumps & noise: there is an interplay between noise and jumps, which necessitates truncating jumps on top of local moving averages in order to recover volatility from noisy and jumpy observations;

  • Dependence: because of the overlapping windows in pre-averaging, the local moving averages are highly correlated, so standard CLTs do not apply; the “big block - small block” technique of [6] is used instead;

  • Bias: generally there is an asymptotic bias due to the nonlinearity of $g$ in (1.1); in this paper, the bias is explicitly calculated and removed;

  • Exploding derivatives: some important applications, e.g., precision matrix estimation and linear regression, correspond to a $g$ with a singularity in its derivatives around the origin; a spatial localization argument by [5] (Remark 3.5 in [1] also gives a discussion) is called upon in conjunction with a uniform convergence result.

It is the author’s hope that, by resolving the technicalities above, this paper pushes the inferential framework to a new frontier and extends the corresponding applications to noisy high-frequency data.

2 Setting

2.1 Model

This paper assumes the data are generated from a process $Y$: for each time $t$ there is a probability transition kernel linking $Y_t$ to $X_t$, where $X$ is a solution to the stochastic differential equation

$X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + J_t, \qquad (2.1)$

with drift $b$ and volatility $\sigma$; $W$ is a multidimensional standard Brownian motion and $J$ is a purely discontinuous process described by (A.1).

In this model, the noisy observations are samples from $Y$, and the underlying process $X$ before noise contamination is assumed to be an Itô semimartingale.

Itô semimartingale (latent) → noisy process → noisy data (observed)

An example of this model is

$Y_t = X_t + \varepsilon_t, \qquad (2.2)$

where $\varepsilon$ is a white noise process. Generally, the induced noise model incorporates additive white noise, rounding error, and combinations thereof as special cases. Besides the probabilistic structure, the inferential framework also requires additional assumptions:

  • the drift is smooth in a certain sense;

  • the volatility is a locally spatially restricted Itô semimartingale, e.g., both $c_t$ and $c_t^{-1}$ are locally bounded;

  • the jump part $J$ may exhibit infinite activity but has finite variation (i.e., finite-length trajectories);

  • the noise variance is an Itô semimartingale; conditional on all the information on $X$, there is no autocorrelation in the noise.

These assumptions are necessary for the CLT and for applicability to functions of statistical interest. Readers interested in the precise model specification and assumptions may refer to appendix A.
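As a point of reference, the example noise model (2.2), combining additive white noise and rounding, can be simulated in a few lines. All parameter values below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Sketch of the noise model: latent efficient price X (a driftless diffusion
# here), observed as Y = round(X + additive white noise). Parameters assumed.
rng = np.random.default_rng(0)
n, dt, sigma, noise_sd, tick = 10_000, 1e-4, 0.2, 5e-4, 1e-3

x = np.cumsum(sigma * np.sqrt(dt) * rng.standard_normal(n))  # latent semimartingale X
y = x + noise_sd * rng.standard_normal(n)                    # additive white noise
y_obs = np.round(y / tick) * tick                            # rounding error on top
```

Both pure additive noise (tick → 0) and pure rounding (noise_sd = 0) arise as special cases, matching the two examples named in the text.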

2.2 Observations

This work treats regularly sampled observations and considers in-fill asymptotics (a.k.a. high-frequency asymptotics, fixed-domain asymptotics). Specifically, the samples are observed every $\Delta_n$ units of time on a finite time interval $[0, T]$, and $n$ is the sample size.

Throughout this paper, $Z_t$ is written for the value of $Z$ at time $t$, where $Z$ can be a process or a filtration; for example, $c_t$ denotes the value of the volatility at time $t$. $\Delta^n_i Z = Z_{i\Delta_n} - Z_{(i-1)\Delta_n}$ represents the $i$-th increment of a process $Z$.

2.3 Notations

For $k \in \mathbb{N}$, $C^k$ denotes the space of $k$-times continuously differentiable functions over the relevant domain; $\mathcal{S}^{++}$ denotes the space of positive-definite matrices;

$\|\cdot\|$ denotes a norm on vectors, matrices or tensors, depending on the context;

$a_n \asymp b_n$ means both $a_n/b_n$ and $b_n/a_n$ are bounded for large $n$; for a multidimensional array, the entry index is written in the superscript, e.g., $c^{jk}$ denotes the $(j,k)$ entry of the matrix $c$; $\xrightarrow{\mathcal{L}\text{-}s}$ denotes stable convergence in law of processes (resp. variables) (see sections 2.2.1, 2.2.2 in [8]); $\xrightarrow{u.c.p.}$ denotes uniform convergence in probability on compact sets;

$\mathcal{MN}$ denotes a mixed Gaussian distribution.

3 Estimation Methodology

The estimation methodology consists of 5 components:

  1. local moving averages of the noisy data by a smoothing kernel $\phi$, which act as proxies for the latent $X$'s;

  2. jump truncation operated on the local moving averages;

  3. spot volatility estimators $\hat{c}^n_i$ for estimating the $c_t$'s;

  4. a Riemann sum of the $g(\hat{c}^n_i)$'s for approximating $S(g)_T$;

  5. bias correction due to the nonlinearity: e.g., in the case of constant volatility, by Taylor expansion the estimation error of the plug-in estimator can be decomposed as

    $g(\hat{c}) - g(c) = \nabla g(c)\,(\hat{c} - c) + \tfrac{1}{2}\,\nabla^2 g(c)\,(\hat{c} - c)^{\otimes 2} + \text{higher-order terms};$

    the bias arises from the quadratic form of the estimation error of $\hat{c}$, provided $g$ has a non-zero Hessian. This bias term does not affect consistency, but one needs to correct it explicitly to get a CLT.

The moving-average idea is due to [6, 7]; the truncation is modified from (16.4.4) in [8]; the plug-in and bias correction are inspired by [1]. The specific recipe is given next.

3.1 Building blocks

For the local moving averages, we choose a smoothing kernel $\phi$ on $[0,1]$ such that

$\phi$ is continuous, piecewise $C^1$, and $\phi(0) = \phi(1) = 0$.

Choose an integer $k_n$ as the number of observations in each smoothing window, and define the weights $\phi^n_j = \phi(j/k_n)$ and their differences $\phi'^n_j = \phi^n_j - \phi^n_{j-1}$. Associate the following quantities with a generic process $Z$:

$\bar{Z}^n_i = \sum_{j=1}^{k_n - 1} \phi^n_j \, \Delta^n_{i+j} Z, \qquad \widehat{Z}^n_i = \sum_{j=1}^{k_n} (\phi'^n_j)^2 \, (\Delta^n_{i+j} Z)(\Delta^n_{i+j} Z)^{\mathsf{T}}.$

$\bar{Y}^n_i$ is a local moving average of the noisy data $Y$'s and is a proxy for the latent $X$, while $\widehat{Y}^n_i$ serves as noise correction to $\bar{Y}^n_i (\bar{Y}^n_i)^{\mathsf{T}}$. Based on these 2 ingredients, choose an integer $l_n$, and define the spot volatility estimator $\hat{c}^n_i$ as the rescaled average, over a disjoint window of $l_n$ successive local moving averages, of the truncated outer products $\bar{Y}(\bar{Y})^{\mathsf{T}}\,\mathbf{1}_{\{\|\bar{Y}\| \le u_n\}}$ minus one half of the noise corrections $\widehat{Y}$,

where $u_n$ is a truncation threshold for jumps. The choice of $u_n$ is stated in (3.6). A spot noise variance estimator is also needed:

it is a local average of $\tfrac{1}{2}(\Delta^n_j Y)(\Delta^n_j Y)^{\mathsf{T}}$ over a window whose length is proportional to $k_n$, with a positive finite proportionality constant.
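As a concrete illustration, the building blocks above might be sketched as follows under simplifying assumptions: one dimension, the triangular kernel $\phi(s) = \min(s, 1-s)$, and the noise-correction term omitted (cf. section 5.3). This is an illustrative sketch, not the paper's exact estimator:

```python
import numpy as np

def preaveraged_spot_var(y, dt, k, trunc=np.inf):
    """Illustrative sketch (NOT the paper's exact estimator): d = 1,
    triangular kernel phi(s) = min(s, 1 - s), noise correction omitted.

    Returns spot-variance proxies: truncated, squared local moving averages
    of the observed increments, rescaled by (sum_j phi(j/k)^2) * dt."""
    j = np.arange(1, k)
    w = np.minimum(j / k, 1 - j / k)                   # kernel weights phi(j/k)
    dy = np.diff(y)                                    # observed increments of Y
    bars = np.convolve(dy, w[::-1], mode="valid")      # local moving averages
    bars = np.where(np.abs(bars) <= trunc, bars, 0.0)  # jump truncation
    return bars ** 2 / (np.sum(w ** 2) * dt)           # spot-variance proxies
```

On a noise-free constant-volatility path these proxies average to the true spot variance; a Riemann sum of $g$ applied to disjoint-window averages of them then approximates the integrated functional.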

3.2 The estimator

Definition 1.

Let $g$ satisfy (3.5). The estimator $\hat{S}(g)^n_T$ of $S(g)_T$ is defined as a Riemann sum of the $g(\hat{c}^n_i)$'s minus a de-biasing term,

where the de-biasing term is built from the second derivatives of $g$ evaluated at the spot estimates; the spot volatility estimator is defined in (3.3), the spot noise variance estimator is defined in (3.4), the correction weights are defined in (4.3), and a factor is included for finite-sample adjustment.

As shown in appendix B, with a proper choice of $k_n$ in (3.3), this estimator is applicable to any function $g$ satisfying the growth condition (3.5) on its derivatives, where the relevant exponent is identified in assumption 1.

3.3 Tuning parameters

Besides $\phi$, there are 3 tuning parameters.

parameter   scale rate    description
$k_n$       see (3.6)     length of overlapping window for local moving averages
$l_n$       see (3.6)     length of disjoint window for estimating spot volatility
$u_n$       see (3.6)     truncation level for jumps

The choice of these tuning parameters is crucial for achieving consistency, the CLT, and the optimal convergence rate. For these objectives, one needs the rate conditions (3.6), in which the constants are positive finite and the exponent $\varrho$ is introduced in assumption 1.

The rest of this section offers intuition for (3.6). The reader can skip this part without losing the thread of the main result in section 4.

  1. $k_n$ influences the convergence rate
    In the example (2.2), according to (3.2), the local moving average $\bar{Y}^n_i$ decomposes into a signal part $\bar{X}^n_i$ and a noise part $\bar{\varepsilon}^n_i$. Under the conditional independence of the noise, the noise part has magnitude of order $k_n^{-1/2}$, while the signal part has magnitude of order $(k_n\Delta_n)^{1/2}$ by (B.12). By making these two orders equal, this choice of local smoothing window delivers the optimal rate of convergence.

  2. $l_n$ dictates the bias correction and the form of the CLT
    Here let's focus on the case where $X$ is continuous; then the estimation error of the spot volatility estimator splits into

    • a “discretization error”, attributable to the evolution of volatility within the estimation window, by (B.2);

    • a “statistical error”, which behaves as a continuous Itô semimartingale; this result is due to (3.8) in [6].

    order of $l_n$:  larger  → dominated by discretization error
                     smaller → dominated by statistical error

    Balancing the orders of the two errors will result in the minimum order of total estimation error. However, in that case the bias involves the volatility of volatility and volatility jumps, which are difficult to estimate and subsequently de-bias in applications. Therefore, it is advisable to choose $l_n$ smaller, in which case the statistical error dominates the bias, thereby circumventing the thorny terms. Besides, to achieve successful de-biasing of the statistical error and negligibility of the higher-order Taylor-expansion terms, we need a further rate restriction on $l_n$. Sections 3.1, 3.2 of [1] give a similar discussion in the absence of noise.

  3. $u_n$ disentangles volatility from jump variation
    If there is no jump in the sample path over a smoothing window, the local moving average is of order $(k_n\Delta_n)^{1/2}$ in a suitable sense, according to (B.6). By choosing the truncation level $u_n$ accordingly, the truncation keeps the diffusion movements and discards jumps in a certain sense. To effectively filter out the jumps, the truncation level should also be bounded above, and the upper bound depends on the jump activity index.


If the reader is interested in estimating $S(g)_T$ with $g$ satisfying a more restrictive condition than (3.5),

the requirements on the tuning parameters can be loosened accordingly.

For wider applicability, we choose to accommodate the functional space (3.5) and retain the requirement (3.6).

4 Asymptotics

4.1 Elements

Before stating the asymptotic result, some elements appearing in the limit need to be defined. Associate the following quantities with the smoothing kernel $\phi$:


Define , , as -valued functions, such that for , ,


and also as a tensor-valued function


where is introduced in (3.6).

Now we are ready to describe the limit process.

Definition 2.

Given $g$ satisfying (3.5), the limit is a process defined on an extension of the probability space specified in (A.4) such that, conditional on the original $\sigma$-field, it is a mean-0 continuous Itô semimartingale with conditional variance

where is the conditional expectation operator on the extended probability space and


with defined in (A.3).

4.2 The formal result


Assume assumptions 1 and 2. Given $g$ satisfying (3.5), with the tuning parameters $k_n$, $l_n$, $u_n$ controlled according to (3.6), we have the following stable convergence in law of the discretized process to a conditionally continuous Itô semimartingale, on compact subsets of the time domain:


where $S(g)$ is defined in (1.1), the estimator is from definition 1, and the limit process is identified in definition 2.

The asymptotic result is stated with a probabilistic flavor, which is necessary to express the strongest mode of convergence (functional stable convergence, i.e., stable convergence of processes in law) established in appendix B. There is an alternative formulation which is more relevant for statistical applications:


under the same conditions, where the asymptotic variance is a finite constant.

5 Discussions

5.1 Computing confidence intervals

The asymptotic variance in (4.6) can be estimated by plugging in spot estimates (3.3), (3.4):


Under (3.6), this plug-in estimator of the asymptotic variance is consistent for all finite $t$.
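With the asymptotic variance estimate in hand, a confidence interval follows the usual plug-in pattern. A minimal sketch, assuming a CLT of the form $\Delta_n^{-1/4}(\hat{S} - S) \to \mathcal{MN}(0, V)$ with estimable $V$; all numeric inputs below are placeholders:

```python
import math

def plugin_ci(estimate, avar_hat, delta_n, z=1.96):
    """95% confidence interval assuming delta_n^{-1/4} * (estimate - target)
    is asymptotically mixed normal with estimated variance avar_hat; the
    half-width is therefore z * sqrt(avar_hat) * delta_n**(1/4)."""
    half = z * math.sqrt(avar_hat) * delta_n ** 0.25
    return estimate - half, estimate + half

lo, hi = plugin_ci(estimate=1.00, avar_hat=4.0, delta_n=1e-4)
```

The mixed-normal limit makes this interval valid conditionally, which is exactly what stable convergence delivers.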

5.2 Semi-efficiency

Asymptotic variance reduction is discussed here in restricted settings where $X$ is continuous and scalar-valued. Based on [9], [10], a conjectured efficiency bound is available. In the parametric model with constant volatility and constant noise variance, choosing the tuning constant in (3.6) from preliminary estimates of the volatility and of the noise variance attains a variance within a kernel-dependent factor of the conjectured bound. In the nonparametric model, applying the adaptive enhancement of [10] to the spot volatility estimates makes the same performance feasible.

5.3 Positive-definiteness

The spot volatility estimator (3.3) is not guaranteed to be positive definite in finite samples, because of the noise-correction term $\widehat{Y}$. As suggested by [11], one can increase $k_n$ to attenuate the noise in $\bar{Y}$ and dispense with $\widehat{Y}$:

with a correspondingly enlarged smoothing window. Doing so sacrifices the convergence rate, which drops below the optimal one; hence, within this general inferential framework, the rate attained by the positive-definite variant is strictly less than optimal.

5.4 Examples

As a proof of concept, estimators corresponding to several choices of $g$ are calculated on simulated sample paths of a stochastic-volatility model with noise. The results are shown in figure 1.

Figure 1: Simulation of functional estimators
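The simulated model's exact specification and parameter values are not reproduced above. As a stand-in, a minimal stochastic-volatility simulation with additive noise (all parameters assumed, not the paper's) might look like:

```python
import numpy as np

# Stand-in simulation (parameters assumed, not the paper's): a Heston-type
# stochastic-volatility path X observed with additive white noise.
rng = np.random.default_rng(1)
n, dt = 23_400, 1.0 / 23_400                   # one "day" of 1-second samples
kappa, vbar, xi, gamma = 5.0, 0.04, 0.5, 5e-4  # reversion, level, vol-of-vol, noise sd

v = np.empty(n + 1); v[0] = vbar               # spot variance path
x = np.empty(n + 1); x[0] = 0.0                # efficient (log-)price path
for i in range(n):
    dw, db = np.sqrt(dt) * rng.standard_normal(2)
    v[i + 1] = max(v[i] + kappa * (vbar - v[i]) * dt + xi * np.sqrt(v[i]) * db, 1e-10)
    x[i + 1] = x[i] + np.sqrt(v[i]) * dw
y = x + gamma * rng.standard_normal(n + 1)     # noisy observations of X

iv = np.sum(v[:-1]) * dt                       # integrated variance: S(g) for g(c) = c
```

Feeding `y` to a pre-averaging estimator and comparing against `iv` reproduces the kind of check behind figure 1.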

Appendix A Assumptions

This section presents details of model specification and assumptions. First is specification of the purely discontinuous process


where $\delta$ is a predictable function, the mark space is Polish, and the Poisson random measure has a compensator whose intensity is a $\sigma$-finite measure with no atom. The volatility process is assumed to be an Itô semimartingale (it is important to accommodate long-memory volatility models; however, general volatility functional estimation in the long-memory and noisy setting is an open question):


where is -valued, optional, càdlàg; is -valued, adapted, càdlàg; is a -valued predictable function on .

Let the first filtered probability space carry $X$ and $\sigma$ as adapted processes; let a second filtered probability space accommodate the noise; for each $t$, let there be a conditional probability measure describing the law of the noise given the first space. The conditional noise variance process is defined as


All the stochastic dynamics above can be described on the filtered extension , where


In the sequel, $\mathbb{E}$ denotes the expectation operator on the original or the extended space; conditional expectation operators are denoted analogously, with the conditioning $\sigma$-field clear from context.

Necessary assumptions are collected below.

Assumption 1 (regularity).

has -Hölder sample path, i.e., ,

is of the form (A.2), there is a sequence of triples , where is a stopping time and ; is convex, compact such that

is a sequence of bounded -integrable functions on , such that

Assumption 2 (noise).




for the same , in assumption 1,

Appendix B Derivation

B.1 Preliminaries

Six useful results will be stated. The generic constant changes across lines but remains finite, and subscripted constants depend on the indicated quantity.

I. By a localization argument from section 4.4.1 in [8], without loss of generality we can assume a constant K, a bounded -integrable function on , a convex compact subspace and , where denotes the -enlargement of (see (3.5)), such that


II. Define a continuous Itô semimartingale whose parameters are the same as those in (2.1) but without the jump part. Based on (3.2), define the analogous local moving averages and spot volatility estimator on its sample paths.

The spot volatility estimator calculated on continuous sample paths is more tractable. In the upcoming derivation, the difference between the two estimators is tightly bounded with a proper choice of the truncation level; the focus then shifts to the continuous counterpart.

III. By estimates of Itô semimartingale increments, for any finite stopping time


by Lemma 2.1.7, Corollary 2.1.9 in [8]


where and as .

IV. Let . For a generic process , define


this quantity is useful in analyzing