Higher-order Accurate Spectral Density Estimation of Functional Time Series

by Tingyi Zhu, et al.

Under the frequency domain framework for weakly dependent functional time series, a key element is the spectral density kernel which encapsulates the second-order dynamics of the process. We propose a class of spectral density kernel estimators based on the notion of a flat-top kernel. The new class of estimators employs the inverse Fourier transform of a flat-top function as the weight function employed to smooth the periodogram. It is shown that using a flat-top kernel yields a bias reduction and results in a higher-order accuracy in terms of optimizing the integrated mean square error (IMSE). Notably, the higher-order accuracy of flat-top estimation comes at the sacrifice of the positive semi-definite property. Nevertheless, we show how a flat-top estimator can be modified to become positive semi-definite (even strictly positive definite) in finite samples while retaining its favorable asymptotic properties. In addition, we introduce a data-driven bandwidth selection procedure realized by an automatic inspection of the estimated correlation structure. Our asymptotic results are complemented by a finite-sample simulation where the higher-order accuracy of flat-top estimators is manifested in practice.




1 Introduction

Functional time series analysis has become a recent focus of research in functional data analysis because functional data are often collected sequentially over time. Typically, we consider a stationary functional sequence whose terms are random elements of the separable Hilbert space $L^2([0,1], \mathbb{R})$. The central issue in the analysis of functional time series is to take into account the temporal dependence between the observations, which amounts to investigating the second-order characteristics of the functional sequence. A handful of early papers studied the covariance structure of dependent functional sequences. For example, Bosq (2002) [4] and Dehling and Sharipov (2005) [8] consider the estimation of the covariance operator for functional autoregressive processes, and Horváth and Kokoszka (2010) [11] studied the covariance structure of weakly dependent functional time series under an -dependence condition; see also Horváth and Kokoszka (2012) [12] for an overview.

Nevertheless, to obtain a complete description of the second-order structure of dependent functional sequences, one needs to consider autocovariance operators, or autocovariance kernels relating different lags of the series, analogous to the autocovariance matrices in the context of multivariate time series analysis. One statistic of interest associated with the autocovariance operator is the long-run covariance kernel (or long-run covariance function) defined as where for and , is the so-called autocovariance kernel. The analysis of the long-run covariance kernel is applicable to general functional dependence sequences without a particular model assumption.

Horváth et al. (2013) [13] proposed a kernel lag-window estimator of the long-run covariance kernel and showed its consistency under mild conditions. The asymptotic normality of the estimator was established in Berkes et al. (2016) [2]. The estimator has applications in mean and stationarity testing of functional time series; see Horváth et al. (2015) [14] and Jirak (2013) [16]. Horváth et al. (2016) [15] and Rice and Shang (2016) [31] address bandwidth selection for the kernel of the lag-window estimator.

Rather than focusing on an isolated characteristic such as the long-run covariance, Panaretos and Tavakoli (2013) [19] approach the problem of inferring the second-order structure of stationary functional time series via Fourier analysis, formulating a frequency domain framework for weakly dependent functional data. In the frequency domain of the functional setting, the entire second-order dynamics are encoded in the spectral density kernel, which is defined as

$$f_\omega(\tau,\sigma) = \frac{1}{2\pi} \sum_{t \in \mathbb{Z}} e^{-i\omega t}\, r_t(\tau,\sigma),$$

where the autocovariance kernel $r_t$ and the spectral density kernel $f_\omega$ comprise a Fourier pair. The notion of spectral density kernel in the functional setting is a generalization of the finite-dimensional notion in the context of spectral density analysis of multivariate time series, which has been extensively studied; see, e.g., Parzen (1957, 1961) [21, 22], Brillinger and Rosenblatt (1967) [6], Hannan (1970) [10] and Priestley (1981) [30]. A consistent estimate of the spectral density kernel in the form of a weighted average of the periodogram kernel (the functional analogue of the periodogram matrix) is also proposed in Panaretos and Tavakoli (2013) [19] under a type of cumulant mixing condition. This weak dependence condition is the functional analogue of the classical cumulant-type mixing condition of Brillinger (2001) [5].

In this paper, we propose a new class of spectral density kernel estimators based on the notion of flat-top kernel defined in Politis (2001) [23]; see also Politis and Romano (1995, 1996, 1999) [26, 27, 28]. The new class of estimators employs the inverse Fourier transform of a flat-top function to construct the weight function that smooths the periodogram. With the choice of a high-order flat-top kernel, the estimator is shown to achieve bias reduction, and hence higher-order accuracy in terms of optimizing the integrated mean square error (IMSE). It is also nearly equal to a general lag-window type estimator, mirroring a well-known fact in the finite-dimensional case; see Brockwell and Richard (2013) [7] and Brillinger (2001) [5].

The higher-order accuracy of flat-top estimation typically comes at the sacrifice of the positive semi-definite property. To address this issue, we show how a flat-top estimator can be modified to become positive semi-definite (even strictly positive definite) while retaining the favorable asymptotic properties. The modification is similar to the one proposed in Politis (2011) [25] for the treatment of flat-top spectral density matrix estimators. In addition, we introduce a data-driven bandwidth selection procedure realized by an automatic inspection of the correlation structure.

The structure of the paper is as follows. In the next section, the flat-top estimator of the spectral density kernel is defined after some basic definitions of the frequency domain framework are introduced, and theorems on its asymptotic accuracy are given. Section 3 shows the near equivalence of the proposed estimator, in the form of a weighted average of the periodogram, and the flat-top lag-window estimator. A modification of the flat-top spectral density estimator is introduced in Section 4, which results in an estimator that is positive semi-definite while retaining the estimator’s higher-order accuracy. Section 5 addresses the issue of data-dependent bandwidth choice, where an empirical rule for picking the bandwidth is proposed. Our asymptotic results are supported by a finite-sample simulation in Section 6, where the higher-order accuracy of the flat-top estimators is manifested in practice. Finally, the technical proofs are gathered in the Appendix in Section 7.

2 Spectral density kernel estimation

We consider a functional time series where each belongs to the separable Hilbert space possessing mean zero, i.e., for all , and autocovariance kernel for and .

The space is equipped with the inner product and the induced norm ,

We assume the series is strictly stationary in the sense that for any finite set of indices and any , the joint law of is identical to that of .

In addition, the weak dependence structure among the observations is quantified by employing the notion of cumulant kernel of a series. The pointwise definition of a th order cumulant kernel is

where the sum extends over all unordered partitions of . We will make use of the following cumulant mixing condition, defined for fixed and

Condition . For each

We inherit the frequency domain framework of functional time series developed in Panaretos and Tavakoli (2013) [19], in which the functional version of discrete Fourier transform is introduced. Given a functional sequence of length , , the functional Discrete Fourier Transform (fDFT) is defined as


The tensor products of the fDFT lead to the notion of the periodogram kernel, the functional analogue of the periodogram matrix in the multivariate case. The periodogram kernel is defined as


The periodogram kernel is asymptotically unbiased under certain cumulant mixing conditions. However, it is not a consistent estimator of the spectral density kernel as its asymptotic covariance is not zero. Panaretos and Tavakoli (2013) [19] proposed a consistent estimator by convolving the periodogram kernel with a weight function, which has the form


The weight function, , is constructed as


where is a sequence of scale parameters with the properties and as . is a fixed function satisfying that is a positive, even function and

The summation over in (5) makes the weight function periodic with period $2\pi$. The same will be true for the estimator by its definition in (4). With the above constraints imposed on the function , it has been shown in Panaretos and Tavakoli (2013) [19] that is a consistent estimator of the spectral density kernel in mean square (with respect to the Hilbert-Schmidt norm). The bias of is partly attributed to the assumption that is positive, and it can potentially be reduced significantly if an appropriate function is chosen that is not restricted to be positive. To this end, we propose a class of higher-order accurate estimators by making use of the so-called flat-top kernels in the construction of the weight function . The resulting estimator is shown to achieve bias reduction while retaining the asymptotic covariance structure of in (4).
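The construction described above can be sketched numerically on a discretized domain. The sketch below is illustrative only: it assumes the conventions of Panaretos and Tavakoli (2013), namely an fDFT normalized by $(2\pi T)^{-1/2}$, a periodogram kernel formed as the outer product of the fDFT with its conjugate, an estimator of the form $(2\pi/T)\sum_s W^{(T)}(\omega - 2\pi s/T)\,p_{2\pi s/T}$, and a periodized weight $W^{(T)}(x) = \sum_j B_T^{-1} W((x + 2\pi j)/B_T)$. The function names, the grid size, the white-noise data and the bandwidth are hypothetical choices made for the illustration.

```python
import numpy as np

def fdft(X, omega):
    """Functional DFT at frequency omega; X has shape (T, n_grid)."""
    T = X.shape[0]
    t = np.arange(T)
    return (np.exp(-1j * omega * t) @ X) / np.sqrt(2 * np.pi * T)

def periodogram_kernel(X, omega):
    """Outer product of the fDFT with its complex conjugate (assumed convention)."""
    d = fdft(X, omega)
    return np.outer(d, np.conj(d))

def periodic_weight(W, x, B, n_wrap=50):
    """Periodized, rescaled weight: sum_j W((x + 2*pi*j) / B) / B (truncated sum)."""
    j = np.arange(-n_wrap, n_wrap + 1)
    return np.sum(W((x + 2 * np.pi * j) / B)) / B

def smoothed_estimate(X, omega, W, B):
    """Weighted average of periodogram kernels over the Fourier frequencies."""
    T, n = X.shape
    est = np.zeros((n, n), dtype=complex)
    for s in range(T):
        w_s = 2 * np.pi * s / T
        est += periodic_weight(W, omega - w_s, B) * periodogram_kernel(X, w_s)
    return 2 * np.pi / T * est

def epanechnikov(x):
    """A positive, even weight supported on [-1, 1]."""
    return np.where(np.abs(x) <= 1, 0.75 * (1 - x**2), 0.0)

# Hypothetical data: white noise on a 10-point grid, T = 128 curves.
rng = np.random.default_rng(0)
X = rng.standard_normal((128, 10))
f_hat = smoothed_estimate(X, omega=0.5, W=epanechnikov, B=128 ** (-1 / 5))
```

With a positive weight such as the Epanechnikov function, the discretized estimate is a nonnegative combination of rank-one Hermitian matrices, so it is Hermitian and positive semi-definite. Replacing `epanechnikov` by a kernel that takes negative values removes that guarantee, which is the trade-off discussed later in the paper.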

To describe our estimator, we need the notion of a “flat-top” kernel. A general flat-top kernel is defined in terms of its Fourier transform , which is in turn defined as


where is a parameter and is a symmetric function, continuous at all but a finite number of points satisfying and . The flat-top kernel is then given by the inverse Fourier transform of


Note that in the preceding definition, the function , and hence the kernel , depend on the function and the parameter , but this dependence will not be explicitly denoted.

The function is ‘flat’, i.e., constant, over the region , hence the name flat-top for the kernel function . If a kernel function $\Lambda$ has a finite $q$th moment, and its moments up to order $q-1$ are equal to zero, i.e. $\int |x|^q\,|\Lambda(x)|\,dx < \infty$, and $\int x^k \Lambda(x)\,dx = 0$ for all $k = 1, \dots, q-1$, then the kernel is said to be of order $q$. We have the following property concerning the order of the kernel function :

Proposition 2.1.

If is a p times differentiable flat-top function and is Hölder continuous of order , then is a kernel of order .


See Appendix. ∎

In the following, we will be using to denote the flat-top estimator employing the flat-top kernel , which is in turn induced by a flat-top function . The estimator is of the same form as (4), except the difference in the weight function, i.e.,




with being the flat-top kernel induced by a flat-top function defined in (7).

The following theorems investigate the performance of employing the general flat-top kernel .

Theorem 2.1.

Provided that and as , and assume is the maximum value that can be attained such that C(p,2) holds; then by choosing an appropriate flat-top kernel of order , we have

where the equality holds in , and the error terms are uniform in .


See Appendix. ∎

Remark 2.1.

According to Proposition 2.1, a sufficient condition for a kernel to be of order is that is a times differentiable flat-top function and is Hölder continuous. On the other hand, the decay rate of bias crucially depends on the cumulant mixing condition satisfied by the functional sequence. A type of moment condition is provided in Panaretos and Tavakoli (2013) [19], which is sufficient for the cumulant mixing condition to hold for a general linear process of the form ; see Proposition 4.1 therein.

Remark 2.2.

It is worth mentioning that Theorems 2.1-2.3 can hold for a non-flat-top kernel of order as long as it satisfies properties (i)-(iv) in Lemma 7.2. Nevertheless, in this paper we focus on flat-top kernels for simplicity of the proofs. In addition, an empirical data-driven bandwidth selection rule is proposed in Section 5, which can be applied exclusively to flat-top kernels.

The flat-top estimator achieves bias improvements while retaining the rate of decay of the covariance structure as stated in the following theorem:

Theorem 2.2.

Under C(1,2) and C(1,4),

where the equality holds in , uniformly in the ’s.

Using our Lemmas 7.2 and 7.3, Theorem 2.2 can be proved along the same lines of proof of Corollary 3.3 in Panaretos and Tavakoli (2013) [19]. For fixed , the covariance can be shown to have a sharper bound ; see Proposition 3.4 in Panaretos and Tavakoli (2013) [19].

Concerning the mean square error, we need the notion of spectral density operator of functional time series, which is introduced in Panaretos and Tavakoli (2013)[19]. The spectral density operator is an operator induced by the spectral density kernel through right integration,


where is the autocovariance operator induced by the autocovariance kernel through right integration,


The spectral density operator is the integral operator with kernel . Analogously, we denote the operator induced by the kernel through right integration, and thereby the estimator of . Combining the results on the asymptotic bias and variance of the spectral density operator, we have the following consistency in integrated mean square of the induced estimator for the spectral density operator .

Theorem 2.3.

Provided assumptions C(p,2) and C(1,4) hold, , , then the spectral density operator estimator employing a flat-top kernel of order is consistent in integrated mean square, that is,

where is the Hilbert-Schmidt norm. More precisely, as .

Theorem 2.3 gives the rate of convergence of to . In the meantime, it also suggests the optimal value of the bandwidth parameter in terms of optimizing the decay rate of integrated mean square error. Apparently, the optimal depends on the cumulant condition a functional sequence possesses, that is, the value of . For any finite , the optimal decay rate can be achieved with . In the case that , one can choose to obtain a favorable rate of .
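The trade-off behind the optimal bandwidth can be made explicit with a short calculation. The derivation below is a sketch under an assumption that is not restated in the text above: that the IMSE bound has the usual form of a squared-bias term of order $B_T^{2p}$ plus a variance term of order $(B_T T)^{-1}$, with unspecified constants $C_1$ and $C_2$ that are introduced here only for illustration.

```latex
% Assumed form of the bound (C_1, C_2 > 0 are hypothetical constants):
\mathrm{IMSE}(B_T) \;\le\; C_1 B_T^{2p} + \frac{C_2}{B_T T}.

% Setting the derivative with respect to B_T to zero:
2p\,C_1 B_T^{2p-1} - \frac{C_2}{B_T^{2}\,T} = 0
\quad\Longrightarrow\quad
B_T = \left(\frac{C_2}{2p\,C_1\,T}\right)^{1/(2p+1)} \propto T^{-1/(2p+1)}.

% Both terms are then of the same order:
B_T^{2p} \propto T^{-2p/(2p+1)},
\qquad
\frac{1}{B_T T} \propto T^{-2p/(2p+1)}.
```

Under this assumed form, $p = 2$ gives a bandwidth of order $T^{-1/5}$ and an IMSE rate of $T^{-4/5}$, and the rate approaches $T^{-1}$ as $p$ grows.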

3 Alternate estimates and flat-top kernel choice

3.1 Alternate estimates

The spectral density kernel estimator considered in the previous section has the form of a weighted average of periodogram ordinates. In fact, the weight function in the estimate (8) has an alternate form


The equivalence of (9) and (12) can be easily verified by using the Poisson summation formula. Moreover, if the discrete average in (8) is replaced by a continuous one, the estimate becomes


By Equations (2) and (3), the periodogram kernel is given by



is the sample autocovariance kernel. If this is substituted into (13), then the estimate takes the form



With a flat-top function in place, the estimate (15) has a formal resemblance to the flat-top estimation for multivariate spectra explored in Politis (2011) [25] with the bandwidth parameter . The estimator has been shown to achieve higher-order accuracy in estimating the spectral density matrix; see Politis (2011) [25] for details. In fact, the estimate (15) is the general form of spectral estimation that has been extensively investigated since the 1950s and 1960s; see, e.g., Grenander (1951) [9], Parzen (1957) [21] and Priestley (1962) [29].
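A lag-window estimate of this kind can be sketched on a discretized domain as follows. The sketch assumes the form $(2\pi)^{-1}\sum_{|t|<T} \lambda(B_T t)\, e^{-i\omega t}\, \hat r_t(\tau,\sigma)$ with sample autocovariance kernel $\hat r_t(\tau,\sigma) = T^{-1}\sum_j X_{j+t}(\tau) X_j(\sigma)$; both formulas, the function names, the trapezoidal taper with a hypothetical parameter `c`, and the toy data are assumptions made for illustration rather than a transcription of equations (14) and (15).

```python
import numpy as np

def sample_autocov(X, t):
    """Sample autocovariance kernel at lag t on the grid; X has shape (T, n_grid)."""
    T = X.shape[0]
    if t < 0:
        # r_hat_{-t}(tau, sigma) = r_hat_t(sigma, tau)
        return sample_autocov(X, -t).T
    return X[t:].T @ X[:T - t] / T

def trapezoid(x, c=0.5):
    """Trapezoidal flat-top taper: 1 for |x| <= c, linear decay to 0 at |x| = 1."""
    return np.clip((1.0 - np.abs(x)) / (1.0 - c), 0.0, 1.0)

def lag_window_estimate(X, omega, lam, B):
    """Tapered Fourier sum of sample autocovariance kernels (assumed form)."""
    T, n = X.shape
    est = np.zeros((n, n), dtype=complex)
    for t in range(-(T - 1), T):
        w = lam(B * t)
        if w != 0.0:  # lags outside the taper's support contribute nothing
            est += w * np.exp(-1j * omega * t) * sample_autocov(X, t)
    return est / (2 * np.pi)

# Hypothetical data: T = 200 curves observed on an 8-point grid.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))
f_lw = lag_window_estimate(X, omega=0.7, lam=trapezoid, B=0.1)
f_lw0 = lag_window_estimate(X, omega=0.0, lam=trapezoid, B=0.1)
```

Because the taper is even and the lag $-t$ autocovariance is the transpose of the lag $t$ one, the result is Hermitian at every frequency and real symmetric at frequency zero. Only lags with $|t| < 1/B_T$ enter the sum, which is why the bandwidth controls how much of the estimated autocovariance structure is used.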

3.2 Flat-top kernel choice

As suggested by Theorem 2.1, in order to achieve favorable asymptotic rates, it is desirable to choose a flat-top kernel of higher order, and hence a flat-top function as smooth as possible according to Proposition 2.1. McMurry and Politis (2004) [18] constructed a member of the flat-top family that is infinitely differentiable, which is defined as


where determines the region over which is identically 1, and is a shape parameter, making the transition from to more or less abrupt.

The function connects the region where is 0 and the region where is 1 in a manner such that is infinitely differentiable for all , including where and . The resulting kernel is of infinite order in the sense that decays faster than , for all positive finite , as . Figure 1 shows the plots of the infinitely differentiable flat-top function with and , and the resulting kernel as well as the corresponding weight function . Note that the plot of can be created from either Equation (9) or (12), since their equivalence is established in Section 3.1.

Figure 1: (a) Plot of ; (b) Plot of corresponding kernel induced by inverse Fourier transform of ; (c) Plot of the corresponding weight function with .

Nevertheless, while the effectiveness of flat-top kernels is reflected in Theorems 2.1 and 2.3, these results provide only theoretical bounds on the decay rates of the bias and the IMSE. Moreover, according to Theorem 2.1, the bias reduction of flat-top estimation could potentially be limited by the order of the cumulant condition, which indicates that an infinite-order kernel might not be necessary. This leads us to consider other choices within the flat-top family. One simple representative flat-top function has the trapezoidal shape defined as


The trapezoidal is continuous everywhere and it already exhibits good performance when being implemented for the estimation of spectral density matrix; see Politis (2011) [25]. The infinitely differentiable function looks very much like the trapezoidal with ultra-smoothed corners.
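The two flat-top shapes discussed so far can be sketched in code. The trapezoid below equals 1 on $[-c, c]$ and decays linearly to 0 at $|x| = 1$. The smooth version uses the form $\exp\{-b\,\exp[-b/(|x|-c)^2]/(|x|-1)^2\}$ on $c < |x| < 1$, which is our recollection of the McMurry and Politis (2004) construction and should be checked against that reference. The parameter names `c` and `b`, their default values, and the quadrature used to approximate the induced kernel as $(2\pi)^{-1}\int \lambda(s)\cos(sx)\,ds$ are assumptions for illustration.

```python
import numpy as np

def trapezoid(x, c=0.5):
    """Trapezoidal flat-top function: 1 on [-c, c], linear decay to 0 at |x| = 1."""
    x = np.abs(np.atleast_1d(np.asarray(x, dtype=float)))
    return np.clip((1.0 - x) / (1.0 - c), 0.0, 1.0)

def smooth_flat_top(x, c=0.05, b=0.25):
    """Infinitely differentiable flat-top function (assumed McMurry-Politis form)."""
    x = np.abs(np.atleast_1d(np.asarray(x, dtype=float)))
    out = np.zeros_like(x)
    out[x <= c] = 1.0                      # flat region
    mid = (x > c) & (x < 1.0)              # smooth transition region
    xm = x[mid]
    out[mid] = np.exp(-b * np.exp(-b / (xm - c) ** 2) / (xm - 1.0) ** 2)
    return out                             # zero for |x| >= 1

def kernel_from_flat_top(lam, x, n_quad=20001):
    """Approximate the induced kernel at a point x by quadrature on [-1, 1].

    lam is even and vanishes at the endpoints, so the Fourier integral reduces
    to a cosine integral and a plain Riemann sum coincides with the trapezoid rule.
    """
    s = np.linspace(-1.0, 1.0, n_quad)
    ds = s[1] - s[0]
    return np.sum(lam(s) * np.cos(s * x)) * ds / (2 * np.pi)

grid = np.linspace(0.0, 1.0, 201)
smooth_vals = smooth_flat_top(grid)
kernel_at_zero = kernel_from_flat_top(trapezoid, 0.0)
kernel_at_ten = kernel_from_flat_top(trapezoid, 10.0)
```

Evaluating `kernel_from_flat_top` at larger arguments shows the oscillating, sign-changing tails of the induced kernel, which is the reason the resulting estimator need not be positive semi-definite.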

Another choice to be considered is the flat-top function created by adding a piecewise cubic tail, similar to that of Parzen’s (1961) [22] kernel, to the flat-top region. It is defined as


Plots of flat-top functions and are shown in Figure 2. Concerning the choice of parameters of flat-top kernels, i.e., and , we refer the readers to Politis (2011) [25] where a detailed discussion is given.

Figure 2: (a) Plot of trapezoidal ; (b) Plot of flat-top Parzen .

4 Positive semi-definite spectral estimation

By employing the infinite-order flat-top kernels, the flat-top estimator is capable of achieving higher-order accuracy with improved estimation bias. The disadvantage of flat-top kernels, however, is that they are not positive semi-definite. As a result, the operator estimation is not almost surely positive semi-definite for all , while it converges to a positive semi-definite operator .

The positive semi-definiteness of the estimator is desirable, especially in the case of $\omega = 0$, when the objective is estimation of a long-run covariance operator. In the context of finite-dimensional time series analysis, spectral density matrix estimators can be easily adjusted to be positive semi-definite by replacing negative eigenvalues with zeros in the diagonalization of the estimated matrices; see e.g. Politis (2011) [25]. Analogously, we now show how the flat-top operator estimator can be modified to render a positive semi-definite estimator while preserving the asymptotic consistency.

The spectral decomposition of operators in an infinite-dimensional Hilbert space is much more intricate than that in a finite-dimensional context. However, recall that both operators and are induced by kernel functions through right integration, and therefore they are symmetric Hilbert-Schmidt operators that admit the following decompositions


where and are two sequences of real numbers tending to zero; and are two orthonormal bases of . We have for ,

thus and , are complete sequences of eigenelements of and respectively.

Note that the eigenvalues are all non-negative since the operator is positive semi-definite. To fix the possible negativity of , let for all , and define the estimator


We keep the nonnegative eigenvalues of and replace the negative eigenvalues by zero, which makes the resulting operator a positive semi-definite estimator. The connection between and is shown in the following inequality:

Proposition 4.1.

Let be the positive semi-definite operator estimator of defined in (9), then for a fixed


where is the Hilbert-Schmidt norm.


See Appendix. ∎

A direct consequence of the last result is the following theorem, which shows that, in addition to being positive semi-definite, possesses the same mean square convergence of given in Theorem 2.3.

Theorem 4.1.

Under the condition of Theorem 2.3, the positive semi-definite spectral density operator estimate employing a flat-top kernel of order is consistent in integrated mean square with

where is the Hilbert-Schmidt norm.

In the case that the estimand is not only positive semi-definite but strictly positive definite, it is desirable to have a strictly positive definite estimator of . A similar modification of can be applied here to make the estimator strictly positive definite. Let for all , where is some chosen sequence, and define the estimator


The estimator is positive definite and it can be verified that it maintains the high accuracy of the flat-top estimator if . Thus, is a higher-order accurate, strictly positive definite estimator.
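On a discretized domain, the eigenvalue modification described in this section can be sketched as follows. The sketch makes several assumptions for illustration: a uniform grid on $[0,1]$ with spacing `h`, so that the integral operator is approximated by `h * K`; the Hilbert-Schmidt norm approximated by `h` times the Frobenius norm; a toy "true" kernel $\min(\tau,\sigma)$; and a symmetric random perturbation standing in for estimation error. The function names and the `floor` argument (playing the role of the small positive sequence used for the strictly positive definite version) are hypothetical.

```python
import numpy as np

def hs_norm(K, h):
    """Approximate Hilbert-Schmidt norm of the integral operator with kernel K."""
    return h * np.linalg.norm(K)  # Frobenius norm of the discretized kernel

def clip_spectrum(K, h, floor=0.0):
    """Replace operator eigenvalues below `floor` by `floor` and rebuild the kernel."""
    A = h * (K + K.conj().T) / 2           # discretized operator, symmetrized
    vals, vecs = np.linalg.eigh(A)
    vals = np.maximum(vals, floor)
    return (vecs * vals) @ vecs.conj().T / h

n = 40
h = 1.0 / n
tau = (np.arange(n) + 0.5) * h
F = np.minimum.outer(tau, tau)             # toy positive semi-definite target kernel

rng = np.random.default_rng(2)
E = rng.standard_normal((n, n))
F_hat = F + 0.05 * (E + E.T) / 2           # perturbed estimate with negative eigenvalues

F_plus = clip_spectrum(F_hat, h)               # positive semi-definite version
F_eps = clip_spectrum(F_hat, h, floor=1e-4)    # strictly positive definite version

err_before = hs_norm(F_hat - F, h)
err_after = hs_norm(F_plus - F, h)
```

Clipping the negative eigenvalues at zero is the Frobenius-norm projection onto the cone of positive semi-definite matrices, so in this discretized setting the distance to any positive semi-definite target cannot increase. This is the finite-dimensional counterpart of the inequality in Proposition 4.1.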

5 Data-dependent bandwidth choice

As it has been demonstrated in Section 3.1 that the lag-window estimate (15) with bandwidth is nearly equal to the estimate (8) with bandwidths , we propose here an empirical rule for choosing the bandwidth in practice, which resembles the bandwidth choosing rule for the flat-top lag-window introduced in Politis (2011) [25].

Recall that the sample autocovariance kernel

the proposed bandwidth choice rule is done by a simple inspection of the functional version of correlogram/cross-correlogram, i.e. a plot of vs. where

for all .

We look for a point, say , after which the correlogram for each pair of appears negligible, i.e. for , and . Here is taken to mean that is not significantly different from 0. In practice, we determine by considering the correlogram for over a finite grid of . After identifying , the recommendation is to take


where is the parameter that determines the ‘flat-top’ region of .

From the flat-top lag-window perspective, the intuition behind the above bandwidth choice rule is an effort to extend the ‘flat-top’ region of over the whole of the region where is thought to be significant so as not to downweight it and introduce bias. As scrutinized in Politis (2011) [25], the ‘flat-top’ region of can be greater than depending on the choice of function . The decreasing rate of near could be slow enough so that for an interval much greater than ; see, for example, (16) and Figure 1(a) regarding the infinitely differentiable with and . Instead of the interval , we consider an ‘effective’ flat-top region of defined as the interval where is the largest number such that for all in ; here is some small chosen number, e.g. .

Let be a finite grid of . Now we can formalize the empirical rule of choosing bandwidth .


For , let be the smallest nonnegative integer such that , for , where is a fixed constant, and is a positive, nondecreasing integer-valued function of such that . Then, let , and .

The constant $C_0$ and the form of $K_T$ are the practitioner’s choice. Politis (2003) [24] makes the concrete recommendations $C_0 \simeq 2$ and $K_T = \max(5, \sqrt{\log_{10} T})$, which have the interpretation of yielding (approximately) 95% simultaneous confidence intervals for with by Bonferroni’s inequality.

It is also worth noting that by considering the correlogram over the finite grid , we actually generate a matrix of thresholds, and is picked as the maximum among the entries of the matrix, i.e., for . This rule of identifying can be distorted when certain are much greater than the others. Picking to be the average of the matrix entries is a more reasonable choice when such a special case arises. Nevertheless, if the target is to estimate the spectral kernel for a particular pair , one can always choose the bandwidth by using the specified , i.e. .
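The threshold search in the empirical rule can be sketched as follows. The sketch assumes the form used in Politis (2003): for one pair of grid points, find the smallest lag `q` such that the next `K` correlogram values all fall below `c0 * sqrt(log10(T) / T)` in absolute value, with defaults `c0 = 2` and `K = max(5, sqrt(log10 T))`. These defaults, the function names, and the toy correlogram are assumptions for illustration. The conversion from the selected lag to a bandwidth is left out, since it depends on the flat-top parameter.

```python
import numpy as np

def empirical_q(rho, T, c0=2.0, K=None):
    """Smallest q such that |rho[q + k]| < threshold for k = 1, ..., K.

    rho[m] is the estimated correlation at lag m = 0, 1, 2, ...
    Returns None when no such q exists within the supplied lags.
    """
    rho = np.asarray(rho, dtype=float)
    if K is None:
        K = max(5, int(np.ceil(np.sqrt(np.log10(T)))))
    thresh = c0 * np.sqrt(np.log10(T) / T)
    for q in range(len(rho) - K):
        if np.all(np.abs(rho[q + 1:q + K + 1]) < thresh):
            return q
    return None

def empirical_q_grid(rho_by_pair, T, **kwargs):
    """Apply the rule to every pair on the grid and take the maximum."""
    qs = [empirical_q(rho, T, **kwargs) for rho in rho_by_pair]
    if any(q is None for q in qs):
        raise ValueError("not enough lags to identify q for some pair")
    return max(qs)

# Toy correlograms for two hypothetical pairs of grid points, T = 1000.
rho_1 = [1.0, 0.6, 0.3, 0.15, 0.05, -0.03, 0.02, 0.01, -0.04, 0.02, 0.0, 0.01]
rho_2 = [1.0, 0.2, 0.05, 0.01, -0.02, 0.03, 0.0, 0.01, 0.02, -0.01, 0.0, 0.01]
q_1 = empirical_q(rho_1, T=1000)
q_2 = empirical_q(rho_2, T=1000)
q_hat = empirical_q_grid([rho_1, rho_2], T=1000)
```

Taking the maximum over pairs is the conservative choice described in the text. Replacing `max(qs)` by an average would implement the alternative suggested for the case where a few pairs yield much larger values than the rest.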

6 Simulations

We now present some numerical simulations to complement our asymptotic results. The main goal of the simulations is to compare the performance of the estimators employing flat-top kernels with that of the non-flat-top estimation, as well as to illustrate the main issues discussed in the paper. The simulations are performed on a simple functional moving average model


The simulations we carry out are analogous to those conducted in Panaretos and Tavakoli (2013) [19]. The innovation functions ’s are independent Wiener processes on $[0,1]$, which are represented using a truncated Karhunen-Loève expansion,

where ,

are independent standard Gaussian random variables and

is orthonormal system in ; see Adler (1990) [1]. The operators and are constructed so that their image be contained within a 50-dimensional subspace of , spanned by an orthonormal basis . Representing in the basis, and in the basis, we have the matrix representation of the process as , where is a matrix, each is a matrix, and each is a matrix.
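For readers who want to reproduce the innovations, a truncated Karhunen-Loève expansion of a Wiener process on $[0,1]$ can be sketched as below. It uses the standard eigenpairs of the Brownian covariance, $\lambda_k = 1/((k - 1/2)^2\pi^2)$ and $e_k(\tau) = \sqrt{2}\sin((k - 1/2)\pi\tau)$. The truncation level, the grid, the number of paths and the function name are hypothetical, since the values used in the simulation are not recoverable from the text above.

```python
import numpy as np

def wiener_kl(n_paths, grid, n_terms=100, rng=None):
    """Simulate Wiener paths on `grid` via a truncated Karhunen-Loeve expansion."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(1, n_terms + 1)
    freq = (k - 0.5) * np.pi                                  # sqrt(lambda_k) = 1 / freq
    basis = np.sqrt(2.0) * np.sin(np.outer(freq, grid))       # e_k evaluated on the grid
    xi = rng.standard_normal((n_paths, n_terms))              # independent N(0, 1) scores
    return (xi / freq) @ basis                                # shape (n_paths, len(grid))

grid = np.linspace(0.0, 1.0, 51)
paths = wiener_kl(4000, grid, n_terms=100, rng=np.random.default_rng(3))

# Variance captured at tau = 1 by the truncated expansion (tends to 1 as n_terms grows).
k = np.arange(1, 101)
truncated_var_at_one = np.sum(2.0 / ((k - 0.5) * np.pi) ** 2)
```

The truncated variance at the endpoint is slightly below 1, which gives a quick sense of how much of the process is lost to truncation at a given number of terms.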

A stretch of is generated for with . Matrices are constructed as random Gaussian matrices with independent entries, such that element in th row are distributed.

For the simulation, simulation runs are generated for each which are used to compute the IMSE by approximating the integral

by a weighted sum over the finite grid . We consider the estimators with the proposed flat-top kernels and compare them with the non-flat-top Epanechnikov kernel, $W(x) = \frac{3}{4}(1 - x^2)$ for $|x| < 1$ and $0$ otherwise, which is used in the simulations of Panaretos and Tavakoli (2013) [19]. We apply bandwidth for the estimator of each kernel. In addition, the bandwidths of the estimators employing flat-top kernels are also estimated using the empirical rule proposed in Section 5.

The simulation results are presented in Table 1, whose entries are base-2 logarithms of the IMSE. As expected, the estimators employing flat-top kernels show a faster decay rate of the IMSE than the one with the non-flat-top Epanechnikov kernel. The performance of the flat-top Parzen kernel and the flat-top infinitely differentiable kernel is close, while each slightly outperforms the trapezoid as the sample size grows. This might be because the smoothness of the flat-top function is indeed a factor in the decay of the IMSE, but over-smoothing might not be necessary, as the performance could potentially be limited by the order of the cumulant conditions, as Theorem 2.3 suggests. Also note that implementing the empirical rule of bandwidth choice yields a slight improvement as the sample size grows.

(a) Bandwidth
Kernel                   T=64     T=128    T=256    T=512    T=1024   T=2048
Epanechnikov kernel      -6.188   -6.852   -7.588   -8.276   -9.082   -9.816
Trapezoid                -6.389   -7.112   -7.902   -8.719   -9.493   -10.146
Flat-top Parzen          -6.383   -7.041   -8.018   -8.846   -9.710   -10.453
Flat-top Inf. Diff.      -6.344   -7.262   -8.074   -8.832   -9.719   -10.470

(b) Bandwidth
Kernel                   T=64     T=128    T=256    T=512    T=1024   T=2048
Epanechnikov kernel      -7.148   -7.668   -8.577   -9.075   -9.950   -10.887
Trapezoid                -7.321   -7.808   -8.789   -9.297   -10.204  -11.193
Flat-top Parzen          -7.343   -8.028   -8.881   -9.562   -10.306  -11.426
Flat-top Inf. Diff.      -7.241   -8.216   -8.911   -9.546   -10.332  -11.412

(c) Empirical rule of choosing
Kernel                   T=64     T=128    T=256    T=512    T=1024   T=2048
Trapezoid                -6.519   -7.331   -8.260   -9.145   -10.089  -11.008
Flat-top Parzen          -6.627   -7.592   -8.455   -9.349   -10.313  -11.371
Flat-top Inf. Diff.      -6.118   -6.925   -8.053   -9.056   -10.214  -11.121

Table 1: Entries represent the base-2 logarithm of the IMSE of different estimators using (a) bandwidth , (b) bandwidth and (c) empirical rule of choosing . Sample size ranges from 64 to 2048. Minimum IMSE for each is indicated by boldface.
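One way to read the decay rates off Table 1 is to regress the base-2 logarithm of the IMSE on the base-2 logarithm of the sample size. The sketch below does this for panel (a) only, using the values reported in the table; the least-squares fit and the variable names are our own illustrative choices, not part of the reported analysis.

```python
import numpy as np

T = np.array([64, 128, 256, 512, 1024, 2048])

# Panel (a) of Table 1: base-2 logarithm of the IMSE for each sample size.
log2_imse_a = {
    "Epanechnikov": [-6.188, -6.852, -7.588, -8.276, -9.082, -9.816],
    "Trapezoid": [-6.389, -7.112, -7.902, -8.719, -9.493, -10.146],
    "Flat-top Parzen": [-6.383, -7.041, -8.018, -8.846, -9.710, -10.453],
    "Flat-top Inf. Diff.": [-6.344, -7.262, -8.074, -8.832, -9.719, -10.470],
}

def decay_exponent(log2_imse, T):
    """Least-squares slope of log2(IMSE) against log2(T)."""
    slope, _intercept = np.polyfit(np.log2(T), log2_imse, 1)
    return slope

slopes = {name: decay_exponent(vals, T) for name, vals in log2_imse_a.items()}
for name, slope in slopes.items():
    print(f"{name:22s} estimated exponent: {slope:.3f}")
```

A more negative slope means the IMSE shrinks faster as the sample size doubles. In panel (a) the three flat-top kernels all have a steeper fitted slope than the Epanechnikov kernel, in line with the discussion of the simulation results.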

7 Appendix: Proofs

7.1 Proof of Proposition 2.1

To prove Proposition 2.1, we need the following lemma, which bounds the magnitude of Fourier coefficients.

Lemma 7.1.

Let be an integrable function on the interval , and be its Fourier coefficients defined by


If is times differentiable and is Hölder continuous of order , then


By repeated integration by parts on Equation (26), we have


On the other hand, ; by a change of variable, can be written as

By Hölder continuity of , we have


for some constant . Combining (27) and (28), we obtain


Proof of Proposition 2.1. For a flat-top function and its inverse Fourier transform , we have


Since is times differentiable and is Hölder continuous of order by assumption, Lemma 7.1 gives for some constant , which implies that has finite moments up to order , i.e. for .

By repeated differentiations on both sides of (31), we obtain for

by the dominated convergence theorem. Since is flat-top, is zero for all , which in turn leads to


if we set on both sides of (32). Therefore, is a kernel of order .

7.2 Proof of Theorem 2.1

To prove Theorem 2.1, the following lemmas are necessary.

Lemma 7.2.

We have the following properties for the flat-top kernel and the function :



Let , and denote by the total variation of a function .

(iii) If , ;

(iv) If ,


The statement follows directly from setting on both sides of Equation (31). The statement is obtained by following the same arguments in the proof of Lemma F.11 in Panaretos and Tavakoli (2013) [19]. For the third statement, recall that

If , then for