Time series contain dynamical information about a system under observation, and their ubiquity is well known. A key task in understanding and forecasting such a system is to quantify the dynamics of an associated time series according to a chosen model, a task made challenging by the fact that the system is often nonstationary. Although there is no universal consensus on how to model and analyze time series extracted from nonstationary systems, two common schools of thought are those of time series analysis [24, 7, 17] and time-frequency (TF) analysis [13, 19]. Roughly stated, the main difference between these two paradigms is the assumptions they make on the underlying random process modelling a time series. In classical time series analysis, this random process is typically assumed to have zero first-order statistics. The focus is then on analyzing the second-order statistics, mainly for the purpose of forecasting. Seasonality of a time series (that is, an oscillatory pattern of known periodicity in its mean) is removed by differencing the sequence. When a time series is modelled as a sum of a parametric periodic mean function and a stationary noise sequence, there is a small body of statistics literature on methods and algorithms to estimate its periodicity when unknown. In a pioneering work on statistical inference of unknown periodicity, Fisher proposed a maximum periodogram test, which was later investigated and extended [25, 33, 47, 10, 36]. Parametric and non-parametric approaches have found applications in modelling the light curves of variable stars with chirp behavior in their frequencies and possible oscillatory patterns [40, 6], and a more general combination of all of the above has also been considered. However, it seems that in the time series literature, little attention has been paid to the possibility of complex seasonality with time-varying amplitude and frequency. To capture nonstationarity, the random process can be modelled as locally stationary [50, 12], piecewise locally stationary [59, 60], or as satisfying a time-varying autoregressive (AR) model, for example. It is interesting that one of the major driving forces in the recent surge in nonstationary time series analysis lies in modelling such series via various evolutionary spectral decompositions of the underlying covariance structure, which shares a similar flavour with TF analysis. That is, one seeks to model general classes of nonstationary time series by various time-varying Fourier or wavelet representations of the covariance; evolutionary Fourier decomposition or Cramér representation approaches and methods based on time-varying wavelet spectra are examples, and several works provide contributions in estimation algorithms. The common ground with TF analysis originates in investigating “local spectral behavior” [46, 37], a generalization of the idea of using the spectrum to capture local behavior by manipulating the covariance function.
In particular, when the signal is oscillatory, the common interest is to explore how the oscillation dynamically behaves. This problem has a long history, beginning with the consideration of time-varying frequency and amplitude modulation [20, 44] and more recently progressing to nonparametric, nonsinusoidal, time-varying oscillatory patterns [54, 34]. An important application is found in physiological time series; for example, electrocardiograms, photoplethysmography, and respiratory signals. Such signals are commonly composed of several oscillatory components with time-varying properties – see Figure 1
for a simulated example. Much effort has been invested over recent decades to understand the complicated dynamics of such signals, a major obstacle being that it is difficult to write down parametric models for them. This motivates the consideration of nonparametric random processes as models for the signal. The adaptive harmonic model (AHM) is used when the oscillatory pattern of the mean is sinusoidal, and the adaptive non-harmonic model
(ANHM) otherwise. In the latter, the mean is a summation of finitely many oscillatory components, each with time-varying frequency or amplitude, and, unlike in the time series approach, the second-order statistics are mainly considered noise. In the TF approach, available algorithms are roughly classified into linear, bilinear, and nonlinear types. The synchrosqueezing transform (SST), a nonlinear TF tool based on the AHM/ANHM model, was developed in the past decade to estimate the time-varying frequency, amplitude, oscillatory patterns, possible trends, and noise. Broadly, the SST is a nonlinear modification of the commonly used spectrogram, deviating from the short-time Fourier transform (STFT) and Gabor transform by taking the phase information of the STFT into account, and can be viewed as a special case of the reassignment technique. See Figure 1 for an illustration of the SST when applied to a noisy signal. Compared with the ground truth shown in the left middle bottom subplot, the time-varying frequency of each component is clearly captured by the SST in the signal’s TFR. This demonstrates the capability of the SST to extract dynamical information even from significantly noisy signals. The SST encodes the spirit of empirical mode decomposition, and since its development has seen diverse applications and several variations. For example, taking the S-transform or wave packets into account, considering higher-order phase information, and applying multi-taper techniques [56, 15] have contributed to further stabilizing the time-frequency representation (TFR) obtained from the SST. Recently, the theoretical analysis of the SST under the AHM or ANHM in the noise-free setting has been well established [14, 9] and widely applied to different problems.
However, when noise (or any stochastic process) is present, the exploration has been limited to asymptotic expansions [51, 9, 15, 58]. Specifically, even in the null case, the distribution of the SST applied to Gaussian white noise is unknown. Figure 1 displays its TFR structure, which exhibits a curious “texture”. Note that this texture also exists in the background of the TFR of the non-null signal shown in the right middle top subplot of Figure 1, so there is reason to be cautious about it causing misinterpretation in scientific exploration. The main focus of this work is to provide a systematic statistical analysis of the SST; specifically, we write down the precise distribution associated with the SST of a stationary white Gaussian random process, in both the null and non-null cases. The first challenge encountered along the way is dealing with improper multivariate complex random variables and their ratio distributions. While proper (or “circular”) complex random variables have been widely discussed in the signal processing literature, their improper counterparts and the corresponding ratios have been mostly ignored, except for [44, 49]. In our case, impropriety arises naturally from the phase information encoded in the short-time Fourier transform (STFT); handling its effect on the quotient structure forms the first part of the paper and is of independent interest for other applications. The second challenge is handling the nonlinear reassignment of STFT coefficients according to the reassignment rule. This step has an interesting interpretation and helps us to connect time series and TF analysis; the big picture is that the reassignment rule has a natural interpretation within the kernel regression framework of time series analysis. The critical quantity is the “effective sampling rate” – when this is correctly specified, we show that in the null case the SST of a stationary white Gaussian random process follows a complex normal distribution. As a result, the SST may be thought of as a bridge between time series and TF analysis.
1.1. Organization of the paper
In Section 2, we summarize the SST algorithm. In Section 3, we present some tools for handling complex Gaussian random vectors. These are of interest aside from their use in the sequel, so we isolate them into an independent section. Section 4 marries the two previous ones by describing the instantaneous frequency distribution and its first- and second-order statistics. The SST integrand is then approximated in a suitable sense by a truncated version of itself, which lends itself more handily to the proof of a central limit theorem (CLT)-style theorem for the SST; namely, Theorem 4.14. We conclude with an insight from the analysis in Section 5, a numerical simulation in Section 6, and a discussion in Section 7. Unless otherwise stated, all proofs are relegated to the supplementary material.
|Conjugate, transpose, and complex conjugate of a matrix , respectively|
|The augmented vector of the vector|
|The augmented matrix with block structure , respectively|
|Euclidean norm on vectors, Frobenius norm on matrices|
|Hermite function of negative order|
|Confluent hypergeometric function as defined in [32, p. 239, (9.1.4)]|
|The class of -valued Schwartz functions defined on|
|,||Real and imaginary part operators, respectively|
|Fourier transform; unitary version with kernel .|
|The composition , where and|
|Short-time Fourier transform of , with window|
|Reassignment rule of with window|
|A variation of convenient for proof|
|STFT-based synchrosqueezing transform of with window and bandwidth|
|-dependent truncated kernel|
2. A summary of the SST algorithm
Take a Schwartz function . For a tempered distribution , the windowed or short-time Fourier transform (STFT) of associated with the window function is defined by the equation
where is the time and is the frequency and . This is called the modified STFT, and it is a modification of the ordinary STFT by the phase modulation . When is represented by a function, we may abuse notation in the usual way and write
Commonly, the window function is chosen to be a Gaussian function with mean and bandwidth ; in this work we will later take for simplicity. Given the above, the STFT-based synchrosqueezing transform (SST) of with the modified window function with resolution is defined to be
where the reassignment rule is defined by
and is an approximate δ-distribution on when is sufficiently small; that is, tends weakly to the δ-distribution centred at zero as . For concreteness, we will take , which has the norm . Notice that the nonlinearity of the SST over signals arises from the dependence of equation (3) on the reassignment rule, which provides information about the instantaneous frequency of the signal. The goal of this paper is to initiate the study of the distribution of for and , where is a deterministic tempered distribution and is a generalized random process (GRP); see Appendix B for a summary of these objects. In this case, to understand the statistics of , we are led to consider the distribution of the ratio of the random variables
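To make the definition concrete, the following minimal Python sketch discretizes the pipeline: an STFT with a Gaussian window, the reassignment rule computed from the quotient of two STFTs, and the squeezing of each coefficient to its estimated instantaneous frequency. The discretization choices (window support, nearest-bin squeezing, and the threshold `gamma` guarding the quotient) are ours for illustration and are not the conventions analyzed in the paper.

```python
import numpy as np

def sst(x, fs, sigma=0.05, gamma=1e-8):
    """Minimal STFT-based synchrosqueezing transform (illustrative sketch)."""
    n = len(x)
    half = int(np.ceil(3 * sigma * fs))
    tau = np.arange(-half, half + 1) / fs
    g  = np.exp(-np.pi * tau**2 / sigma**2)        # Gaussian window h
    dg = (-2 * np.pi * tau / sigma**2) * g         # its derivative h'
    m = len(tau)
    freqs = np.fft.rfftfreq(m, d=1 / fs)
    # Phase factor accounting for the window being centered at tau = 0:
    phase = np.exp(2j * np.pi * np.arange(len(freqs)) * half / m)
    xp = np.pad(x, half)                           # zero-pad the edges
    S = np.zeros((len(freqs), n), dtype=complex)
    for k in range(n):
        seg = xp[k:k + m]
        V  = np.fft.rfft(seg * g)  * phase         # STFT with window h
        dV = np.fft.rfft(seg * dg) * phase         # STFT with window h'
        mask = np.abs(V) > gamma
        # Reassignment rule: instantaneous-frequency estimate at (t_k, xi_j)
        omega = freqs - np.imag(dV / np.where(mask, V, 1.0)) / (2 * np.pi)
        for j in np.flatnonzero(mask):
            i = int(round(omega[j] * m / fs))      # nearest frequency bin
            if 0 <= i < len(freqs):
                S[i, k] += V[j]                    # squeeze the coefficient
    return freqs, S
```

For a pure 50 Hz tone, the mass of `S` concentrates along the row nearest 50 Hz; fed with white noise, the same routine produces the textured TFR discussed in the introduction.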
We work under the assumptions that the noise is mean-zero and stationary, which lets us define by
By a direct expansion, the reassignment rule takes the form
where . If the noise is such that is zero only on a set of measure zero, we may add and subtract in the numerator of (8) to obtain the almost-sure equality
The two cases of interest we treat are the so-called null case where , and the nonnull case, where is not identically zero. In the former we have , and (9) takes the simple form
The analysis in each case depends on understanding the random variables at hand, which we address now.
3. Complex Gaussians and their quotients
Suppose the random vector can be written in the form for some real-valued random vectors and . The density of is then defined to be the density of ; that is, .
Definition 3.1 (Complex Gaussian distribution ).
Let . Suppose are Hermitian positive-definite and complex symmetric, respectively, and the Hermitian matrix is positive definite. We write and say the random vector follows a complex Gaussian distribution with mean , covariance , and pseudocovariance if
where is a column vector and is the augmented covariance matrix. is said to be proper if , and improper otherwise.
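To illustrate the definition, a draw from a complex Gaussian with prescribed covariance Γ and pseudocovariance C can be generated through the real composite vector (Re z, Im z), whose block covariance is determined by the identities Γ = E[zzᴴ] and C = E[zzᵀ]. The sketch below is our own illustration (the function name is not from the paper); it also checks impropriety empirically.

```python
import numpy as np

def sample_complex_gaussian(mu, Gamma, C, size, rng):
    """Draw from CN(mu, Gamma, C) via the real composite vector (x, y)."""
    Gamma, C = np.atleast_2d(Gamma), np.atleast_2d(C)
    # Real covariance blocks of x = Re(z - mu), y = Im(z - mu),
    # obtained by separating real and imaginary parts of Gamma and C:
    Kxx = 0.5 * np.real(Gamma + C)
    Kyy = 0.5 * np.real(Gamma - C)
    Kxy = 0.5 * np.imag(C - Gamma)   # E[x y^T]
    Kyx = 0.5 * np.imag(C + Gamma)   # E[y x^T]
    K = np.block([[Kxx, Kxy], [Kyx, Kyy]])
    d = Gamma.shape[0]
    xy = rng.multivariate_normal(np.zeros(2 * d), K, size=size)
    return mu + xy[:, :d] + 1j * xy[:, d:]

rng = np.random.default_rng(0)
# Improper scalar example: Gamma = 2, C = 0.5 (C != 0 means improper).
z = sample_complex_gaussian(0.0, [[2.0]], [[0.5]], 200_000, rng)[:, 0]
print(np.mean(z * np.conj(z)))  # approximately Gamma = 2
print(np.mean(z * z))           # approximately C = 0.5; nonzero: improper
```

The empirical second moments recover Γ and C up to Monte Carlo error, and a nonzero pseudocovariance estimate is exactly the signature of impropriety.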
Note that while real Gaussian vectors are completely characterized by their mean and covariance, complex Gaussian vectors are characterized by their mean and augmented covariance, as is clearly seen by the structure of the matrix . Note that if a complex Gaussian vector has uncorrelated components (that is, diagonal ), it does not necessarily follow that these components are independent, as may be nonzero. When is proper, commutativity of matrix inversion with conjugation and positive-definiteness of gives us
If is also diagonal, the real and imaginary parts of are seen by direct calculation to be independent and random variables, respectively. Note that positive-definiteness of guarantees the invertibility of , and hence by [5, Proposition 2.8.3] we have , where the so-called Schur complement, is also invertible. By block matrix inversion [5, (2.8.16), (2.8.18) and (2.8.20)], we have
where . Observe that since is positive definite, is too. Moreover, by direct verification, is also symmetric. Indeed, by viewing as a multiplication of block matrices, we have and , which leads to and . Since is Hermitian, we have , and hence the claim follows immediately.
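The block-inversion identity can be checked numerically. The snippet below builds a valid augmented covariance matrix from a positive-definite real composite covariance (a construction of ours for the check), inverts it via the Schur complement formula, and compares against a direct inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
# A positive-definite real covariance of the composite vector (x, y):
B = rng.standard_normal((2 * d, 2 * d))
K = B @ B.T + 2 * d * np.eye(2 * d)
Kxx, Kxy = K[:d, :d], K[:d, d:]
Kyx, Kyy = K[d:, :d], K[d:, d:]
Gamma = Kxx + Kyy + 1j * (Kyx - Kxy)   # Hermitian positive definite
C     = Kxx - Kyy + 1j * (Kxy + Kyx)   # complex symmetric
M = np.block([[Gamma, C], [C.conj(), Gamma.conj()]])  # augmented covariance

# Inverse via the Schur complement P = conj(Gamma) - conj(C) Gamma^{-1} C:
Gi = np.linalg.inv(Gamma)
P  = Gamma.conj() - C.conj() @ Gi @ C
Pi = np.linalg.inv(P)
Minv = np.block([
    [Gi + Gi @ C @ Pi @ C.conj() @ Gi, -Gi @ C @ Pi],
    [-Pi @ C.conj() @ Gi,               Pi         ],
])
print(np.allclose(Minv @ M, np.eye(2 * d)))  # True
```

The agreement confirms that the Schur complement of a positive-definite augmented covariance is invertible and that the block formula reproduces the full inverse.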
3.1. Complex Gaussian quotient density
If the complex random vector , then the density of has a simple closed-form determined in . To extend this result to the most general case, we recall from [32, p. 290, (10.5.2)] and [43, (4)] that the Hermite function of order satisfies for all the identities
With these in mind, we have the following new result:
Theorem 3.1 (Complex Gaussian quotient density).
If , then for any the density of the random variable is given by
When and , we write instead of . In particular,
The proof of Theorem 3.1 is given in Appendix D.1, and the symbols and originating from the existence of the impropriety are introduced to keep the formula (16) succinct. Notice that vanishes when the mean is zero. Moreover, does not blow up at because for all . On the other hand, decays like as . We now discuss a point of potential confusion in the literature. It is well-known that the quotient of two independent real
Gaussian random variables follows a Cauchy distribution, and the density of a quotient of dependent nonstandard real Gaussians is given explicitly in [43, Theorem 2]. One might expect that an extension to the complex setting would bring truth to the statement that “a quotient of complex Gaussians should have a complex Cauchy distribution”. Unfortunately, this is not the case: a distribution by the name of complex Cauchy has already appeared in the literature [49, p. 46, (2.80) with ], its density being given by
for some location parameter and so-called scatter (or dispersion) matrix . This does not coincide with the distribution of the quotient of two complex Gaussians; for example, the former does not have a mean, whereas the latter does, as we show now.
Let and . Then
and are finite when , and infinite when .
When and , we have .
When is diagonal and , we have .
HTW: can be removed?
4. Statistical analysis of SST
Definition 4.1 (Complex Gaussian white noise).
A stationary GRP is a complex Gaussian white noise if for any finite collection we have , where and for all . We denote the measure associated with by ; see Appendix B.
Note that the matrices above need not be diagonal. A typical example of complex Gaussian white noise is , where is a standard complex Brownian motion, is the differentiation operator on GRPs, and is the weak derivative of . In this paper, we are concerned with when is a complex Gaussian white noise, which involves studying complex random vectors of the form .
When the window has the property that is real-valued and even, we have , where
for and .
See Section B for a proof. Note that we do not have in the subscript, since is independent of time. Thinking of the case of real Gaussians, it is natural to ask if there is some choice of basis that would let us diagonalize both the matrices, and , in Lemma 4.2. By elementary linear algebra, two diagonalizable matrices are simultaneously diagonalizable if and only if they commute, and it can be checked that and do not commute except when is a scalar multiple of the identity. Note that if decays exponentially when , for any , also decays exponentially when . Therefore, decays to exponentially fast when . When is Gaussian, we can further simplify and , so we hereafter assume the following:
The window is given by .
As a direct consequence of Lemma 4.2, we obtain the following corollary by utilizing the exponential decay of the pseudocovariance part and noting that .
Under Assumption 4.1, the covariance and pseudocovariance matrices of satisfy
So when , is and .
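The decay of the pseudocovariance in the frequency variable can be seen numerically. As an illustration we feed the STFT real Gaussian white noise (a maximally improper input, whose pseudocovariance equals its covariance): the coefficient V(0, ξ) then has covariance essentially independent of ξ, while the pseudocovariance decays like a Gaussian in ξ. The window and parameters below are our own choices.

```python
import numpy as np

rng = np.random.default_rng(3)
fs, sigma = 1000, 0.05
tau = np.arange(-150, 151) / fs
g = np.exp(-np.pi * tau**2 / sigma**2)            # Gaussian window
n_trials = 20_000
W = rng.standard_normal((n_trials, len(tau)))     # real white noise rows

ratios = {}
for xi in (5.0, 60.0):
    V = W @ (g * np.exp(-2j * np.pi * xi * tau)) / fs  # STFT at (0, xi)
    cov    = np.mean(V * np.conj(V))              # covariance E[V V*]
    pseudo = np.mean(V * V)                       # pseudocovariance E[V V]
    ratios[xi] = abs(pseudo) / abs(cov)
    print(xi, ratios[xi])
```

For this window the exact ratio is roughly exp(-2π σ² ξ²), about 0.68 at ξ = 5 but at the Monte Carlo noise floor by ξ = 60: away from low frequencies the STFT coefficient is effectively proper, in line with the corollary.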
From this corollary, it is clear that is independent of , and when . Therefore, the eigenvalues of the augmented covariance matrix become more degenerate when . Next, we investigate the reassignment rule. Assume for the moment a completely general deterministic signal. The reassignment rule is merely an affine change of variables away from a particular quotient of complex Gaussians, which in the notation of equation (9) we define to be
Then is the second component of the vector divided by the first component, where
Of course, is a complex Gaussian whose second-order statistics are exactly those of , which we compute in the following lemma. In general, the frequency and amplitude of an oscillatory signal both vary with time. One model for such signals is the AHM [14, 9], where the frequency and amplitude of a signal are assumed to change slowly relative to its instantaneous frequency. Under the AHM, a signal can be well approximated locally by a single harmonic component, and, in light of this, we assume from now on that is of the form for some fixed frequency and amplitude . We refer to the situation when as the null case. Under this assumption, equations (5), (6) and (7) then reduce to
We then also have , whereby (9) becomes
Here, the dependence on arises through the mean vector , which contains factors involving . When , equation (21) becomes
Notice that in the null case, and do not depend at all on . Observe now that the above gives us
The second term is bounded in mean by Proposition 3.2, so we see that the instantaneous frequency is close to . On the other hand, for far from , is close to zero because of the factor . In this case, is approximately . As and is held fixed, the density of the improper quotient converges pointwise to the density of , which by the Riesz-Scheffé theorem implies that converges in distribution to . In the large limit, the latter has no impropriety, so this makes precise the sense in which the impropriety disappears as the magnitude of increases. This is summarized in the following proposition.
For all , we have in distribution as , where satisfies Assumption 4.1.
We use the Riesz-Scheffé theorem. In particular, it suffices to verify that the density of the non-null reassignment rule converges pointwise to the density of the null one. Substituting into the result of Theorem 3.1, we see that as , on account of the decay of , whereby tends to zero. Now, we wish to apply dominated convergence to show that the hypergeometric factor in (16) tends to one. Since the hypergeometric function therein is increasing and the interval of integration is bounded, it suffices to bound the argument uniformly in , which reduces to bounding , , and . The former is possible because , and the others are bounded by the norms of and . Integrability of the majorant is assured by the first part of Proposition 3.2. ∎
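The two regimes can be contrasted in a quick simulation: with a strong tone present, the reassignment rule concentrates near the tone frequency even when evaluated at a detuned ξ, while under pure noise it is diffuse and heavy-tailed. The signal-to-noise level and window below are our own choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
fs, sigma, f0, xi = 1000, 0.05, 50.0, 55.0     # xi is detuned from f0
tau = np.arange(-150, 151) / fs
g  = np.exp(-np.pi * tau**2 / sigma**2)        # Gaussian window h
dg = (-2 * np.pi * tau / sigma**2) * g         # its derivative h'
e  = np.exp(-2j * np.pi * xi * tau)

def omega_hat(x):
    """Reassignment rule at (t, xi) = (0, xi) for one realization x."""
    V, dV = np.sum(x * g * e) / fs, np.sum(x * dg * e) / fs
    return xi - np.imag(dV / V) / (2 * np.pi)

tone = np.cos(2 * np.pi * f0 * tau)
noise = 0.1 * rng.standard_normal((2000, len(tau)))
om_sig  = np.array([omega_hat(tone + w) for w in noise])  # non-null case
om_null = np.array([omega_hat(w) for w in noise])         # null case

print(np.median(om_sig))            # concentrates near f0 = 50, not xi = 55
print(np.std(om_sig), np.std(om_null))
```

The non-null estimates cluster tightly around the tone frequency, while the null-case estimates scatter over many Hz, the empirical counterpart of the heavy-tailed quotient.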
We now provide a sense in which many of our questions can be reduced to the mean-zero proper case:
Proposition 4.5 (Ratio density convergence).
In the synchrosqueezing case, for a fixed , as we have
We control the terms related to in Theorem 3.1. Because as , there exists a such that for all , we have
so we focus on controlling the second integrand. In particular,
The result follows by expanding . ∎
Corollary 4.6 (Quotient mean asymptotics).
In the synchrosqueezing case, for any we have as the asymptotics
Directly integrate using the above density asymptotics, and apply Proposition 4.3. ∎
Due to the fast decay of the Gaussian window, the covariance between and decays exponentially when increases. Intuitively, the covariance between and should also be small when is large, but the ratio structure might obfuscate the speed of decay. We solve this issue with the following theorem.
Theorem 4.7 (Ratio covariance).
For distinct , as we have
For a proof, see Appendix E.1. We remark that because the density of the reassignment rule is related to that of via a simple affine transformation, these results may be immediately extended to .
4.1. Preparation for the SST distribution
Following (3), define the complex random vector , where , and
Since is not defined when , we see that is defined on , where is a measure zero set. Naturally, the distribution of is one of the marginal distributions of . The seemingly complicated random variable turns out to have nice behavior – while the reassignment rule has a fat tail for every by Proposition 3.2, after being composed with a Gaussian function this fat tail is “tamed”. To simplify the heavy notation, when there is no danger of confusion, we suppress , and and emphasize the “bandwidth” and by denoting
We state below that has finite moments of all orders, relegating the proof to Appendix E.2.
Fix and . Then for all , the -th absolute moment satisfies
when is sufficiently small, where the implied constant depends on