Log In Sign Up

Non-Oscillatory Pattern Learning for Non-Stationary Signals

by   Jieren Xu, et al.
Duke University

This paper proposes a novel non-oscillatory pattern (NOP) learning scheme for several oscillatory data analysis problems including signal decomposition, super-resolution, and signal sub-sampling. To the best of our knowledge, the proposed NOP is the first algorithm for these problems with fully non-stationary oscillatory data with close and crossover frequencies, and general oscillatory patterns. Even in stationary cases with trigonometric patterns, numerical examples show that NOP admits competitive or better performance in terms of accuracy and robustness than several state-of-the-art algorithms.


page 1

page 2

page 3

page 4


Parametric Modeling of Non-Stationary Signals

Parametric modeling of non-stationary signals is addressed in this artic...

Atomic Norm Denoising for Complex Exponentials with Unknown Waveform Modulations

Non-stationary blind super-resolution is an extension of the traditional...

Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data

Large scale deep learning provides a tremendous opportunity to improve t...

Towards Deep Robot Learning with Optimizer applicable to Non-stationary Problems

This paper proposes a new optimizer for deep learning, named d-AmsGrad. ...

Parametric Modeling of EEG by Mono-Component Non-Stationary Signal

In this paper, we propose a novel approach for parametric modeling of el...

One or two frequencies? The Iterative Filtering answers

The Iterative Filtering method is a technique aimed at the decomposition...

Spectral analysis for non-stationary audio

A new approach for the analysis of non-stationary signals is proposed, w...

1 Introduction

This paper concerns oscillatory data defined on a time domain of the form:


Here and are the latent instantaneous amplitude and phase functions of the th component, which are assumed to be smooth over time. The derivative of a phase function is called an instantaneous frequency function and denoted as . is a periodic shape function with periodicity one satisfying and it has a unit L2-norm on .

Figure 1: An example of Model (1) with shape functions . (a) Shapes . (b) instantaneous frequencies (green) and (red) for all positive .

Oscillatory data in Model (1) arise in numerous applications HauBio2 ; Pinheiro2012175 ; 7042968 ; Crystal ; LuWirthYang:2016 ; Eng2 ; ME ; Canvas ; Canvas2 ; GeoReview ; SSCT ; 977903 ; 4685952 ; MUSIC ; SuperResolution:Candes ; gruber1997statistical ; burg1972relationship ; roy1989esprit and data analysis of this kind has been an active research field for decades. Usually only (and sometimes

) is available and the goal is to estimate

, (or ), and from . Hence, this is a general problem including and generalizing sub-problems like adaptive time-frequency analysis auger1995improving ; Daubechies1996 ; YANG2017 , empirical mode decomposition Huang1998 ; Wu2009 ; Wu2009EEMD , super-resolution MUSIC ; SuperResolution:Candes ; gruber1997statistical ; burg1972relationship ; roy1989esprit

, pattern recognition

zhu2013locally , etc. In spite of many successful algorithms for solving these sub-problems, to the best of our knowledge, there is no existing algorithm fulfilling the ultimate goal of estimating , (or ), and when is fully non-stationary with close and crossover frequencies, and general shape functions. Many existing algorithms require high sampling rate for better accuracy and robustness, which is not practical due to the limit of battery capacity of mobile devices that collect oscillatory data, e.g. portable health monitors.

Fig. 1 shows a synthetic example of Model (1) when , , , , , and . is in fact a superposition of infinitely many deformed planewaves due to the Fourier series expansion of shape functions. Hence, close and crossover frequencies are unavoidable. In terms of frequencies, due to Heisenberg uncertainty principle, time-frequency analysis methods auger1995improving ; Daubechies1996 ; YANG2017 are not able to estimate these instantaneous frequencies; due to the fully non-stationary nature of , existing super-resolution methods MUSIC ; SuperResolution:Candes ; gruber1997statistical ; burg1972relationship ; roy1989esprit are not able to estimate frequencies . Moreover, existing GP based methods wilson2013gaussian ; pan2017prediction ; parra2017spectral ; ulrich2015gp ; quia2010sparse cannot produce reliable source separation in Model (1). Non-compact support of in the Fourier domain (Fig. 2(b)) creates particular challenges. Some other regression methods have also been designed to estimate shape functions xu2018recursive ; MMD ; FMMD , but they assume phase functions are known.

In order to solve the challenging inverse problem implicit to Model (1), it is natural to choose priors within a Bayesian model for the latent components. While there are many possible stochastic process choices, Guassian Processes (GPs) are appealing in enforcing smoothness providing easy inclusion of prior information and leading to computational tractability. This paper proposes a non-oscillatory pattern (NOP) learning. scheme, the first framework that can estimate , (or ), and simultaneously from . NOP repeatedly applies a two-stage iteration until convergence. In the first stage, fixing rough estimates of shapes, a novel non-stationary GP regression is proposed to estimate amplitude and phase functions. A novel set of auxiliary points, referred to as the pattern inducing points, are introduced for this purpose. As oppose to traditional stationary kernels, non-stationary kernel functions in our GP model approximately transform a non-stationary data analysis problem into a stationary one, greatly reducing the regression difficulty. Furthermore, we propose to embed the information of rough shape estimates into the GP regression model, reducing a deep GP regression problem zhu2013locally ; damianou2013deep ; dai2015variational with a composition of two latent variables (e.g. the composition of a shape function and a phase function) into a simpler problem with only one latent variable, the phase functions. This significantly reduces the computational cost and difficulty in the regression.

In the second stage, fixing rough estimates of amplitudes and phases, a non-stationary GP regression is proposed or other iterative regression methods in FMMD ; xu2018recursive ; 1DSSWPT ; MMD

is applied to estimate shapes. The main difference of the proposed non-stationary GP regression in the second stage to existing methods is that, the variance of the point estimate of shape functions is simple to derive.

The first and second stages of the algorithm are introduced in Sections 2 and 3, respectively. Section 4 summarizes NOP. As we shall see in the numerical examples in Section 5, NOP works for a wide range of signals in Model (1) and the performance is not sensitive to initialization.

2 Estimation of the instantaneous information

In this section, we assume the shape functions are known and aim at estimating the phase and amplitude functions and , respectively.

2.1 Main ideas

We start with a simple case when the signal . Assume a non-stationary GP , and a stationary GP , where is the automatic relevance determination (ARD) rasmussen2006gaussian squared exponential (SE) kernel:


with kernel parameters and . Now we consider the phase function as the latent variable of the GP , and claim the resulting GP is periodic and stationary in the domain . This is because , by plugging in the new input , the corresponding output is , i.e., any point lies on the curve . seems to be a deep/hierarchical GP model dai2015variational , but in fact this ‘unwarping’ process directly removes the first GP layer while keeping the second layer stationary. This is a crucial idea for success of our NOP. And we write the one layer GP with input as for .

Figure 2: The pattern driven model. (a) the data points are driven and finally match to the underlying pattern . (b) .

Next, we introduce a point estimate of the phase function as a latent variable. Suppose 111

We will use bold font for vectors.

is the observations (contaminated by white-noise with standard deviation

) of at time locations . Denote and . As can be seen, a direct inference of the latent , even when is given, is not trivial. This motivates us to introduce a new set of auxiliary points for the GP to retrieve the latent input variable , and to reveal the underlying patterns for Model (1) when is not known. We refer to this novel set of auxiliary points as the pattern inducing points, denoted as , where is the input variable and is the respective output evaluated at the inducing locations . Usually we uniformly discretize and fix at phase locations in , generally .

Similar to sparse inducing points snelson2006sparse ; titsias2009variational , the pattern inducing points are treated as variational parameters, and can be updated. They possess the benefits of both sparse inducing points and training points rasmussen2006gaussian , but aim to reveal the true underlying pattern of Model (1) and are endowed with specific meanings. The intuition behind the pattern inducing points is that they serve as certain landmarks, shown as the green dots (input as the green triangles) in Fig. 2, of the true underlying pattern (shaded green line). The latent input (red right triangle in Fig. 2) can drive the dataset (red dots in Fig. 2) across the phase domain (in Fig. 2(a) from (top) the initial wrong to (middle) to (bottom) the correct ), to match the underlying pattern (in green) by these landmarks. The that matches the dataset with the landmark points is the correct since for . So this process is also referred to as a pattern driven method for Model (1). Note that setting of with the SE kernel, is much more stable than the setting of with the periodic kernel rasmussen2006gaussian , which is highly non-convex and can be trapped at much more local minima validated by our numerical result.

Let be the variational distribution to approximate the posterior distribution of the inducing variable . For the purpose of computational efficiency, we adopt , where is the mean of and the variance matrix of is diagonal. In this section since has been observed, and

. When the auxiliary pattern inducing points are ready, we design the following data-pattern matching process in a standard GP setting by encoding the latent variable

in the GP kernel. Conditioning on and , the likelihood of becomes


By further marginalizing out and out, we have



is an identity matrix of size

, is an covariance matrix, 222Denote as the th entry of a vector, and as the th entry of a matrix. for , and similarly for , . Hence, a point estimate of the latent variable conditioning on can be computed by Bayesian approach using Eqn. (4).

In the case of multi-components as in Model (1), we denote , , and for the th component, and . Assuming the independence among variables of the same type, e.g., among , among , etc., Eqn. (4) can be generalized to


where consists of latent variables as the th column, , , , and are defined for covariance matrices of the th component similarly to the case of one component.

When we have time-varying smooth amplitude functions as in Model (1), let , then are a new set of latent variables of . In this case, Eqn. (5) takes the form:


where consists of as its -th column.

To accelerate the computation, instead of applying the Bayesian approach with Eqn. (6), we directly fit the mean value of the Gaussian process with the observations via a least square (LS) problem:


where has been encoded in the kernel matrices in the minimization objective function.

2.2 Adaptive local estimation

The optimization problem in (7) is non-convex and there is no strong prior of and to provide a good initialization. Note that in most applications, and usually smoothly vary in time. This motivates us to generate signal patches and locally parametrize the phase and amplitude functions with low-degree polynomials in Eqn. (7). Within each patch, the non-stationary signal becomes much more stationary. As we shall see later, our numerical experiments show that degree is sufficiently good.

For each observation patch (also denoted as ), let and be the matrices consisting of the coefficients of the -degree polynomials representing the amplitude and phase functions, respectively. Then we can specify the amplitude and phase functions with and by the following LS problem:


where is the entry-wise dot product, and has been absorbed in the covariance matrices.

Once the polynomial coefficients and of the amplitudes and phases in Eqn. (8) have been specified, they provide a local point estimate of the amplitude and phase functions, which can be used to obtain a global point estimate via a robust curve fitting algorithm garcia2010robust ; garcia2011fast . A standard moving average can be applied to estimate the variance of the point estimate.

3 Estimation of shape functions

In this section, we assume the phase and amplitude functions and are known and estimate the shape functions . We do not aim at closed formulas for . As introduced in Section 2, it is sufficient to estimate the pattern inducing variables and to represent .

When amplitude and phase functions are given, shape function estimation methods have been studied previously in FMMD ; xu2018recursive ; 1DSSWPT ; MMD . These methods achieves high accuracy when amplitude and phase function estimates are close to the ground truth. There is no quantitative criteria to measure how well the shape function estimate performs when the amplitude and phase function estimate is not very good. This motivates us to apply variational inference, following titsias2009variational , that explicitly expresses the distribution of the pattern inducing variables in a joint form. Please refer to Supplemental Material (SM) Section A and Section B for details of these three well-established approaches.

4 Overview of NOP

NOP repeatedly applies a two-stage iteration until convergence. We have introduced the first stage in Section 2: given rough estimates of shapes, a non-stationary GP regression is applied to estimate amplitude and phase functions. The second stage has been introduced in Section 3: given rough estimates of amplitudes and phases, several regression methods can be adopted to estimate shapes. Hence, a complete algorithm description can be summarized in Algorithm 1 below. The respective prediction formulation of Algorithm 1 is derived in SM Section C.

Input: measurements of , the number of components , the maximum iteration number , and the accuracy parameter .

Output: estimates of the pattern inducing variables representing , and the latent variables and representing and , respectively.

Initialization: , , the current iteration number . Initialize the estimates of and , and let ; or initialize the estimate of and let .

while , , , and  do

       if  then
             Given and in the previous iteration, compute by the method in Section 3.
            Given in the previous iteration, compute and by the method in Section 2.
      if  then
             Given in the last step just above, compute and by the method in Section 2.
             Given and in the last step just above, compute by the method in Section 3.
      Update , and are set to be the -norm of the difference of the amplitude and phase estimates in the previous and current iteration.
Algorithm 1 NOP

In many applications, e.g. ECG and photoplethysmogram data analysis, heuristic properties of the physical system are available and we know the rough range of instantaneous frequencies

. Hence, we can apply a band-pass filter to , and then estimate and of in a certain frequency band using traditional time-frequency analysis methods Auger1995; Daubechies1996 ; YANG2017 to initialize and following the method in 1DSSWPT .

Another simple initialization is to let . Since we adopt local patch analysis in Section 2, components after Fourier series expansion , , become approximately orthogonal to each other in a short time domain. Hence, the LS in (7) is able to recover the amplitude and phase functions corresponding to since they usually have the largest magnitude.

The LS problems are non-convex and hence we cannot guarantee convergence to the global minimizer. However, the iterative scheme seems to provide very good results and the algorithm generally converges after only a few iterations. The good performance of NOP might come from the fact that once the amplitude and phase function estimates are roughly good (might not be the global minimizer of LS problems), the shape estimation step can quickly provide very good estimate of shape functions, which can be guaranteed if we adopt methods in FMMD ; xu2018recursive ; MMD . in Algorithm 1 specifies whether we update shape functions first or not; if the initialization of shape functions are better than those of amplitude and phase functions, we choose ; otherwise, we choose . The global convergence analysis would be an interesting future work.

5 Experiments

In this section, we provide numerical examples to demonstrate the performance of NOP333Code to be appear in, especially in the case of super-resolution and adaptive time-frequency analysis. LS problems in all examples are solved by Adam goodfellow2016deep aiming at better local minimizers. We choose degree-(or degree-

when specified) polynomials to approximate local amplitude and phase functions in these LS problems. The hyperparameters of NOP are set as follows: noise level

, , and . In the local patch analysis, we generate signal patches such that each patch contains approximately to periods. In the tests for super-resolution, we repeat the same test with noise realizations for the purpose of using the expectation and variance of estimation error to measure the performance of different algorithms. and denote the point-wise estimation error.

5.1 Super-resolution spectral estimation

There has been substantial research for the super-resolution problem that aims at estimating time-invariant amplitudes and frequencies in a signal with , , and are very close. Among many possible choices, the baseline models might be MUSIC schmidt1986multiple , ME burg1972relationship ; georgiou2001spectral ; georgiou2002spectral , and ESPRIT roy1989esprit . Hence, we will compare NOP with these methods444Code from

to show the advantages of NOP. Although the Fourier transform usually fails 

schmidt1986multiple to identify and , we use its results as the initialization for NOP.

(a) Clean signal (b) (c) (d)
Figure 3: Frequency estimate (absolute) error of , where and with different and white noise .
Figure 4: Left: results of with different number of samples , , , and from top to bottom, and by different methods in an order of NOP, ME, ESPRIT, and MUSIC from left to right. The ground truth frequencies are . tests with different noise realization were performed and the estimated frequencies are visualized in a D domain centered at the ground truth. Right: the expectation and variance of estimation errors for different methods and numbers of samples.

Accuracy and robustness with different spectral gaps

In this experiment, we use , where the two instantiations of / are / (red lines in Fig. 3) and / as in Fig. 1 (pink lines in Fig. 3). Here and , varies from to , and different noise variance are . The sampling rate is Hz and the number of samples is in this example. Fig. 3 shows the frequency estimation accuracy of NOP, MUSIC, ESPRIT, and ME. As we can see, NOP achieves machine accuracy in the noiseless case and is much more accurate than other methods in all noisy cases. It also reaches almost the same accuracy for the trigonometric (red) and shaped(pink) instantiations in and . Baseline methods are not applicable to instantiation .

Accuracy and robustness with different sampling rates

In this experiment, we set with , , and . The sampling rate of this signal is still Hz and the numbers of samples are , , , and to generate four sets of test data. There are two different kinds of noise to generate noisy test data: 1) white Gaussian noise is directly added to ; 2) a stochastic process in

with i.i.d. uniform distribution in

is added to phase functions . Fig. 4 summarizes the results of frequency estimates in this experiment. ESPRIT and MUSIC lose accuracy in all tests. NOP and ME achieve high accuracy when the number of samples is large and NOP is slightly better than ME in terms of accuracy and estimation bias.

5.2 Estimation of time-variant frequencies

In this section, we show the capacity of NOP for estimating close and crossover time-varying instantaneous frequencies. An adaptive time-frequency analysis algorithm, ConceFT Daubechies20150193 ), is used as a comparison. And local approximation degree is set to in this section.

(a) and in time-frequency domain    (b) Error for    (c) Error for

Figure 5: Short signal with linear frequencies. (a) visualizes all the instantaneous frequencies of our synthetic components as the spectral gap parameter takes the values for . (b) is the estimation error for frequency (top) and phase (bottom) estimates when ; (c) is for .

Close frequencies and phase estimation error

We use , where the two instantiations of / are / (Fig. 5(b)) and / as in Fig. 1 (Fig. 5(c)). varies from to . The white noise has standard deviation . We apply short-time Fourier transform griffin1984signal to identify rough estimates of instantaneous frequencies and use them as the initialization in this test. When instantaneous frequencies are very close, the initialization is very poor; however, NOP still can identify instantaneous frequencies and phases with a reasonably good accuracy. Result is summarized in Fig. 5.

Fig. 5(a) is the ground truth time-frequency representation of ten tested signals with different value of on . The difference between (green line) and (red line) are pretty difficult to be detected by existing time-frequency methods. The log error of the point-wise averaged frequency estimate is shown in the first row of of Fig. 5 (b) and (c) on different noise levels . The log error of point-wise averaged phase estimate (bottom row) is consistently small as changes. Under large noise case with , NOP controls the phase error approximate or below the level of . Existing time-frequency analysis methods usually estimate instantaneous frequencies first and then integrate them to obtain instantaneous phases, which suffers from accumulated error. However, NOP has no accumulated error.

Close and crossover frequencies

In this experiment, we generate a signal consisting of two components with close instantaneous frequencies and a signal with two crossover instantaneous frequencies. Fig. 6 visualizes the ground truth instantaneous frequencies, the time-frequency distribution by ConceFT, the initialization and the estimation results of NOP. ConceFT cannot visualize the instantaneous frequencies even if in the noiseless case. We average out the energy distribution of ConceFT to obtain the initialization of NOP. Although the initialization is very poor, NOP is still able to estimate the instantaneous frequencies with a reasonably good accuracy no matter in clean or noisy cases. However, when the number of components is known, we can average out the energy band to obtain one instantaneous frequency function and initialize all instantaneous frequencies in NOP using this function. We will show how to handle this special initialization in SM Section D.

Figure 6: Instantaneous frequency estimates for signals with close and crossover frequencies. (a) ground truth instantaneous frequencies and initialization of NOP. (b) estimated instantaneous frequencies for clean signals. (c) estimated instantaneous frequencies for noisy signals. (d) time-frequency distribution by ConceFT.
Figure 7: NOP is applied to estimate the amplitude, phase, and shapes of a synthetic signal consisting of two components. (a) the time-frequency distribution of by ConceFT in two different frequency ranges. ConceFT cannot reveal the ground truth instantaneous frequencies (in red and green). But we can initialize NOP by averaging out the distribution (see the dash pink line). (b) and (c) the ground truth shape functions and their estimates. (d) the noisy signal and the reconstructed components by NOP.

5.3 Estimation of amplitudes, phases, and shapes simultaneously

Finally, we apply NOP to estimate amplitudes, phases, and shapes simultaneously from a single record. First, we generate a synthetic example , where , , and the shapes are visualized in Fig. 7. The sampling rate for this signal is Hz and we sample it at locations. The shape estimates are initialized as and for the first and second components, respectively (see Fig. 7 (b) and (c)). The frequency estimates are initialized as one constant centered in the peak spectrogram by ConceFT (see Fig. 7 (a)). As we can see in Fig. 7 (b) and (c), NOP is able to estimate shape functions with a reasonably good accuracy and the reconstructed components match the ground truth components very well.

In the second example, we apply NOP to a real signal from photoplethysmogram (PPG) (see Fig. 8 (b)). The shape estimates are still initialized as and for the two components, and samples are involved. The PPG signal contains two components corresponding to the health condition of the heart and lung in a human body.

Figure 8: (a) PPG signal. (b) reconstructed components of the PPG signal. These two components were reconstructed from only a small portion of the samples of the original PPG raw data as visuallized in the bottom figure.

6 Conclusion

This paper proposes a novel non-oscillatory pattern (NOP) learning scheme for several oscillatory data analysis problems including signal decomposition, super-resolution, and signal sub-sampling. To the best of our knowledge, the proposed NOP is the first algorithm for these problems with fully non-stationary oscillatory data with close and crossover frequencies, and general oscillatory patterns. Numerical examples have shown the advantage of NOP over several state-of-the-art algorithms and NOP is able to handle complicated examples for which existing algorithms fail. NOP could be a very useful tool for pattern analysis for oscillatory data. Although we cannot prove the global convergence of NOP, NOP seems to provide very good results in all of our tests. It is interesting to study the global convergence of NOP in the future.


  • (1) E. Alonso, E. Aramendi, D. González-Otero, U. Ayala, M. Daya, and J. K. Russell. Empirical mode decomposition for chest compression and ventilation detection in cardiac arrest. In Computing in Cardiology 2014 2014.
  • (2) F. Auger and P. Flandrin. Improving the readability of time-frequency and time-scale representations by the reassignment method. IEEE Transactions on signal processing, 1995.
  • (3) X. Bai, M. Xing, F. Zhou, G. Lu, and Z. Bao. Imaging of micromotion targets with rotating parts based on empirical-mode decomposition. IEEE Transactions on Geoscience and Remote Sensing 2008.
  • (4) J. P. Burg. The relationship between maximum entropy spectra and maximum likelihood spectra. Geophysics, 1972.
  • (5) B. Cornelis, H. Yang, A. Goodfriend, N. Ocon, J. Lu, and I. Daubechies. Removal of canvas patterns in digital acquisitions of paintings. IEEE Transactions on Image Processing 2017.
  • (6) Z. Dai, A. Damianou, J. González, and N. Lawrence. Variational auto-encoded deep Gaussian Processes. arXiv preprint arXiv:1511.06455, 2015.
  • (7) A. Damianou and N. Lawrence. Deep Gaussian Processes. In Artificial Intelligence and Statistics, 2013.
  • (8) I. Daubechies and S. Maes. A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models. In Wavelets in Medicine and Biology. CRC Press, 1996.
  • (9) I. Daubechies, Y. G. Wang, and H.-t. Wu. Conceft: concentration of frequency and time via a multitapered synchrosqueezed transform. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 2016.
  • (10) D. Garcia. Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis, 2010.
  • (11) D. Garcia. A fast all-in-one method for automated post-processing of PIV data. Experiments in fluids, 2011.
  • (12) T. T. Georgiou. Spectral estimation via selective harmonic amplification. IEEE Transactions on Automatic Control, 2001.
  • (13) T. T. Georgiou. Spectral analysis based on the state covariance: the maximum entropy spectrum and linear fractional parametrization. IEEE transactions on Automatic Control, 2002.
  • (14) I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio. Deep learning, volume 1. MIT press Cambridge, 2016.
  • (15) D. Griffin and J. Lim. Signal estimation from modified short-time fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984.
  • (16) M. H. Gruber. Statistical digital signal processing and modeling, 1997.
  • (17) N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. R. Soc. Lond. Proc. Ser. A Math. Phys. Eng. Sci., 1998.
  • (18) W. Huang, Z. Shen, N. E. Huang, and Y. C. Fung. Engineering analysis of biological variables: An example of blood pressure over 1 day. Proc. Natl. Acad. Sci., 1998.
  • (19) Y. Huanyin, G. Huadong, H. Chunming, L. Xinwu, and W. Changlin. A sar interferogram filter based on the empirical mode decomposition method. In IGARSS 2001. Scanning the Present and Resolving the Future. Proceedings. IEEE 2001 International Geoscience and Remote Sensing Symposium (Cat. No.01CH37217), volume 5, 2001.
  • (20) C. E. J. and F.-G. Carlos. Towards a mathematical theory of super-resolution. Communications on Pure and Applied Mathematics.
  • (21) W. Liao and A. Fannjiang. Music for single-snapshot spectral estimation: Stability and super-resolution. Applied and Computational Harmonic Analysis, 2016.
  • (22) J. Lu, B. Wirth, and H. Yang. Combining 2D synchrosqueezed wave packet transform with optimization for crystal image analysis. Journal of the Mechanics and Physics of Solids, 2016.
  • (23) Y. Pan, X. Yan, E. A. Theodorou, and B. Boots. Prediction under uncertainty in sparse spectrum Gaussian Processes with applications to filtering and control. In ICML, 2017.
  • (24) G. Parra and F. Tobar. Spectral mixture kernels for multi-output Gaussian Processes, 2017.
  • (25) E. Pinheiro, O. Postolache, and P. Girão.

    Empirical mode decomposition and principal component analysis implementation in processing non-invasive cardiovascular signals.

    Measurement, 2012. Special issue on Electrical Instruments.
  • (26) J. QuiÃonero-Candela, C. E. Rasmussen, A. R. Figueiras-Vidal, et al. Sparse spectrum Gaussian Process regression.

    Journal of Machine Learning Research

    , 2010.
  • (27) C. E. Rasmussen and C. K. Williams. Gaussian Processes for machine learning, volume 1. MIT press Cambridge, 2006.
  • (28) R. Roy and T. Kailath. Esprit-estimation of signal parameters via rotational invariance techniques. IEEE Transactions on Acoustics, Speech, and Signal Processing 1989.
  • (29) R. Schmidt. Multiple emitter location and signal parameter estimation. IEEE transactions on antennas and propagation, 1986.
  • (30) E. Snelson and Z. Ghahramani. Sparse Gaussian Processes using pseudo-inputs. In Advances in Neural Information Processing Systems, 2006.
  • (31) G. Tang and H. Yang. A fast algorithm for multiresolution mode decomposition. arXiv:1709.06880, 2017.
  • (32) J. B. Tary, R. H. Herrera, J. Han, and M. van der Baan. Spectral estimation-What is new? What is next? Rev. Geophys. 2014.
  • (33) M. K. Titsias. Variational learning of inducing variables in sparse Gaussian Processes. In International Conference on Artificial Intelligence and Statistics, 2009.
  • (34) K. R. Ulrich, D. E. Carlson, K. Dzirasa, and L. Carin. GP kernels for cross-spectrum analysis. In NIPS, 2015.
  • (35) A. Wilson and R. Adams. Gaussian Process kernels for pattern discovery and extrapolation. In ICML, 2013.
  • (36) H. Wu, Y.-H. Chan, Y.-T. Lin, and Y.-H. Yeh. Using synchrosqueezing transform to discover breathing dynamics from ECG signals. Applied and Computational Harmonic Analysis, 2014.
  • (37) Z. Wu and N. E. Huang. Ensemble empirical mode decomposition: a noise-assisted data analysis method. Advances in Adaptive Data Analysis, 2009.
  • (38) Z. Wu, N. E. Huang, and X. Chen. The multi-dimensional ensemble empirical mode decomposition method. Adv. Adapt. Data Anal., 2009.
  • (39) J. Xu, H. Yang, and I. Daubechies. Recursive diffeomorphism-based regression for shape functions. SIAM Journal on Mathematical Analysis, 2018.
  • (40) H. Yang. Synchrosqueezed wave packet transforms and diffeomorphism based spectral analysis for 1D general mode decompositions. Applied and Computational Harmonic Analysis, 2015.
  • (41) H. Yang. Multiresolution mode decomposition for adaptive time series analysis. arXiv:1709.06880, 2017.
  • (42) H. Yang. Statistical analysis of synchrosqueezed transforms. Applied and Computational Harmonic Analysis, 2017.
  • (43) H. Yang, J. Lu, W. Brown, I. Daubechies, and L. Ying. Quantitative canvas weave analysis using 2-D synchrosqueezed transforms: Application of time-frequency analysis to art investigation. Signal Processing Magazine, IEEE 2015.
  • (44) H. Yang, J. Lu, and L. Ying. Crystal image analysis using 2D synchrosqueezed transforms. Multiscale Modeling & Simulation, 2015.
  • (45) H. Yang and L. Ying. Synchrosqueezed curvelet transform for two-dimensional mode decomposition. SIAM Journal on Mathematical Analysis, 2014.
  • (46) C. Zhang, Z. Li, C. Hu, S. Chen, J. Wang, and X. Zhang. An optimized ensemble local mean decomposition method for fault detection of mechanical components. Measurement Science and Technology, 2017.
  • (47) B. Zhu and D. B. Dunson. Locally adaptive Bayes nonparametric regression via nested Gaussian Processes. Journal of the American Statistical Association, 2013.

Appendix A Joint pattern describing points

(a) View 1 (b) View 2
Figure 9: Pattern inducing points of a signal containing two shape functions and with supported and arranged in the one block. (ab): are two views of these rearranged auxiliary points.

For method , when the influence of the amplitude function is negligible, another natural approach is to apply the variational inference, following titsias2009variational . We can explicitly express the distribution of the pattern inducing points in a joint form , where , and with . The th row of is denoted as , and the respective th entry of the joint is expected to approximate . One exemplified is shown in Fig. 9. The goal is to maximize a lower bound of ( is fixed and thus dropped for now) given as


To reduce the unnecessary computational complexity, the periodic kernel takes the form of


By applying an inverse Jensen equality on Eqn. (10), we obtain the variational maximizer of the lower bound as


where . Then we can further get and . The mean field approximation and a -dimensional ANOVA is applied to retrieve , for , in closed forms from the joint estimation of . The separated estimates of can be approximated for the respective shape functions . Computational details are listed in the following section.

Appendix B Update pattern inducing points

We start from under the situation that the influence of the amplitude function is very weak and a point estimate of the latent input is given.

One natural approach is to apply the variational inference following titsias2009variational , which explicitly expresses the distribution of inducing points by maximizing a lower bound of ( is dropped in the rest part of this section) :


which is


By applying the inverse Jensen Inequality on the first term, the variational maximizer of the lower bound is derived as


In the above equations, , and . The mean field approximation is applied to . Then, we optimize and the diagonal version of , denoted as by


The above optimization problem is essentially a dimensional ANOVA and can be solved in closed form.

The separated estimates of serves as an approximation for the respective shape functions .

Method can be implemented by a nonlinear band truncation on the respective time-frequency domain obtained by the ConceFT, STFT, etc. Fig. 10 shows an example of such nonlinear truncation. As can be seen, after this adaptive frequency filtering, only respective low frequency part of each mode is kept. is transformed from the original to , or . denotes for unknown initial value of the phase function.

(a) ConceFT (b) Truncated ConceFT
Figure 10: Applying an adaptive filter to change into . (ab): are the original and truncated time-frequency domain.

The general idea of the approaches in Method is to apply a nonlinear diffeomorphism on the signal according to the updated latent phase function . Then apply adaptive regression to further guarantee certain properties of the underlying pattern. We refer the readers to xu2018recursive ; MMD ; FMMD .

Appendix C Prediction

In this part, we introduce how to make predictions on the signals or to generate new samples by NOP with updated output , , and or . Since the amplitude function and the frequency function are assumed to be smooth, we can put a GP prior on both of them and treat the updated , as respective learning points. The updated patterning inducing points are directly plugged in for inference.

If we want to predict on new time points , the respective predicted value of can be obtained as following. The first sample of with respect to is drawn from . of from is drawn by the following equation




where, , and .



where , and , thus


Since different components are independent, we have


Therefore, can be drawn from distribution (27).

Further, if is approximated by the mean field method, can be simply integrated out from distribution (27).

Appendix D Non-distinguishable Problem

In this section, we introduce an ad hoc approach to tackle the non-distinguishable problem arises when the initialized instantaneous frequency of different components are set to a same function. For the estimated frequencies and , where , we want to distinguish components with instantaneous frequency or for each time point .

The naive method is to sort and firstly. Then we cluster components by their values. This actually works pretty well when no crossover happened for and , which is shown in Fig. 11 (a). However, when the crossover happens, as with the second row in Fig. 7, this simple clustering method will fail, as shown in Fig. 11 (b).

One simple way to tackle this situation is stated as following, based on the smoothness assumption placed on the frequency functions . First we identify all the possible crossover points of , which is the robust local minimum of . Then the initial clustering obtained from the simple sorting, as shown in Fig. 11 (a)(b), is switched whenever passes a potential crossover point, in the order of increasing time. This simple reorder trick will output the new clustering as shown in Fig. 11 (c), which is satisfactory for a further refinement to get a point estimate of .

(a) (b) (c)
Figure 11: Solution for the non-distinguishable problem.(a)(b) are obtained by the simple ordering clustering. (c) is obtained by the corrected ordering clustering based on the cross-over points identification.

Appendix E Signals Plots

This section gives the plots of signals used in the main text.

Figure 12: version of .
Figure 13: Shape version of .
Figure 14: 5 of the 100 random signals in form of .
Figure 15: version .
Figure 16: Shape version of .
Figure 17: Top three for and bottom three for .