1 Introduction
This paper concerns oscillatory data defined on a time domain of the form:
(1) 
Here and are the latent instantaneous amplitude and phase functions of the th component, which are assumed to be smooth over time. The derivative of a phase function is called an instantaneous frequency function and denoted as . is a periodic shape function with periodicity one satisfying and it has a unit L2norm on .
Oscillatory data in Model (1) arise in numerous applications HauBio2 ; Pinheiro2012175 ; 7042968 ; Crystal ; LuWirthYang:2016 ; Eng2 ; ME ; Canvas ; Canvas2 ; GeoReview ; SSCT ; 977903 ; 4685952 ; MUSIC ; SuperResolution:Candes ; gruber1997statistical ; burg1972relationship ; roy1989esprit and data analysis of this kind has been an active research field for decades. Usually only (and sometimes
) is available and the goal is to estimate
, (or ), and from . Hence, this is a general problem including and generalizing subproblems like adaptive timefrequency analysis auger1995improving ; Daubechies1996 ; YANG2017 , empirical mode decomposition Huang1998 ; Wu2009 ; Wu2009EEMD , superresolution MUSIC ; SuperResolution:Candes ; gruber1997statistical ; burg1972relationship ; roy1989esprit zhu2013locally , etc. In spite of many successful algorithms for solving these subproblems, to the best of our knowledge, there is no existing algorithm fulfilling the ultimate goal of estimating , (or ), and when is fully nonstationary with close and crossover frequencies, and general shape functions. Many existing algorithms require high sampling rate for better accuracy and robustness, which is not practical due to the limit of battery capacity of mobile devices that collect oscillatory data, e.g. portable health monitors.Fig. 1 shows a synthetic example of Model (1) when , , , , , and . is in fact a superposition of infinitely many deformed planewaves due to the Fourier series expansion of shape functions. Hence, close and crossover frequencies are unavoidable. In terms of frequencies, due to Heisenberg uncertainty principle, timefrequency analysis methods auger1995improving ; Daubechies1996 ; YANG2017 are not able to estimate these instantaneous frequencies; due to the fully nonstationary nature of , existing superresolution methods MUSIC ; SuperResolution:Candes ; gruber1997statistical ; burg1972relationship ; roy1989esprit are not able to estimate frequencies . Moreover, existing GP based methods wilson2013gaussian ; pan2017prediction ; parra2017spectral ; ulrich2015gp ; quia2010sparse cannot produce reliable source separation in Model (1). Noncompact support of in the Fourier domain (Fig. 2(b)) creates particular challenges. Some other regression methods have also been designed to estimate shape functions xu2018recursive ; MMD ; FMMD , but they assume phase functions are known.
In order to solve the challenging inverse problem implicit to Model (1), it is natural to choose priors within a Bayesian model for the latent components. While there are many possible stochastic process choices, Guassian Processes (GPs) are appealing in enforcing smoothness providing easy inclusion of prior information and leading to computational tractability. This paper proposes a nonoscillatory pattern (NOP) learning. scheme, the first framework that can estimate , (or ), and simultaneously from . NOP repeatedly applies a twostage iteration until convergence. In the first stage, fixing rough estimates of shapes, a novel nonstationary GP regression is proposed to estimate amplitude and phase functions. A novel set of auxiliary points, referred to as the pattern inducing points, are introduced for this purpose. As oppose to traditional stationary kernels, nonstationary kernel functions in our GP model approximately transform a nonstationary data analysis problem into a stationary one, greatly reducing the regression difficulty. Furthermore, we propose to embed the information of rough shape estimates into the GP regression model, reducing a deep GP regression problem zhu2013locally ; damianou2013deep ; dai2015variational with a composition of two latent variables (e.g. the composition of a shape function and a phase function) into a simpler problem with only one latent variable, the phase functions. This significantly reduces the computational cost and difficulty in the regression.
In the second stage, fixing rough estimates of amplitudes and phases, a nonstationary GP regression is proposed or other iterative regression methods in FMMD ; xu2018recursive ; 1DSSWPT ; MMD
is applied to estimate shapes. The main difference of the proposed nonstationary GP regression in the second stage to existing methods is that, the variance of the point estimate of shape functions is simple to derive.
2 Estimation of the instantaneous information
In this section, we assume the shape functions are known and aim at estimating the phase and amplitude functions and , respectively.
2.1 Main ideas
We start with a simple case when the signal . Assume a nonstationary GP , and a stationary GP , where is the automatic relevance determination (ARD) rasmussen2006gaussian squared exponential (SE) kernel:
(2) 
with kernel parameters and . Now we consider the phase function as the latent variable of the GP , and claim the resulting GP is periodic and stationary in the domain . This is because , by plugging in the new input , the corresponding output is , i.e., any point lies on the curve . seems to be a deep/hierarchical GP model dai2015variational , but in fact this ‘unwarping’ process directly removes the first GP layer while keeping the second layer stationary. This is a crucial idea for success of our NOP. And we write the one layer GP with input as for .
Next, we introduce a point estimate of the phase function as a latent variable. Suppose ^{1}^{1}1
We will use bold font for vectors.
is the observations (contaminated by whitenoise with standard deviation
) of at time locations . Denote and . As can be seen, a direct inference of the latent , even when is given, is not trivial. This motivates us to introduce a new set of auxiliary points for the GP to retrieve the latent input variable , and to reveal the underlying patterns for Model (1) when is not known. We refer to this novel set of auxiliary points as the pattern inducing points, denoted as , where is the input variable and is the respective output evaluated at the inducing locations . Usually we uniformly discretize and fix at phase locations in , generally .Similar to sparse inducing points snelson2006sparse ; titsias2009variational , the pattern inducing points are treated as variational parameters, and can be updated. They possess the benefits of both sparse inducing points and training points rasmussen2006gaussian , but aim to reveal the true underlying pattern of Model (1) and are endowed with specific meanings. The intuition behind the pattern inducing points is that they serve as certain landmarks, shown as the green dots (input as the green triangles) in Fig. 2, of the true underlying pattern (shaded green line). The latent input (red right triangle in Fig. 2) can drive the dataset (red dots in Fig. 2) across the phase domain (in Fig. 2(a) from (top) the initial wrong to (middle) to (bottom) the correct ), to match the underlying pattern (in green) by these landmarks. The that matches the dataset with the landmark points is the correct since for . So this process is also referred to as a pattern driven method for Model (1). Note that setting of with the SE kernel, is much more stable than the setting of with the periodic kernel rasmussen2006gaussian , which is highly nonconvex and can be trapped at much more local minima validated by our numerical result.
Let be the variational distribution to approximate the posterior distribution of the inducing variable . For the purpose of computational efficiency, we adopt , where is the mean of and the variance matrix of is diagonal. In this section since has been observed, and
. When the auxiliary pattern inducing points are ready, we design the following datapattern matching process in a standard GP setting by encoding the latent variable
in the GP kernel. Conditioning on and , the likelihood of becomes(3) 
By further marginalizing out and out, we have
(4) 
Here,
is an identity matrix of size
, is an covariance matrix, ^{2}^{2}2Denote as the th entry of a vector, and as the th entry of a matrix. for , and similarly for , . Hence, a point estimate of the latent variable conditioning on can be computed by Bayesian approach using Eqn. (4).In the case of multicomponents as in Model (1), we denote , , and for the th component, and . Assuming the independence among variables of the same type, e.g., among , among , etc., Eqn. (4) can be generalized to
(5) 
where consists of latent variables as the th column, , , , and are defined for covariance matrices of the th component similarly to the case of one component.
When we have timevarying smooth amplitude functions as in Model (1), let , then are a new set of latent variables of . In this case, Eqn. (5) takes the form:
(6) 
where consists of as its th column.
To accelerate the computation, instead of applying the Bayesian approach with Eqn. (6), we directly fit the mean value of the Gaussian process with the observations via a least square (LS) problem:
(7) 
where has been encoded in the kernel matrices in the minimization objective function.
2.2 Adaptive local estimation
The optimization problem in (7) is nonconvex and there is no strong prior of and to provide a good initialization. Note that in most applications, and usually smoothly vary in time. This motivates us to generate signal patches and locally parametrize the phase and amplitude functions with lowdegree polynomials in Eqn. (7). Within each patch, the nonstationary signal becomes much more stationary. As we shall see later, our numerical experiments show that degree is sufficiently good.
For each observation patch (also denoted as ), let and be the matrices consisting of the coefficients of the degree polynomials representing the amplitude and phase functions, respectively. Then we can specify the amplitude and phase functions with and by the following LS problem:
(8) 
where is the entrywise dot product, and has been absorbed in the covariance matrices.
Once the polynomial coefficients and of the amplitudes and phases in Eqn. (8) have been specified, they provide a local point estimate of the amplitude and phase functions, which can be used to obtain a global point estimate via a robust curve fitting algorithm garcia2010robust ; garcia2011fast . A standard moving average can be applied to estimate the variance of the point estimate.
3 Estimation of shape functions
In this section, we assume the phase and amplitude functions and are known and estimate the shape functions . We do not aim at closed formulas for . As introduced in Section 2, it is sufficient to estimate the pattern inducing variables and to represent .
When amplitude and phase functions are given, shape function estimation methods have been studied previously in FMMD ; xu2018recursive ; 1DSSWPT ; MMD . These methods achieves high accuracy when amplitude and phase function estimates are close to the ground truth. There is no quantitative criteria to measure how well the shape function estimate performs when the amplitude and phase function estimate is not very good. This motivates us to apply variational inference, following titsias2009variational , that explicitly expresses the distribution of the pattern inducing variables in a joint form. Please refer to Supplemental Material (SM) Section A and Section B for details of these three wellestablished approaches.
4 Overview of NOP
NOP repeatedly applies a twostage iteration until convergence. We have introduced the first stage in Section 2: given rough estimates of shapes, a nonstationary GP regression is applied to estimate amplitude and phase functions. The second stage has been introduced in Section 3: given rough estimates of amplitudes and phases, several regression methods can be adopted to estimate shapes. Hence, a complete algorithm description can be summarized in Algorithm 1 below. The respective prediction formulation of Algorithm 1 is derived in SM Section C.
In many applications, e.g. ECG and photoplethysmogram data analysis, heuristic properties of the physical system are available and we know the rough range of instantaneous frequencies
. Hence, we can apply a bandpass filter to , and then estimate and of in a certain frequency band using traditional timefrequency analysis methods Auger1995; Daubechies1996 ; YANG2017 to initialize and following the method in 1DSSWPT .Another simple initialization is to let . Since we adopt local patch analysis in Section 2, components after Fourier series expansion , , become approximately orthogonal to each other in a short time domain. Hence, the LS in (7) is able to recover the amplitude and phase functions corresponding to since they usually have the largest magnitude.
The LS problems are nonconvex and hence we cannot guarantee convergence to the global minimizer. However, the iterative scheme seems to provide very good results and the algorithm generally converges after only a few iterations. The good performance of NOP might come from the fact that once the amplitude and phase function estimates are roughly good (might not be the global minimizer of LS problems), the shape estimation step can quickly provide very good estimate of shape functions, which can be guaranteed if we adopt methods in FMMD ; xu2018recursive ; MMD . in Algorithm 1 specifies whether we update shape functions first or not; if the initialization of shape functions are better than those of amplitude and phase functions, we choose ; otherwise, we choose . The global convergence analysis would be an interesting future work.
5 Experiments
In this section, we provide numerical examples to demonstrate the performance of NOP^{3}^{3}3Code to be appear in https://github.com/daviddunson., especially in the case of superresolution and adaptive timefrequency analysis. LS problems in all examples are solved by Adam goodfellow2016deep aiming at better local minimizers. We choose degree(or degree
when specified) polynomials to approximate local amplitude and phase functions in these LS problems. The hyperparameters of NOP are set as follows: noise level
, , and . In the local patch analysis, we generate signal patches such that each patch contains approximately to periods. In the tests for superresolution, we repeat the same test with noise realizations for the purpose of using the expectation and variance of estimation error to measure the performance of different algorithms. and denote the pointwise estimation error.5.1 Superresolution spectral estimation
There has been substantial research for the superresolution problem that aims at estimating timeinvariant amplitudes and frequencies in a signal with , , and are very close. Among many possible choices, the baseline models might be MUSIC schmidt1986multiple , ME burg1972relationship ; georgiou2001spectral ; georgiou2002spectral , and ESPRIT roy1989esprit . Hence, we will compare NOP with these methods^{4}^{4}4Code from http://people.ece.umn.edu/~georgiou/files/HRTSA/SpecAn.html.
to show the advantages of NOP. Although the Fourier transform usually fails
schmidt1986multiple to identify and , we use its results as the initialization for NOP.(a) Clean signal  (b)  (c)  (d) 
Accuracy and robustness with different spectral gaps
In this experiment, we use , where the two instantiations of / are / (red lines in Fig. 3) and / as in Fig. 1 (pink lines in Fig. 3). Here and , varies from to , and different noise variance are . The sampling rate is Hz and the number of samples is in this example. Fig. 3 shows the frequency estimation accuracy of NOP, MUSIC, ESPRIT, and ME. As we can see, NOP achieves machine accuracy in the noiseless case and is much more accurate than other methods in all noisy cases. It also reaches almost the same accuracy for the trigonometric (red) and shaped(pink) instantiations in and . Baseline methods are not applicable to instantiation .
Accuracy and robustness with different sampling rates
In this experiment, we set with , , and . The sampling rate of this signal is still Hz and the numbers of samples are , , , and to generate four sets of test data. There are two different kinds of noise to generate noisy test data: 1) white Gaussian noise is directly added to ; 2) a stochastic process in
with i.i.d. uniform distribution in
is added to phase functions . Fig. 4 summarizes the results of frequency estimates in this experiment. ESPRIT and MUSIC lose accuracy in all tests. NOP and ME achieve high accuracy when the number of samples is large and NOP is slightly better than ME in terms of accuracy and estimation bias.5.2 Estimation of timevariant frequencies
In this section, we show the capacity of NOP for estimating close and crossover timevarying instantaneous frequencies. An adaptive timefrequency analysis algorithm, ConceFT Daubechies20150193 ), is used as a comparison. And local approximation degree is set to in this section.
Close frequencies and phase estimation error
We use , where the two instantiations of / are / (Fig. 5(b)) and / as in Fig. 1 (Fig. 5(c)). varies from to . The white noise has standard deviation . We apply shorttime Fourier transform griffin1984signal to identify rough estimates of instantaneous frequencies and use them as the initialization in this test. When instantaneous frequencies are very close, the initialization is very poor; however, NOP still can identify instantaneous frequencies and phases with a reasonably good accuracy. Result is summarized in Fig. 5.
Fig. 5(a) is the ground truth timefrequency representation of ten tested signals with different value of on . The difference between (green line) and (red line) are pretty difficult to be detected by existing timefrequency methods. The log error of the pointwise averaged frequency estimate is shown in the first row of of Fig. 5 (b) and (c) on different noise levels . The log error of pointwise averaged phase estimate (bottom row) is consistently small as changes. Under large noise case with , NOP controls the phase error approximate or below the level of . Existing timefrequency analysis methods usually estimate instantaneous frequencies first and then integrate them to obtain instantaneous phases, which suffers from accumulated error. However, NOP has no accumulated error.
Close and crossover frequencies
In this experiment, we generate a signal consisting of two components with close instantaneous frequencies and a signal with two crossover instantaneous frequencies. Fig. 6 visualizes the ground truth instantaneous frequencies, the timefrequency distribution by ConceFT, the initialization and the estimation results of NOP. ConceFT cannot visualize the instantaneous frequencies even if in the noiseless case. We average out the energy distribution of ConceFT to obtain the initialization of NOP. Although the initialization is very poor, NOP is still able to estimate the instantaneous frequencies with a reasonably good accuracy no matter in clean or noisy cases. However, when the number of components is known, we can average out the energy band to obtain one instantaneous frequency function and initialize all instantaneous frequencies in NOP using this function. We will show how to handle this special initialization in SM Section D.
5.3 Estimation of amplitudes, phases, and shapes simultaneously
Finally, we apply NOP to estimate amplitudes, phases, and shapes simultaneously from a single record. First, we generate a synthetic example , where , , and the shapes are visualized in Fig. 7. The sampling rate for this signal is Hz and we sample it at locations. The shape estimates are initialized as and for the first and second components, respectively (see Fig. 7 (b) and (c)). The frequency estimates are initialized as one constant centered in the peak spectrogram by ConceFT (see Fig. 7 (a)). As we can see in Fig. 7 (b) and (c), NOP is able to estimate shape functions with a reasonably good accuracy and the reconstructed components match the ground truth components very well.
In the second example, we apply NOP to a real signal from photoplethysmogram (PPG) (see Fig. 8 (b)). The shape estimates are still initialized as and for the two components, and samples are involved. The PPG signal contains two components corresponding to the health condition of the heart and lung in a human body.
6 Conclusion
This paper proposes a novel nonoscillatory pattern (NOP) learning scheme for several oscillatory data analysis problems including signal decomposition, superresolution, and signal subsampling. To the best of our knowledge, the proposed NOP is the first algorithm for these problems with fully nonstationary oscillatory data with close and crossover frequencies, and general oscillatory patterns. Numerical examples have shown the advantage of NOP over several stateoftheart algorithms and NOP is able to handle complicated examples for which existing algorithms fail. NOP could be a very useful tool for pattern analysis for oscillatory data. Although we cannot prove the global convergence of NOP, NOP seems to provide very good results in all of our tests. It is interesting to study the global convergence of NOP in the future.
References
 (1) E. Alonso, E. Aramendi, D. GonzálezOtero, U. Ayala, M. Daya, and J. K. Russell. Empirical mode decomposition for chest compression and ventilation detection in cardiac arrest. In Computing in Cardiology 2014 2014.
 (2) F. Auger and P. Flandrin. Improving the readability of timefrequency and timescale representations by the reassignment method. IEEE Transactions on signal processing, 1995.
 (3) X. Bai, M. Xing, F. Zhou, G. Lu, and Z. Bao. Imaging of micromotion targets with rotating parts based on empiricalmode decomposition. IEEE Transactions on Geoscience and Remote Sensing 2008.
 (4) J. P. Burg. The relationship between maximum entropy spectra and maximum likelihood spectra. Geophysics, 1972.
 (5) B. Cornelis, H. Yang, A. Goodfriend, N. Ocon, J. Lu, and I. Daubechies. Removal of canvas patterns in digital acquisitions of paintings. IEEE Transactions on Image Processing 2017.
 (6) Z. Dai, A. Damianou, J. González, and N. Lawrence. Variational autoencoded deep Gaussian Processes. arXiv preprint arXiv:1511.06455, 2015.
 (7) A. Damianou and N. Lawrence. Deep Gaussian Processes. In Artificial Intelligence and Statistics, 2013.
 (8) I. Daubechies and S. Maes. A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models. In Wavelets in Medicine and Biology. CRC Press, 1996.
 (9) I. Daubechies, Y. G. Wang, and H.t. Wu. Conceft: concentration of frequency and time via a multitapered synchrosqueezed transform. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 2016.
 (10) D. Garcia. Robust smoothing of gridded data in one and higher dimensions with missing values. Computational Statistics & Data Analysis, 2010.
 (11) D. Garcia. A fast allinone method for automated postprocessing of PIV data. Experiments in fluids, 2011.
 (12) T. T. Georgiou. Spectral estimation via selective harmonic amplification. IEEE Transactions on Automatic Control, 2001.
 (13) T. T. Georgiou. Spectral analysis based on the state covariance: the maximum entropy spectrum and linear fractional parametrization. IEEE transactions on Automatic Control, 2002.
 (14) I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio. Deep learning, volume 1. MIT press Cambridge, 2016.
 (15) D. Griffin and J. Lim. Signal estimation from modified shorttime fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984.
 (16) M. H. Gruber. Statistical digital signal processing and modeling, 1997.
 (17) N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.C. Yen, C. C. Tung, and H. H. Liu. The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary time series analysis. R. Soc. Lond. Proc. Ser. A Math. Phys. Eng. Sci., 1998.
 (18) W. Huang, Z. Shen, N. E. Huang, and Y. C. Fung. Engineering analysis of biological variables: An example of blood pressure over 1 day. Proc. Natl. Acad. Sci., 1998.
 (19) Y. Huanyin, G. Huadong, H. Chunming, L. Xinwu, and W. Changlin. A sar interferogram filter based on the empirical mode decomposition method. In IGARSS 2001. Scanning the Present and Resolving the Future. Proceedings. IEEE 2001 International Geoscience and Remote Sensing Symposium (Cat. No.01CH37217), volume 5, 2001.
 (20) C. E. J. and F.G. Carlos. Towards a mathematical theory of superresolution. Communications on Pure and Applied Mathematics.
 (21) W. Liao and A. Fannjiang. Music for singlesnapshot spectral estimation: Stability and superresolution. Applied and Computational Harmonic Analysis, 2016.
 (22) J. Lu, B. Wirth, and H. Yang. Combining 2D synchrosqueezed wave packet transform with optimization for crystal image analysis. Journal of the Mechanics and Physics of Solids, 2016.
 (23) Y. Pan, X. Yan, E. A. Theodorou, and B. Boots. Prediction under uncertainty in sparse spectrum Gaussian Processes with applications to filtering and control. In ICML, 2017.
 (24) G. Parra and F. Tobar. Spectral mixture kernels for multioutput Gaussian Processes, 2017.

(25)
E. Pinheiro, O. Postolache, and P. Girão.
Empirical mode decomposition and principal component analysis implementation in processing noninvasive cardiovascular signals.
Measurement, 2012. Special issue on Electrical Instruments. 
(26)
J. QuiÃoneroCandela, C. E. Rasmussen, A. R. FigueirasVidal, et al.
Sparse spectrum Gaussian Process regression.
Journal of Machine Learning Research
, 2010.  (27) C. E. Rasmussen and C. K. Williams. Gaussian Processes for machine learning, volume 1. MIT press Cambridge, 2006.
 (28) R. Roy and T. Kailath. Espritestimation of signal parameters via rotational invariance techniques. IEEE Transactions on Acoustics, Speech, and Signal Processing 1989.
 (29) R. Schmidt. Multiple emitter location and signal parameter estimation. IEEE transactions on antennas and propagation, 1986.
 (30) E. Snelson and Z. Ghahramani. Sparse Gaussian Processes using pseudoinputs. In Advances in Neural Information Processing Systems, 2006.
 (31) G. Tang and H. Yang. A fast algorithm for multiresolution mode decomposition. arXiv:1709.06880, 2017.
 (32) J. B. Tary, R. H. Herrera, J. Han, and M. van der Baan. Spectral estimationWhat is new? What is next? Rev. Geophys. 2014.
 (33) M. K. Titsias. Variational learning of inducing variables in sparse Gaussian Processes. In International Conference on Artificial Intelligence and Statistics, 2009.
 (34) K. R. Ulrich, D. E. Carlson, K. Dzirasa, and L. Carin. GP kernels for crossspectrum analysis. In NIPS, 2015.
 (35) A. Wilson and R. Adams. Gaussian Process kernels for pattern discovery and extrapolation. In ICML, 2013.
 (36) H. Wu, Y.H. Chan, Y.T. Lin, and Y.H. Yeh. Using synchrosqueezing transform to discover breathing dynamics from ECG signals. Applied and Computational Harmonic Analysis, 2014.
 (37) Z. Wu and N. E. Huang. Ensemble empirical mode decomposition: a noiseassisted data analysis method. Advances in Adaptive Data Analysis, 2009.
 (38) Z. Wu, N. E. Huang, and X. Chen. The multidimensional ensemble empirical mode decomposition method. Adv. Adapt. Data Anal., 2009.
 (39) J. Xu, H. Yang, and I. Daubechies. Recursive diffeomorphismbased regression for shape functions. SIAM Journal on Mathematical Analysis, 2018.
 (40) H. Yang. Synchrosqueezed wave packet transforms and diffeomorphism based spectral analysis for 1D general mode decompositions. Applied and Computational Harmonic Analysis, 2015.
 (41) H. Yang. Multiresolution mode decomposition for adaptive time series analysis. arXiv:1709.06880, 2017.
 (42) H. Yang. Statistical analysis of synchrosqueezed transforms. Applied and Computational Harmonic Analysis, 2017.
 (43) H. Yang, J. Lu, W. Brown, I. Daubechies, and L. Ying. Quantitative canvas weave analysis using 2D synchrosqueezed transforms: Application of timefrequency analysis to art investigation. Signal Processing Magazine, IEEE 2015.
 (44) H. Yang, J. Lu, and L. Ying. Crystal image analysis using 2D synchrosqueezed transforms. Multiscale Modeling & Simulation, 2015.
 (45) H. Yang and L. Ying. Synchrosqueezed curvelet transform for twodimensional mode decomposition. SIAM Journal on Mathematical Analysis, 2014.
 (46) C. Zhang, Z. Li, C. Hu, S. Chen, J. Wang, and X. Zhang. An optimized ensemble local mean decomposition method for fault detection of mechanical components. Measurement Science and Technology, 2017.
 (47) B. Zhu and D. B. Dunson. Locally adaptive Bayes nonparametric regression via nested Gaussian Processes. Journal of the American Statistical Association, 2013.
Appendix A Joint pattern describing points
(a) View 1  (b) View 2 
For method , when the influence of the amplitude function is negligible, another natural approach is to apply the variational inference, following titsias2009variational . We can explicitly express the distribution of the pattern inducing points in a joint form , where , and with . The th row of is denoted as , and the respective th entry of the joint is expected to approximate . One exemplified is shown in Fig. 9. The goal is to maximize a lower bound of ( is fixed and thus dropped for now) given as
(9)  
(10) 
To reduce the unnecessary computational complexity, the periodic kernel takes the form of
(11) 
By applying an inverse Jensen equality on Eqn. (10), we obtain the variational maximizer of the lower bound as
(12) 
where . Then we can further get and . The mean field approximation and a dimensional ANOVA is applied to retrieve , for , in closed forms from the joint estimation of . The separated estimates of can be approximated for the respective shape functions . Computational details are listed in the following section.
Appendix B Update pattern inducing points
We start from under the situation that the influence of the amplitude function is very weak and a point estimate of the latent input is given.
One natural approach is to apply the variational inference following titsias2009variational , which explicitly expresses the distribution of inducing points by maximizing a lower bound of ( is dropped in the rest part of this section) :
(13)  
(14)  
(15)  
(16) 
which is
(17) 
By applying the inverse Jensen Inequality on the first term, the variational maximizer of the lower bound is derived as
(18)  
(19) 
In the above equations, , and . The mean field approximation is applied to . Then, we optimize and the diagonal version of , denoted as by
(20)  
(21) 
The above optimization problem is essentially a dimensional ANOVA and can be solved in closed form.
The separated estimates of serves as an approximation for the respective shape functions .
Method can be implemented by a nonlinear band truncation on the respective timefrequency domain obtained by the ConceFT, STFT, etc. Fig. 10 shows an example of such nonlinear truncation. As can be seen, after this adaptive frequency filtering, only respective low frequency part of each mode is kept. is transformed from the original to , or . denotes for unknown initial value of the phase function.
(a) ConceFT  (b) Truncated ConceFT 
The general idea of the approaches in Method is to apply a nonlinear diffeomorphism on the signal according to the updated latent phase function . Then apply adaptive regression to further guarantee certain properties of the underlying pattern. We refer the readers to xu2018recursive ; MMD ; FMMD .
Appendix C Prediction
In this part, we introduce how to make predictions on the signals or to generate new samples by NOP with updated output , , and or . Since the amplitude function and the frequency function are assumed to be smooth, we can put a GP prior on both of them and treat the updated , as respective learning points. The updated patterning inducing points are directly plugged in for inference.
If we want to predict on new time points , the respective predicted value of can be obtained as following. The first sample of with respect to is drawn from . of from is drawn by the following equation
(22) 
and
(23) 
where, , and .
Since
(24) 
where , and , thus
(25) 
Since different components are independent, we have
(26)  
(27) 
Therefore, can be drawn from distribution (27).
Further, if is approximated by the mean field method, can be simply integrated out from distribution (27).
Appendix D Nondistinguishable Problem
In this section, we introduce an ad hoc approach to tackle the nondistinguishable problem arises when the initialized instantaneous frequency of different components are set to a same function. For the estimated frequencies and , where , we want to distinguish components with instantaneous frequency or for each time point .
The naive method is to sort and firstly. Then we cluster components by their values. This actually works pretty well when no crossover happened for and , which is shown in Fig. 11 (a). However, when the crossover happens, as with the second row in Fig. 7, this simple clustering method will fail, as shown in Fig. 11 (b).
One simple way to tackle this situation is stated as following, based on the smoothness assumption placed on the frequency functions . First we identify all the possible crossover points of , which is the robust local minimum of . Then the initial clustering obtained from the simple sorting, as shown in Fig. 11 (a)(b), is switched whenever passes a potential crossover point, in the order of increasing time. This simple reorder trick will output the new clustering as shown in Fig. 11 (c), which is satisfactory for a further refinement to get a point estimate of .
(a)  (b)  (c) 
Appendix E Signals Plots
This section gives the plots of signals used in the main text.