The localization of patterns in a time series is a common data mining problem, with applications in many fields, including biomedicine, finance, industrial engineering and speech recognition [fu2011review, esling2012time, ratanamahatana2004everything, liu2005locating]. In contrast to the problem of detection, in which a decision must be made about the presence or absence of a pattern, the problem of localization assumes that the pattern is present and asks for its precise location. The temporal nature of the data acquisition process complicates both tasks, as it subjects the patterns of interest to deformations in time known as warps. For pattern detection, many techniques align the query time series to a known reference pattern, commonly through dynamic time warping (DTW) [berndt1994using]. Similarly, a common pattern localization technique consists of aligning the query time series to a reference time series that contains several patterns of interest [ratanamahatana2004everything].
In many scenarios, however, the sequence of patterns to be located is not available as a reference time series but only as a blueprint that lists the theoretical locations of the expected patterns. In this case, the localization problem amounts to aligning the time series to a model (the blueprint). A conceptual overview of this problem, which we refer to as “signal-to-model” (S2M) alignment, is given in Fig. 1.
The most prominent example of the S2M alignment problem in the scientific literature is “audio-to-score” alignment [thickstun2017learning] in the context of music information retrieval. There, a musical recording is to be aligned to the corresponding score. S2M alignment is encountered in several other contexts as well, such as industrial environments, where a structure is measured in a one-dimensional fashion and the measured time series is to be aligned to a blueprint for this structure. This is the scenario illustrated in Fig. 1, and a specific application in non-destructive testing will be discussed in detail in the experiments of Section 4.
The common approach to the S2M alignment problem consists of first synthesizing a time series from the available model, and then aligning the true time series to this artificial one with standard alignment techniques for localization [thickstun2017learning]. The main bottleneck in this approach is the availability of an accurate synthesis technique. In some contexts, such as music, synthesis techniques are available that rely on pre-recorded waveforms and domain-specific knowledge. In other contexts, designing an appropriate synthesis technique is often a complex problem in itself. An additional difficulty arises from variations among patterns that belong to the same class, illustrated by pattern “G” in Fig. 1. Even if an appropriate synthesis technique were available, these differences in the true waveforms would hinder the alignment.
In this paper, we propose a localization framework based on machine learning that increases the similarity between the synthesized and the true time series by mapping both into a latent correlation space. The mapping is learned from a set of training signals, and because it is given many degrees of freedom, the choice of synthesis technique becomes less critical. We apply the proposed method to time series data acquired in non-destructive testing, obtaining a significant improvement over the state of the art.
2 Signal-to-Model Alignment
2.1 Problem definition
We are given a time series $\mathbf{x} = [x_1, \ldots, x_N]$ and its corresponding theoretical model $\mathcal{M}$, consisting of a sequence of time markers $\{(a_k, b_k, c_k)\}_{k=1}^{K}$. Here, $a_k$ and $b_k$ mark the start and end locations of the $k$-th event of interest, respectively, and $c_k$ indicates the class of the event. The start and end locations can be expressed in any unit that represents a one-dimensional quantity, for instance millimeters or seconds. The localization problem can be solved by aligning the time series $\mathbf{x}$ to the model $\mathcal{M}$, such that for each of the markers $a_k$ and $b_k$ the corresponding locations $\hat{a}_k$ and $\hat{b}_k$ (expressed in samples) in the time series are retrieved. We denote the alignment solution by $\pi$.
2.2 Standard approach
The standard approach to S2M alignment transforms it into a time-series alignment problem. This, however, requires two time series, while the described scenario contains only one (in addition to the model). The second time series, $\mathbf{y}$, is synthesized from the information contained in the model. We briefly outline three synthesis strategies:
Binary synthesis generates a time series whose values are $1$ within the span of each event and $0$ outside. This is a rudimentary synthesis that may suffice if the true patterns resemble rectangular blocks.
Replication synthesis consists in replicating a true pattern at every position indicated by the model. This type of synthesis requires the availability of a training signal from which patterns can be extracted.
Generative synthesis encompasses advanced forms of synthesis that use domain-specific models to generate realistic time series. This type of synthesis is the one most often used in music alignment problems [thickstun2017learning].
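The first two strategies can be sketched in a few lines of NumPy. The sketch is illustrative only and rests on assumptions not stated in the text: the model is a list of (start, end, class) markers in physical units, a hypothetical conversion factor `samples_per_unit` maps those units to sample indices of the synthesized series, and replication stretches one template per class to each event's length by linear interpolation. The function names are hypothetical.

```python
import numpy as np


def to_samples(model, samples_per_unit):
    """Convert (start, end, cls) markers from physical units (e.g. mm)
    to integer sample indices of the synthesized series."""
    return [(int(round(a * samples_per_unit)), int(round(b * samples_per_unit)), c)
            for a, b, c in model]


def binary_synthesis(model, length, samples_per_unit):
    """Rectangular blocks: 1 inside every event [start, end], 0 elsewhere."""
    y = np.zeros(length)
    for a, b, _ in to_samples(model, samples_per_unit):
        y[a:b + 1] = 1.0
    return y


def replication_synthesis(model, length, samples_per_unit, templates):
    """Place the template of each event's class over [start, end],
    stretched or compressed to the event length by linear interpolation."""
    y = np.zeros(length)
    for a, b, c in to_samples(model, samples_per_unit):
        if not 0 <= a <= b < length:
            raise ValueError("event falls outside the synthesized series")
        pattern = np.asarray(templates[c], dtype=float)
        src = np.linspace(0.0, 1.0, len(pattern))   # template positions
        dst = np.linspace(0.0, 1.0, b - a + 1)      # event positions
        y[a:b + 1] = np.interp(dst, src, pattern)
    return y


# Two events of class "G", with locations in millimeters.
model = [(1.0, 2.0, "G"), (4.0, 4.5, "G")]
y_bin = binary_synthesis(model, length=50, samples_per_unit=10)
y_rep = replication_synthesis(model, 50, 10, templates={"G": [0.0, 1.0, 0.0]})
```

Linear interpolation is only one way to adapt a template to events of different lengths; a real application might instead paste the template unchanged at the start marker.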
Once the synthesized time series $\mathbf{y}$ is available, it is aligned to the true time series $\mathbf{x}$, typically through dynamic time warping [rabiner1993fundamentals, vintsyuk1968speech]. DTW is a well-known technique and the de facto standard algorithm for aligning time series. By relying on dynamic programming, DTW efficiently evaluates a combinatorially large number of warps, each of which consists of local shifts, contractions, and stretches of the signals. Figure 2 illustrates an alignment solution obtained by DTW and its corresponding warping path. The alignment solution, denoted by $\pi$, contains the warping path, which consists of the pairs of samples from $\mathbf{x}$ and $\mathbf{y}$ that are aligned.
The solution $\pi$ makes it possible to map any location in the model (through the synthesized time series $\mathbf{y}$) to a sample index in the true time series $\mathbf{x}$, and vice versa, hence solving the localization problem.
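As an illustration of the two previous paragraphs, here is a minimal textbook DTW sketch (absolute-difference local cost, steps (1,0), (0,1) and (1,1), no window constraint) that returns the warping path as (i, j) pairs, together with a hypothetical helper `map_to_true` that uses the path to carry a model location, given as sample index j of the synthesized series, over to the true series. Taking the first aligned sample of the true series is an assumed convention.

```python
import numpy as np


def dtw_path(x, y):
    """Textbook DTW between 1-D series x and y.

    Local cost |x[i] - y[j]|, steps (1,0), (0,1), (1,1), no window.
    Returns the accumulated cost and the warping path, a list of (i, j)
    pairs meaning that x[i] is aligned with y[j]."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)   # D[i, j]: best cost of x[:i] vs y[:j]
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda s: D[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n, m], path


def map_to_true(path, j):
    """Hypothetical helper: first sample of x aligned with sample j of y."""
    return min(i for i, jj in path if jj == j)


x = np.array([0, 0, 0, 1, 1, 0, 0, 0], dtype=float)   # true series
y = np.array([0, 1, 1, 0, 0, 0, 0, 0], dtype=float)   # synthesized series
cost, path = dtw_path(x, y)
start_in_x = map_to_true(path, 1)   # model start marker located at y[1]
```

Here the pattern starts at sample 1 of the synthesized series and at sample 3 of the true series; the zero-cost warping path recovers that correspondence.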
As mentioned in the introduction, the main bottleneck in the standard approach lies in the availability of an accurate synthesis technique. In problems with little domain-specific knowledge, for instance, only basic forms of synthesis can be applied, which may not guarantee sufficient similarity between the synthesized and the true time series. Furthermore, even if a realistic synthesis procedure were available, the patterns in the true time series may show variations that hinder the final alignment.
To mitigate the differences between the synthesized and the true time series, an additional transformation is sometimes applied to both. For music, a chromagram can be used, which is a condensed form of the spectral information that represents notes [thickstun2017learning]. Nevertheless, such transformations are not guaranteed to be optimal, and they essentially require domain-specific information.
3 Proposed technique
We propose to automatically learn a transformation that maximizes the similarity between the synthesized and the true time series. This transformation can be interpreted as a mapping to a latent space where both time series are more similar. The goal of the mapping, therefore, is to emphasize the common components in both time series, and to suppress noise signals and artifacts present in only one of them.
The learning process is illustrated in the left diagram of Fig. 3. To learn the mappings, a training time series $\mathbf{x}$ is required, together with a model $\mathcal{M}$ that has been aligned to $\mathbf{x}$, for instance by a human labeler. A synthetic time series $\mathbf{y}$ is first obtained from the aligned model; by construction, it is perfectly aligned to $\mathbf{x}$.
Next, the optimal transformations need to be learned. The transformations used in this work are linear filters, implemented as projections of time-embedded vectors. The fixed transformations $\phi$ in the diagram therefore correspond to time embeddings in this case. To obtain the optimal mappings, we use canonical correlation analysis, which we discuss next.
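To make the link between time embeddings and linear filters concrete, the sketch below builds, for every sample, a window of `past` preceding and `future` following samples, and checks that projecting the embedded vectors onto a weight vector gives the same result as FIR filtering with the reversed weights. The function name `time_embed` and the zero padding at the borders are assumptions for illustration.

```python
import numpy as np


def time_embed(x, past, future):
    """Row n is the window [x[n-past], ..., x[n+future]].

    Samples outside the signal are zero-padded (an assumption), so the
    output has shape (len(x), past + future + 1)."""
    x = np.asarray(x, dtype=float)
    padded = np.concatenate([np.zeros(past), x, np.zeros(future)])
    return np.lib.stride_tricks.sliding_window_view(padded, past + future + 1)


x = np.arange(6, dtype=float)
w = np.array([0.5, -1.0, 2.0])           # projection vector, L = 3
E = time_embed(x, past=1, future=1)      # time-embedded vectors, one per row
z = E @ w                                # projection of each embedded vector

# The same output obtained as a linear (FIR) filter: convolution of the
# padded signal with the reversed weights.
z_filter = np.convolve(np.concatenate([[0.0], x, [0.0]]), w[::-1], mode="valid")
```

Because the two results coincide, learning a projection vector of size $L$ is equivalent to learning an FIR filter with $L$ taps.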
3.1 Learning the latent correlation space
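Given two multidimensional random variables $\mathbf{u} \in \mathbb{R}^{L}$ and $\mathbf{v} \in \mathbb{R}^{L}$, canonical correlation analysis (CCA) seeks a pair of optimal linear transformations such that the transformed variables are maximally correlated [hotelling1936relations, hardoon2004canonical]. In the present context, $\mathbf{u}$ and $\mathbf{v}$ represent samples from the time series $\mathbf{x}$ and $\mathbf{y}$, respectively, that have been time-embedded with an embedding of size $L$. Denote by $\mathbf{w}_x$ and $\mathbf{w}_y$ the respective projection vectors, and by $\tilde{x} = \mathbf{w}_x^\top \mathbf{u}$ and $\tilde{y} = \mathbf{w}_y^\top \mathbf{v}$ the samples of the resulting transformed time series in the latent space, $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{y}}$. If we denote the covariance matrix between $\mathbf{u}$ and $\mathbf{v}$ by $\mathbf{C}_{uv}$, and likewise the autocovariance matrices by $\mathbf{C}_{uu}$ and $\mathbf{C}_{vv}$, the function to be maximized is

$$\rho(\mathbf{w}_x, \mathbf{w}_y) = \frac{\mathbf{w}_x^\top \mathbf{C}_{uv} \mathbf{w}_y}{\sqrt{\left(\mathbf{w}_x^\top \mathbf{C}_{uu} \mathbf{w}_x\right)\left(\mathbf{w}_y^\top \mathbf{C}_{vv} \mathbf{w}_y\right)}}.$$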
This problem can be solved as a generalized eigenvalue problem, which yields the optimal projectors $\mathbf{w}_x$ and $\mathbf{w}_y$; see [hardoon2004canonical] for details. Note that training is not limited to a single time series, as the covariance matrices can easily be accumulated over an entire set of training data.
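For concreteness, the generalized eigenvalue problem can be written in block form. With $\mathbf{C}_{uu}$, $\mathbf{C}_{vv}$ and $\mathbf{C}_{uv}$ the (auto)covariance matrices of the time-embedded samples $\mathbf{u}$ of the true series and $\mathbf{v}$ of the synthesized series, and $\mathbf{w}_x$, $\mathbf{w}_y$ the projection vectors,

$$\begin{bmatrix} \mathbf{0} & \mathbf{C}_{uv} \\ \mathbf{C}_{uv}^\top & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{w}_x \\ \mathbf{w}_y \end{bmatrix} = \rho \begin{bmatrix} \mathbf{C}_{uu} & \mathbf{0} \\ \mathbf{0} & \mathbf{C}_{vv} \end{bmatrix} \begin{bmatrix} \mathbf{w}_x \\ \mathbf{w}_y \end{bmatrix},$$

and the eigenvector of the largest eigenvalue $\rho$ gives the projectors. The sketch below solves this with `scipy.linalg.eigh` and accumulates covariance sums over a list of training pairs, which is one way to train on several series. The function name `cca_first_pair` and the small ridge term `reg`, added to keep the right-hand matrix positive definite, are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import eigh


def cca_first_pair(pairs, reg=1e-6):
    """First canonical pair from a list of (U, V) matrices of shape
    (n_samples, L): rows of U are time-embedded samples of a true series,
    rows of V the time-aligned samples of its synthesized series."""
    n = 0
    su = sv = suu = svv = suv = 0.0
    for U, V in pairs:                       # accumulate sums over all series
        U = np.asarray(U, dtype=float)
        V = np.asarray(V, dtype=float)
        n += U.shape[0]
        su = su + U.sum(axis=0)
        sv = sv + V.sum(axis=0)
        suu = suu + U.T @ U
        svv = svv + V.T @ V
        suv = suv + U.T @ V
    mu, mv = su / n, sv / n
    Cuu = suu / n - np.outer(mu, mu)         # centered covariance matrices
    Cvv = svv / n - np.outer(mv, mv)
    Cuv = suv / n - np.outer(mu, mv)
    Lu, Lv = Cuu.shape[0], Cvv.shape[0]
    Z = np.zeros((Lu, Lv))
    A = np.block([[np.zeros((Lu, Lu)), Cuv], [Cuv.T, np.zeros((Lv, Lv))]])
    B = np.block([[Cuu + reg * np.eye(Lu), Z], [Z.T, Cvv + reg * np.eye(Lv)]])
    rhos, W = eigh(A, B)                     # eigenvalues in ascending order
    w = W[:, -1]                             # largest = top canonical correlation
    return rhos[-1], w[:Lu], w[Lu:]


# Synthetic check: one shared component s hidden in different channels.
rng = np.random.default_rng(0)
s = rng.standard_normal(2000)
U = np.column_stack([s + 0.1 * rng.standard_normal(2000), rng.standard_normal(2000)])
V = np.column_stack([rng.standard_normal(2000), 2 * s + 0.1 * rng.standard_normal(2000)])
rho, w_x, w_y = cca_first_pair([(U, V)])
rho_split, _, _ = cca_first_pair([(U[:1000], V[:1000]), (U[1000:], V[1000:])])
```

Splitting the data into two training pairs gives the same result as using it in one piece, since only the accumulated sums enter the covariances.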
3.2 Testing on new time series
When a new time series is considered, alignment can be performed in a straightforward fashion, as illustrated in the right diagram of Fig. 3. Apart from the time series, the proposed technique requires the corresponding model as an input, which at this point has not yet been aligned. As in the training process, the model is synthesized into a time series, and a fixed transformation (the embedding) is applied to both time series. Then both are mapped to the latent space by projecting the true time series with $\mathbf{w}_x$ and the synthesized series with $\mathbf{w}_y$. The resulting time series in the latent space, $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{y}}$, now show a high similarity and can be aligned by applying DTW. Finally, the alignment solution is used to produce a model that has been aligned to the true time series (as shown in the diagram), or, equivalently, an alignment of the true time series to the original model.
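The test stage can be sketched by chaining embedding, projection and DTW. In this minimal sketch, `time_embed`, `dtw_path` and `align_markers` are hypothetical helpers, the projection vectors are assumed to come from a CCA training step, and the markers are assumed to be already expressed in samples of the synthesized series. Start markers are mapped to the first aligned sample of the true series and end markers to the last one, which is an assumed convention.

```python
import numpy as np


def time_embed(x, past, future):
    """Rows are the windows [x[n-past], ..., x[n+future]], zero-padded."""
    x = np.asarray(x, dtype=float)
    padded = np.concatenate([np.zeros(past), x, np.zeros(future)])
    return np.lib.stride_tricks.sliding_window_view(padded, past + future + 1)


def dtw_path(a, b):
    """Textbook DTW (absolute-difference cost, steps (1,0), (0,1), (1,1));
    returns the warping path as a list of (i, j) index pairs."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = abs(a[i - 1] - b[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    i, j, path = n, m, [(n - 1, m - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda s: D[s])
        path.append((i - 1, j - 1))
    return path[::-1]


def align_markers(x, y_synth, markers, w_x, w_y, past, future):
    """Embed, project to the latent space, align with DTW, and map each
    (start, end) marker, given in samples of y_synth, to samples of x."""
    x_lat = time_embed(x, past, future) @ w_x
    y_lat = time_embed(y_synth, past, future) @ w_y
    path = dtw_path(x_lat, y_lat)
    aligned = []
    for start, end in markers:
        # Assumed convention: first x sample matched to the start marker,
        # last x sample matched to the end marker.
        s = min(i for i, j in path if j == start)
        e = max(i for i, j in path if j == end)
        aligned.append((s, e))
    return aligned


# Hypothetical example: the true series shows the pattern with inverted
# polarity; hand-picked (not learned) projections undo the sign flip.
x = -np.array([0, 0, 0, 1, 1, 0, 0, 0], dtype=float)
y_synth = np.array([0, 1, 1, 0, 0, 0, 0, 0], dtype=float)
w_x = np.array([0.0, -1.0, 0.0])
w_y = np.array([0.0, 1.0, 0.0])
result = align_markers(x, y_synth, [(1, 2)], w_x, w_y, past=1, future=1)
```

In the example, the marker spanning samples 1 to 2 of the synthesized series is mapped to samples 3 to 4 of the true series, because the projected series match exactly after warping.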
The entire transformation of each time series thus consists of the fixed transformation followed by the projection to the latent space. If the fixed transformation $\phi$ is a time embedding of large size $L$, the projections have many degrees of freedom with which to define an optimal one-dimensional latent space. In preliminary experiments, we have verified that this flexibility can compensate for poor forms of synthesis, such as the binary synthesis mentioned earlier. As a result, the proposed machine-learning-based technique constitutes a framework that can be readily applied to S2M alignment problems in which domain-specific knowledge is scarce.
4 Numerical Experiments
We apply the proposed technique to time series acquired in non-destructive testing of heat generator tubes [garcia2011non]. The data consists of time series acquired through eddy current testing, and for each time series a corresponding blueprint is available that indicates the theoretical locations of support structures, similar to the model represented in Fig. 1. Each support structure leaves a characteristic pattern in the measured time series. Due to space restrictions, we will only consider a simplified scenario with one class of support patterns.
First, we study the basic alignment problem, in which one training time series is used to extract a pattern for replication synthesis, and a second time series is used to test the localization technique. We artificially add to both time series a common background noise known as “pilgrim noise” [benoist1991expert], as well as a small temporal warp. The synthesized and the test time series are shown in the first two plots of Fig. 4.
As a benchmark, we apply the DTW algorithm to align both series directly. The third plot of Fig. 4 shows the synthesized time series after applying the obtained time warp. Clearly, the warping path does not correspond to a valid alignment. For instance, DTW determined that there is a pattern that includes sample while there is none, which causes the localization of all subsequent patterns to shift. On average, the error between the true markers provided by a human labeler and the markers estimated through time warping of the model amounts to samples.
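The text does not spell out how the average error is computed; a natural reading, assumed here, is the mean absolute difference in samples over all start and end markers. A minimal sketch with the hypothetical name `mean_localization_error`:

```python
import numpy as np


def mean_localization_error(true_markers, est_markers):
    """Mean absolute difference, in samples, over all start and end markers.

    Both arguments are sequences of (start, end) pairs in the same order."""
    t = np.asarray(true_markers, dtype=float).ravel()
    e = np.asarray(est_markers, dtype=float).ravel()
    return float(np.mean(np.abs(t - e)))


# Illustrative values only, not results from the experiments.
err = mean_localization_error([(100, 120), (300, 330)], [(102, 119), (290, 335)])
```

With these illustrative markers the individual errors are 2, 1, 10 and 5 samples, so the average is 4.5 samples.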
Next, we apply the proposed method with an embedding of samples from the past and from the future, i.e. . Both time series are first mapped to a latent space that is learned from the training time series, after which standard DTW is applied. The last plot of Fig. 4 shows the projections, in the latent space, of the true series and the aligned synthesized series. The estimated markers, shown below together with the true markers, have an average error of samples.
In the second experiment, we test the alignment techniques for different degrees of pilgrim noise, ranging from (no noise) to (noise of the same amplitude as the maximum peak in the signal). A single pattern was used for creating the synthesis time series, and for learning the latent space a set of training time series was used. The remaining time series were used for testing. The optimal embedding for each amount of noise was determined by cross-validation on the training data.
Fig. 5 shows the average localization error for DTW and for the proposed CCA+DTW technique. For low rates of noise, both techniques perform similarly. Starting from a noise rate of , however, the benefit of operating in the learned latent space becomes clear.
5 Conclusions and discussion
We have presented a novel technique for locating patterns in time series for which a blueprint model is available. Like other methods in the literature, the proposed method synthesizes the model into a time series and aligns both series through dynamic time warping. In addition, however, the proposed technique maximizes the similarity between both time series before performing DTW, by mapping them to a latent space in which they are maximally correlated. This optimal mapping is learned through canonical correlation analysis. Experiments with eddy current testing data show that the proposed technique is capable of correctly locating patterns in challenging scenarios with high levels of background noise.
The design of the method does not rely on any domain-specific knowledge, and the method is therefore expected to perform satisfactorily in a wide range of applications. In future research, we plan to apply the proposed method to, among others, the audio-to-score alignment problem. Due to space restrictions, we have limited the experiments in this paper to a simplified setup that highlights the strengths of the proposed method. Nevertheless, preliminary experiments with more complex setups, including patterns of several different classes, multidimensional time series, and additional noise types, have been successful and will also be the topic of future research.
The authors thank Tecnatom S.A., Madrid, Spain, for providing the data used in the experiments.