## 1 Introduction and background

We begin with the following general question: can one define equivalence relations between time series, whereby two time series are declared equivalent if they differ only up to noise? This is of course a heuristic; in this paper we give one concrete way to understand this. We prove the following theorem, which combines theorems 2.4 and 2.5:

###### Theorem 1.1.

There exist equivalence relations on the space of time series, obtained from algorithmic determination of change points, such that the resulting quotient spaces are metrizable. Such metrics are continuous with respect to perturbation of the time series (lemma 2.3).

The proof and construction of such equivalence relations builds on several insights: the change point algorithms developed by (Hawkins, 1977; Hawkins et al., 2003) and (Ross, 2014), the distances between change points of (James et al., 2019) and the construction of Lebesgue spaces as metrizable quotients by (Riesz, 1910).

The most significant equivalence relation between functions in measure theory is that of almost everywhere equivalence: define $f \sim g$ if $f = g$ a.e., that is, if $\{x : f(x) \neq g(x)\}$ has measure zero. For both the motivation and mathematics of our theorem and construction, we rely on this crucial equivalence relation.

Recall the construction of the $L^p$ spaces. Let $\mathcal{L}^p[a,b]$ be the vector space of all Lebesgue-measurable functions $f$ on $[a,b]$ such that $|f|^p$ has finite integral. We have a quotient mapping, defined and outlined in (Billingsley, 1995),

$$\pi: \mathcal{L}^p[a,b] \to L^p[a,b] = \mathcal{L}^p[a,b]/\!\sim.$$

We will use this notation in what follows. One then equips this quotient space with the $L^p$ norms; see (Billingsley, 1995) for these definitions or section 2 for our norm.

Introduced by (Hawkins, 1977) and developed further by (Hawkins et al., 2003; Hawkins and Zamba, 2005), change point algorithms are an evolving field of algorithms intended to determine structural breaks at which a change in the statistical properties of a time series has occurred. Notable recent advancements have been made by (Ross, 2014); we shall use Ross’ CPM package (Ross, 2015) for our experiments.

Metric learning and the application of different distances have been of great interest to the machine learning community across a variety of problems for years (Thorpe et al., 2017; Memoli, 2011; Arjovsky et al., 2017; Xing et al., 2002; Zhao et al., 2005). More recently, (James et al., 2019) used change points to define and calculate distances between time series. Specifically, they apply semi-metrics, which they argue are more robust to outliers than the more widely used Hausdorff metric (Aspert et al., 2002), between sets of change points in a collection of time series. Their methodology has two downsides: these semi-metrics do not obey the triangle inequality, and their analysis does not record any of the time series’ attributes between these structural breaks.

Building on this, we develop new procedures for measuring distances between time series based on structural breaks, simultaneously ameliorating these downsides and proving theorem 1.1 in the process. We associate to a time series a piecewise constant function that contains the data of the change points as well as the mean, variance, or other desired property, and embed these piecewise constant functions within $L^p$. This procedure is highly flexible, building upon any available change point algorithm and recording any desired statistical property.

## 2 Proof of theorems and construction

Let $X = (X_t)$ be a real-valued time series over some time interval $[a,b]$, and $\mathcal{TS}[a,b]$ the space of such time series. In this paper, we only record each data series once, so treat each $X_t$ as a single observed real number. Let $\mathcal{P}_0[a,b]$ be the set of finite subsets of $[a,b]$.

First, select a statistical attribute, such as mean, variance, or higher-order moment. For simplicity of exposition, we describe the procedure for the mean. Use the two-step algorithm of (Ross, 2015) to determine a set of change points $\tau_1 < \dots < \tau_{m-1}$ with respect to changes in the mean. This is an inductive deterministic procedure that uses iterated hypothesis testing. At each step, the null hypothesis of no change point existing is tested, and subject to preset parameters, the null hypothesis may be rejected; this determines a change point. In addition, set $\tau_0 = a$ and $\tau_m = b$. Properly interpreted, this gives a function

$$\mathcal{TS}[a,b] \to \mathcal{P}_0[a,b].$$

Different parameters yield different algorithms and hence different such functions. This procedure has algorithmically determined that $X$ has a constant mean over each interval $[\tau_{j-1}, \tau_j]$. By simply averaging over these intervals, compute and record the empirically observed means $\mu_1, \dots, \mu_m$. Now define a function

$$f = \sum_{j=1}^{m} \mu_j \mathbb{1}_{[\tau_{j-1}, \tau_j]}$$

as a weighted indicator function. This is a piecewise constant function that codifies the change points of the time series and its changing means. That is, $f(t) = \mu_j$ for $\tau_{j-1} < t < \tau_j$.

Let $\tilde{f}$ be the image of $f$ under the almost everywhere equivalence relation. This step means that the values of $f$ at the change points do not matter, for a finite set of points has measure zero. Let $PC[a,b]$ be the subspace of $L^p[a,b]$ consisting of (the images of) piecewise constant functions. Properly interpreted, our procedure is the function

$$\mathcal{F}: \mathcal{TS}[a,b] \to PC[a,b], \qquad X \mapsto \tilde{f}.$$

This defines an equivalence relation on $\mathcal{TS}[a,b]$, with $X \sim Y$ if $\mathcal{F}(X) = \mathcal{F}(Y)$, and an embedding

$$\mathcal{TS}[a,b]/\!\sim \; \hookrightarrow PC[a,b].$$

Note that $PC[a,b]$ is a vector subspace of $L^p[a,b]$ and thus inherits its norm $\|\cdot\|_p$, defined by

$$\|f\|_p = \left( \frac{1}{b-a} \int_a^b |f|^p \right)^{1/p},$$

normalized so that $\|1\|_p = 1$. When $p = 2$, there is an inner product associated to the norm:

$$\langle f, g \rangle = \frac{1}{b-a} \int_a^b f g.$$
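The construction can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the change points have already been produced by some external change point algorithm (it does not implement the procedure of Ross's CPM package), treats them as indices into a discretely sampled series, and approximates the normalized norm by an average over samples. The names `piecewise_constant_means` and `lp_norm` are hypothetical.

```python
def piecewise_constant_means(series, change_points):
    """Replace each segment between change points by its empirical mean.

    `change_points` are indices 0 < tau_1 < ... < tau_{m-1} < len(series),
    assumed to come from an external change point algorithm.
    """
    bounds = [0] + sorted(change_points) + [len(series)]
    fitted = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        segment = series[start:end]
        mean = sum(segment) / len(segment)
        fitted.extend([mean] * len(segment))
    return fitted


def lp_norm(values, p=2.0):
    """Discrete analogue of the normalized L^p norm: a constant 1 has norm 1."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


# Illustrative data with assumed change points at indices 3 and 7.
series = [1.0, 1.2, 0.8, 5.1, 4.9, 5.0, 5.0, 2.0, 2.0]
f = piecewise_constant_means(series, change_points=[3, 7])
magnitude = lp_norm(f, p=2.0)
```

Only the change points and the segment means survive in `f`; the fluctuations within each segment are discarded, which is the sense in which the procedure works "up to noise".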

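At this point we must make some careful definitions and observations. While $\mathcal{TS}[a,b]$ has a vector space structure, the change point algorithms do not respect addition of time series, and hence the map $\mathcal{F}$ from time series to piecewise constant functions is not linear. The absence of linearity forces us to distinguish two separate definitions. First, the metric space structure on $PC[a,b]$ induces a metric on $\mathcal{TS}[a,b]/\!\sim$ and a pseudo-metric on $\mathcal{TS}[a,b]$ itself. Concretely, if $X, Y$ map to piecewise constant functions $f, g$ respectively, define $d^p(X, Y) = \|f - g\|_p$.

Secondly and separately, we can pull back the norm structure to $\mathcal{TS}[a,b]$ as a measure of the overall magnitude of the time series, defining $\mathrm{mag}(X) = \|f\|_p$. We use the symbol mag, rather than $\|\cdot\|$, as this is neither a norm nor semi-norm on $\mathcal{TS}[a,b]$. Concretely, $d^p(X, Y) = \|\mathcal{F}(X) - \mathcal{F}(Y)\|_p$, but in general $d^p(X, Y) \neq \mathrm{mag}(X - Y)$. That is, mag does not induce the metric $d^p$. Only after passage to $PC[a,b]$, via the non-linear $\mathcal{F}$, does the norm induce the metric.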
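As a rough illustration of the two definitions, the sketch below computes the distance and the magnitude on a discrete grid. It assumes the piecewise constant functions have already been obtained and sampled on a common grid; the names `lp_distance` and `magnitude` are hypothetical, and the sample average stands in for the normalized integral.

```python
def lp_distance(f, g, p=2.0):
    """Normalized L^p distance between two functions sampled on the same grid."""
    if len(f) != len(g):
        raise ValueError("functions must be sampled on the same grid")
    return (sum(abs(a - b) ** p for a, b in zip(f, g)) / len(f)) ** (1.0 / p)


def magnitude(f, p=2.0):
    """Magnitude of a time series: the norm of its piecewise constant function."""
    return lp_distance(f, [0.0] * len(f), p)


# Hypothetical piecewise constant functions for two time series X and Y.
f = [1.0] * 4 + [3.0] * 4
g = [1.0] * 4 + [5.0] * 4
h = [2.0] * 8

d_fg = lp_distance(f, g, p=2.0)  # (4 * 2**2 / 8) ** 0.5 = 2 ** 0.5
```

The distance is computed between the fitted functions, not between the raw series, so two series with the same change points and segment means are at distance zero however their noise differs.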

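Now we prove that $d^p$ has all properties of a metric on $\mathcal{TS}[a,b]/\!\sim$.

(i) It is immediate that $d^p(X, Y) \geq 0$ always.

(ii) It is immediate that $d^p(X, Y) = d^p(Y, X)$.

(iii) The triangle inequality follows from lemma 2.1.

(iv) If $d^p(X, Y) = 0$, then $X, Y$ are equal elements of $\mathcal{TS}[a,b]/\!\sim$ by lemma 2.2.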

###### Lemma 2.1.

If $X, Y, Z \in \mathcal{TS}[a,b]$ then $d^p(X, Z) \leq d^p(X, Y) + d^p(Y, Z)$, establishing the triangle inequality for $d^p$.

###### Proof.

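Passing to $L^p[a,b]$, it suffices to prove for any elements $f, g$ of $L^p[a,b]$ that

$$\|f + g\|_p \leq \|f\|_p + \|g\|_p.$$

This is known as Minkowski’s inequality (Minkowski, 1953). First, the concavity of $\log$ implies Young’s inequality: if $\frac{1}{p} + \frac{1}{q} = 1$ then, for $x, y \geq 0$,

$$xy \leq \frac{x^p}{p} + \frac{y^q}{q}.$$

Next, one proves Hölder’s inequality:

$$\|fg\|_1 \leq \|f\|_p \|g\|_q.$$

To prove this, without loss of generality, normalize $f$ and $g$ so that both $\|f\|_p = \|g\|_q = 1$. It remains to prove $\|fg\|_1 \leq 1$. Young’s inequality gives, for $t \in [a,b]$,

$$|f(t) g(t)| \leq \frac{|f(t)|^p}{p} + \frac{|g(t)|^q}{q}.$$

Integrate this, to get $\|fg\|_1 \leq \frac{1}{p} + \frac{1}{q} = 1$ as required. Finally,

$$\|f + g\|_p^p = \frac{1}{b-a} \int_a^b |f + g| \, |f + g|^{p-1} \leq \frac{1}{b-a} \int_a^b \left( |f| + |g| \right) |f + g|^{p-1} \leq \left( \|f\|_p + \|g\|_p \right) \left\| |f + g|^{p-1} \right\|_q = \left( \|f\|_p + \|g\|_p \right) \|f + g\|_p^{p-1}.$$

The first inequality is by the triangle inequality, the second is by Hölder. The final equality holds as $(p-1)q = p$. Divide by $\|f + g\|_p^{p-1}$, supposing it is non-zero, to get the result. If $\|f + g\|_p$ is zero, the result is trivial. ∎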

###### Lemma 2.2.

Given time series $X, Y$, the following are equivalent:

(i) $d^p(X, Y) = 0$;

(ii) $X \sim Y$;

(iii) $X, Y$ have the same set of change points and the same values of means between them.

See Figure 1 for a depiction of equivalent time series.

###### Proof.

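Let $\mathcal{F}(X) = \tilde{f}$ and $\mathcal{F}(Y) = \tilde{g}$ respectively. (iii) states that $f = g$ away from the change points. By definition of $\sim$, (ii) states that $\tilde{f} = \tilde{g}$, while (i) states that $\|f - g\|_p = 0$. Clearly then, (iii) implies (ii), and (i) and (ii) are equivalent since $\|\cdot\|_p$ is a norm on $L^p[a,b]$. Finally, suppose (i) holds. Then $f - g$ is itself a piecewise constant function with norm zero. If $f, g$ have breaks $\tau_1, \dots, \tau_{m-1}$ and $\sigma_1, \dots, \sigma_{k-1}$ respectively, then $f - g$ is continuous on $[a,b] \setminus \{\tau_i, \sigma_j\}$. With the integral of $|f - g|^p$ being zero over this region, continuity means $f - g$ is identically zero there. As such, all the open intervals $(\tau_{i-1}, \tau_i)$ and $(\sigma_{j-1}, \sigma_j)$ must coincide, with the exact same coefficients. This proves (iii). ∎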

###### Lemma 2.3.

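Let $X, Y$ be time series, and suppose $Y$ is compared with a perturbation $X'$ of $X$. Suppose $X'$ differs from $X$ on an interval of length $\delta$ such that the change point algorithm detects two additional change points at the endpoints of this interval and a mean that differs by $\varepsilon$ from the mean of $X$ there. Then

$$|d^p(X', Y) - d^p(X, Y)| \leq \varepsilon \left( \frac{\delta}{b-a} \right)^{1/p}, \qquad |\mathrm{mag}(X') - \mathrm{mag}(X)| \leq \varepsilon \left( \frac{\delta}{b-a} \right)^{1/p}.$$

That is, $d^p$ and mag are continuous with respect to small deformations in the time series. See Figure 1 for an illustration of the suppositions of the lemma.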

###### Proof.

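If $f, f'$ are the piecewise constant functions associated to $X, X'$, then $f - f'$ equals $\pm\varepsilon$ on the deformation interval and vanishes elsewhere, so

$$\|f - f'\|_p = \varepsilon \left( \frac{\delta}{b-a} \right)^{1/p}.$$

Both bounds then follow from the reverse triangle inequality for $\|\cdot\|_p$. Note if $p < \infty$ then this establishes continuity relative to the length of the deformation interval, in addition to the deformation magnitude. ∎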
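A quick numerical sanity check of this continuity, under stated assumptions: the norm is the normalized $L^p$ norm approximated on a discrete grid, and the perturbation shifts the fitted mean by `eps` on a fraction `width / n` of the interval. In that setting the distance between the original and perturbed functions equals `eps * (width / n) ** (1 / p)`. All names and values below are hypothetical.

```python
def lp_distance(f, g, p):
    """Normalized L^p distance between two functions sampled on the same grid."""
    return (sum(abs(a - b) ** p for a, b in zip(f, g)) / len(f)) ** (1.0 / p)


n = 1000                 # number of grid points on the time interval
f = [2.0] * n            # piecewise constant function of the original series
eps, width = 0.5, 40     # mean shift and number of perturbed grid points

# Perturbed function: two extra change points, mean shifted by eps in between.
f_pert = list(f)
for t in range(300, 300 + width):
    f_pert[t] += eps

p = 2.0
observed = lp_distance(f, f_pert, p)
predicted = eps * (width / n) ** (1.0 / p)

# A third function g, to check that distances to it move by at most `observed`.
g = [1.0] * 500 + [4.0] * 500
shift = abs(lp_distance(f_pert, g, p) - lp_distance(f, g, p))
```

Shrinking either `eps` or `width` drives `observed` to zero, in contrast with distances that depend only on the locations of the change points.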

Combining all these results we have proved

###### Theorem 2.4 (Mean theorem).

The space $\mathcal{TS}[a,b]$ is equipped with a class of functions $\mathcal{F}$ that induce metrizable quotient spaces $\mathcal{TS}[a,b]/\!\sim$. The equivalence can be characterized simply as $X \sim Y$ if $X, Y$ have identical sets of change points and means in their stationary periods. The metrics $d^p$ are stable under perturbations in the time series. We may also equip $\mathcal{TS}[a,b]$ with a magnitude function mag, representing a measure of its mean up to noise.

By an identical method, using change point algorithms for changes in variance, one can construct deterministic functions that prove

###### Theorem 2.5 (Variance theorem).

$\mathcal{TS}[a,b]$ is equipped with a class of functions $\mathcal{F}$ that induce metrizable quotient spaces $\mathcal{TS}[a,b]/\!\sim$. The equivalence can be characterized as $X \sim Y$ if $X, Y$ have identical sets of change points and variances in their stationary periods. The metrics $d^p$ are stable under perturbations in the time series. We may equip $\mathcal{TS}[a,b]$ with a magnitude function mag, representing a measure of its total variance up to noise.

Other statistical properties induce other equivalence relations $\sim$ and maps $\mathcal{F}$, all producing variants of theorem 1.1.

## 3 Analysis of collections of time series

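In this section, we describe a general procedure to use the function $\mathcal{F}$ and the induced distances $d^p$ to compute and analyse distances in a collection of time series. Suppose $X_1, \dots, X_n$ are time series over some time period $[a,b]$. We suppress the time in our notation.

First, choose an appropriate change point algorithm for the data, including threshold parameters, and a desired statistical quantity, such as mean or variance. This specifies a precise instance of $\mathcal{F}$. Let the piecewise constant functions associated to the time series be $f_1, \dots, f_n$, so that $\mathcal{F}(X_i) = \tilde{f}_i$. Also, consider the normalized piecewise constant functions $\hat{f}_i = f_i / \|f_i\|_p$. Form three different matrices:

$$D_{ij} = \|f_i - f_j\|_p, \qquad \hat{D}_{ij} = \|\hat{f}_i - \hat{f}_j\|_p, \qquad \Omega_{ij} = \langle \hat{f}_i, \hat{f}_j \rangle,$$

where the inner product defining $\Omega$ is taken with $p = 2$.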

###### Remark 3.1.

Recall $\mathrm{mag}(X_i) = \|f_i\|_p$. Thus, the normalized $\hat{f}_i$ are produced by dividing by the aforementioned time series’ magnitude functions.

###### Remark 3.2.

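We note a relation between $\hat{D}$ and $\Omega$ for $p = 2$. If $x, y$ are two non-zero elements of an inner product space, with normalized elements $\hat{x} = x / \|x\|$ and $\hat{y} = y / \|y\|$, then

$$\|\hat{x} - \hat{y}\|^2 = 2 - 2 \langle \hat{x}, \hat{y} \rangle.$$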

###### Definition 3.3.

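Call an $n \times n$ matrix $D$ a *distance matrix* if $D$ is symmetric, $D_{ij} \geq 0$ for all $i, j$ and $D_{ii} = 0$. Call an $n \times n$ matrix $A$ an *affinity matrix* if $A$ is symmetric, $0 \leq A_{ij} \leq 1$ for all $i, j$ and $A_{ii} = 1$.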

###### Remark 3.4.

Given a distance matrix $D$ one may produce an affinity matrix $A$ by:

$$A_{ij} = 1 - \frac{D_{ij}}{\max D}.$$

$D$ is a distance matrix, thus named as it consists of unscaled distances. $\hat{D}$ is also a distance matrix, consisting of distances between normalized vectors, with $p$ suppressed from the notation. We refer to $\Omega$ as an *alignment matrix* as it measures angles between the $f_i$ as vectors in an inner product space. Note that all entries of $\Omega$ lie in $[-1, 1]$ with all diagonal elements equal to $1$. To $D$ and $\hat{D}$ associate affinity matrices $A$ and $\hat{A}$.
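The matrices of this section can be sketched as follows for $p = 2$, on functions sampled on a common grid. This is an illustration under assumptions, not the authors' code: the helper names are hypothetical, and the distance-to-affinity transformation used here, one minus the distance divided by the largest distance, is one common choice that satisfies the definition of an affinity matrix.

```python
def norm2(f):
    """Normalized L^2 norm on a discrete grid."""
    return (sum(v * v for v in f) / len(f)) ** 0.5


def inner(f, g):
    """Normalized L^2 inner product on a discrete grid."""
    return sum(a * b for a, b in zip(f, g)) / len(f)


def distance_matrix(funcs):
    """Pairwise normalized L^2 distances between sampled functions."""
    return [[norm2([a - b for a, b in zip(fi, fj)]) for fj in funcs] for fi in funcs]


def affinity_from_distance(D):
    """Assumed transformation: A_ij = 1 - D_ij / max(D)."""
    largest = max(max(row) for row in D)
    if largest == 0:
        return [[1.0 for _ in row] for row in D]
    return [[1.0 - d / largest for d in row] for row in D]


# Hypothetical piecewise constant functions for three time series.
funcs = [[1.0, 1.0, 3.0, 3.0], [1.0, 1.0, 5.0, 5.0], [2.0, 2.0, 2.0, 2.0]]
normalized = [[v / norm2(f) for v in f] for f in funcs]

D = distance_matrix(funcs)            # unscaled distances
D_hat = distance_matrix(normalized)   # distances between normalized functions
alignment = [[inner(fi, fj) for fj in normalized] for fi in normalized]
A = affinity_from_distance(D)
```

For normalized functions, each squared entry of `D_hat` equals `2 - 2 * alignment[i][j]`, so the normalized distance matrix and the alignment matrix carry the same ordering information.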

### 3.1 Cross-contextual analysis

In many applications, a collection of time series has additional structure. Each individual series contains numerical data one can compare with each other, but there may also exist contextual data among the indices $i = 1, \dots, n$. For instance, in our specific analysis to follow, $X_i$ is a time series of measurements taken at a particular measuring station; the relationships between these stations, such as their physical distance, influence the data. It is of interest to analyse not just the collection of time series, but their relationship with physical distance of these measuring stations, or other cross-contextual data. We introduce a general framework for cross-contextual distance analysis. Let $G$ be a distance matrix that codifies relationships between the sources of the data. As above, we can associate an affinity matrix $A^G$ to these distances. Then, define *consistency matrices*

$$A - A^G, \qquad \hat{A} - A^G, \qquad \Omega - A^G.$$

These matrices measure the consistency between the affinity measured by our measures on time series, and a cross-contextual distance matrix $G$.
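A minimal sketch of a consistency matrix, assuming it is formed as the entrywise difference between a time series affinity matrix and a contextual affinity matrix, and assuming the affinity transformation of one minus the distance divided by the largest distance. The per-source summary `inconsistency_per_source` is a hypothetical convenience for illustration and is not part of the method described here.

```python
def affinity_from_distance(D):
    """Assumed transformation: A_ij = 1 - D_ij / max(D)."""
    largest = max(max(row) for row in D)
    if largest == 0:
        return [[1.0 for _ in row] for row in D]
    return [[1.0 - d / largest for d in row] for row in D]


def consistency_matrix(A_series, A_context):
    """Entrywise difference of two affinity matrices of the same shape."""
    return [[a - c for a, c in zip(row_a, row_c)]
            for row_a, row_c in zip(A_series, A_context)]


def inconsistency_per_source(C):
    """Hypothetical summary: total absolute inconsistency of each source."""
    return [sum(abs(v) for v in row) for row in C]


# Illustrative distance matrices for three sources.
D_series = [[0.0, 1.0, 4.0], [1.0, 0.0, 3.0], [4.0, 3.0, 0.0]]
D_context_proportional = [[0.0, 2.0, 8.0], [2.0, 0.0, 6.0], [8.0, 6.0, 0.0]]
D_context_other = [[0.0, 8.0, 2.0], [8.0, 0.0, 6.0], [2.0, 6.0, 0.0]]

A_series = affinity_from_distance(D_series)
C_same = consistency_matrix(A_series, affinity_from_distance(D_context_proportional))
C_diff = consistency_matrix(A_series, affinity_from_distance(D_context_other))
scores = inconsistency_per_source(C_diff)
```

When the contextual distances are proportional to the time series distances, the consistency matrix vanishes; large entries flag pairs of sources whose two notions of similarity disagree.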

### 3.2 Clustering methodology

With the definitions of section 3.1 in mind, we may apply the methods of spectral and hierarchical clustering in two ways. First, as is standard, we may apply them directly to the matrices $D$, $\hat{D}$ and $\Omega$, to determine anomalies and clusters of similarity with respect to distance between time series. Because the entries of $A$ are obtained as a linear combination of the entries of $D$, clustering with respect to a distance matrix or its associated affinity matrix gives identical results. The same applies for $\hat{D}$ and $\Omega$, by remark 3.2: these will cluster in identical ways.

In addition, we may also apply clustering techniques to the consistency matrices defined above. This will identify those anomalous data sources, indexed by $i$, which do not behave similarly between the two distance measures. In section 4 we will uncover striking similarity between distances between air quality time series and geographical distances, and will identify just a handful of anomalous locations where this relationship does not hold, namely anomalies in the consistency matrix.

Finally, we also compute the matrix norms of the consistency matrices as a measure of the overall consistency between the distance measures. We use the norm

### 3.3 Synthetic experiments

In this section, we demonstrate the superior performance of our metrics over previously proposed distances between time series based on structural breaks. (James et al., 2019) use metrics and semi-metrics between the sets of change points without using data of the mean or variance between these breaks. Our synthetic time series provide examples where two different structural breaks may have markedly different changes in the mean.

Consider the 10 time series displayed in Figure 2. Black vertical lines indicate detected change points, with the orange line representing the piecewise constant functions $f_i$. When considering only structural breaks, the most similar time series are {1,2,3}, {4,5}, {6,7}, {8,9,10}. However, considering the structural breaks and the mean, the most similar time series are {1,3} and {7,8}. Figure 3 demonstrates the superior performance of the $d^p$ metric, here with , over previous measures in (James et al., 2019). The Hausdorff metric, the modified Hausdorff semi-metric and their modified semi-metric all identify erroneous similarities based solely on the structural breaks of these time series. The $d^p$ metric captures similarity more appropriately, identifying only time series {1,3} and {7,8} as highly similar. This is clear from the hierarchical clustering over the affinity matrix in Figure 2(d).

###### Proposition 3.5.

###### Proof.

Adopting the notation of lemma 2.3, let have structural breaks and respectively. Relative to , has two additional structural breaks . Measuring distance between time series purely relative to structural breaks, yields between and between . As such, is constant and non-zero relative to . Hence it does not approach zero as . The same holds for .

Moreover, these metrics are not continuous relative to either. Consider the simple case where . Assume . Then . This is also constant relative to and does not approach zero as . A similar argument holds for . Hence the result. ∎

## 4 Australian bushfire data

The 2019-2020 bushfire season in New South Wales, Australia, has attracted international attention. Apart from the immediate damage, the bushfires have subjected the entire state to poor, and often hazardous, air quality. Measured hourly at 52 stations across NSW, these figures measure a significant consequence affecting people across the state, and constitute a trove of data for analysing where the fires worst affected their surroundings.

Mathematically, this data is a suitable opportunity to apply our metrics derived from $L^p$ distances between piecewise constant functions generated from change point algorithms. The change point procedure helps stabilize the data more appropriately, measuring average air quality over intervals of days rather than the huge peaks seen in the raw air quality data. Since we expect air quality to remain somewhat locally stationary, these change point algorithms are more appropriate than more flexible non-parametric models.

We access AQI (air quality index) data from the New South Wales Department of Planning, Industry and Environment website https://www.dpie.nsw.gov.au/air-quality/current-air-quality. In the rare case of missing data points, we take the value from the most recent prior day. We measure hourly data over a three-month window, 20 October 2019 to 20 January 2020. This gives us 52 time series of length 2211. Latitudes and longitudes of measuring stations are taken from the same website, then converted into radian measurements, from which we compute the matrix of geographical distances using the Haversine formula (van Brummelen, 2013).
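The geographical distance computation can be sketched as follows. This is a minimal illustration of the Haversine formula, assuming a spherical Earth with mean radius 6371 km; the function names and the coordinates are illustrative and are not the stations' recorded positions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def geographic_distance_matrix(coords):
    """Symmetric matrix of pairwise Haversine distances for (lat, lon) pairs."""
    return [[haversine_km(la1, lo1, la2, lo2) for (la2, lo2) in coords]
            for (la1, lo1) in coords]


# Illustrative coordinates in degrees (approximate, for demonstration only).
coords = [(-33.87, 151.21), (-36.08, 146.92), (-30.33, 149.78)]
G = geographic_distance_matrix(coords)
```

The resulting matrix is symmetric with zero diagonal, so it qualifies as a contextual distance matrix in the sense of section 3.1.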

Within this multivariate context, we perform two separate analyses. First, our new distance matrices, computed according to the algorithmic framework of section 2, enable the identification of similarity clusters and anomalies. Secondly, calculating the geographical distances between measuring sites gives a contextual distance matrix, with respect to which we can form associated consistency matrices. With each of these matrices, we apply hierarchical and spectral clustering, thereby identifying anomalies in the spread of poor air relative to space and time.

Henceforth,

### 4.1 Raw AQI data

We perform our algorithm on the AQI time series recorded by various measuring stations across New South Wales. Our results are consistent with geographical proximity and provide several insights regarding anomalous air quality relative to space and time during our period of analysis. A large number of these measurement sites are located in the city of Sydney, and hence are close to one another relative to the rest of the state of NSW. For instance, Rozelle and Randwick are two sites within Sydney that are particularly close geographically, km apart. Figures 3(a) and 3(b) illustrate the striking similarity between these air quality time series. The other four sites will be relevant in subsequent analysis. In Figure 3(c), note that Narrabri air quality was consistently poor from early November 2019. Katoomba, Wagga Wagga and Albury, displayed in Figures 3(d), 3(e) and 3(f), all suffered sharp peaks in poor air quality much later than Narrabri. Albury in particular exhibits peaks in poor air quality as late as mid-January 2020. Indeed, Albury, which lies particularly far south in NSW, was affected by the fires much later, as they moved south.

### 4.2 Affinity matrix analysis

In this section, we use our new distance measures, via the function $\mathcal{F}$, to produce clusters of similarity and anomalies with respect to the air quality of different locations. As remarked in section 3.2, there is no difference between clustering on a distance matrix or its affinity matrix, so we cluster based on affinity matrices $A$ and $\hat{A}$. These entries lie in $[0, 1]$ so that diagrams display a consistent scale. These matrices each provide interesting insights, and are largely consistent with respect to their identification of similarity clusters and anomalies.

Figure 4(a) displays $A$, our affinity matrix built from unscaled distances. Hierarchical clustering indicates that there are two clusters of time series. One cluster contains Katoomba and Albury, both of which had marked peaks in poor air quality, confirmed by Figures 3(d) and 3(f). Both areas experienced some of their worst air quality well after other NSW measuring stations, leading to their piecewise constant functions being classified as anomalous. The second cluster contains two sub-groups. The first is a large sub-cluster of highly similar time series, which contains all Sydney measuring stations. The second sub-cluster consists of less pronounced anomalies, containing Armidale, Bargo, Bathurst, Oakdale, Orange, Tamworth, Wagga Wagga, and Port Macquarie. All of these locations are a significant distance from the city of Sydney. Spectral clustering results suggest that there are three clusters: one containing Albury, another containing Katoomba, and the third consisting of the remainder of the time series.

Figure 4(b) displays , our affinity matrix recording distances between normalized vectors. We have similar results for this matrix. Hierarchical clustering indicates that there are two clusters of time series, one of which has two sub-groups. The first cluster consists of two highly anomalous time series in Port Macquarie and Albury. Within the second cluster, the first sub-group is less similar to the majority of time series and consists of: Armidale, Bargo, Bathurst, Oakdale, Orange, Tamworth, Wagga Wagga, and Katoomba. In particular, note that after normalization, Katoomba has left the most anomalous cluster. For Katoomba is characterized by the single greatest magnitude of all locations, with . After normalization by the magnitude, recalling remark 3.1, this extreme feature is nullified. The remainder of the measuring station time series, which again contain all of Sydney, are identified as highly similar. Spectral clustering on the matrix indicates that there are two highly anomalous time series, Port Macquarie and Albury.

Third, we analyse the alignment matrix $\Omega$. Hierarchical clustering is displayed in Figure 4(c). These results are almost identical to the hierarchical clustering on $\hat{A}$, identifying Port Macquarie and Albury as primary anomalies, and Armidale, Bargo, Bathurst, Oakdale, Orange, Tamworth, Wagga Wagga, and Katoomba as secondary anomalies. The remaining stations, once again containing the entire city of Sydney, are closely clustered. Spectral clustering results propose three clusters of AQI: one cluster containing Port Macquarie, one containing Albury, and one consisting of the remaining locations.

### 4.3 Cross-contextual analysis

In this section, we analyse the consistency matrices defined in section 3. These consistency matrices provide a framework for cross-contextual analysis. That is, where one matrix computes the similarity in one aspect, and another matrix computes similarity in a separate aspect, subtracting one affinity matrix from the other allows us to identify the consistency between different similarity measures across a collection of time series. We can place measurements between different time series within a greater context. In this instance, we analyse whether air quality is consistent with respect to geographical distance between measuring locations. Anomalies in consistency highlight either a) two areas close by way of geographical distance but dissimilar in air quality or; b) two locations far apart in terms of geographical distance, but similar in their air quality.

Hierarchical clustering on the geographical affinity matrix determines clusters of measuring stations in terms of their geographical proximity within the collection. Two clusters are proposed. The first consists of Wagga Wagga and Albury, both of which are significantly further south than the other measuring stations. The remaining measuring stations fall in one cluster that appears to have two distinct collections of similarity. The first sub-group contains the city of Sydney sites, while the second sub-group includes locations outside Sydney.

First, we analyse the consistency matrix associated to the unscaled distance measure, . Hierarchical clustering indicates that there are two clusters. The principal anomaly is Katoomba, which sits in its own cluster. Within the second cluster there is a small sub-cluster of anomalous locations including Gunnedah, Narrabri, Albury, Wagga Wagga and others. That is, Katoomba is highly anomalous relative to the relationship between air quality and distance from other locations. Indeed, Katoomba is relatively close to the city of Sydney but was befallen with terrible bushfires close by. Spectral clustering indicates that there are three clusters: the first consisting solely of Katoomba, the second consisting solely of Oakdale and the final cluster containing the remaining measuring locations. The norm of the entire consistency matrix is indicating broad similarity between geographical and AQI affinity.

Next, consider the consistency matrix associated to the distances between normalized vectors, . The hierarchical clustering results are highly similar to that of . Again, two clusters are suggested, one consisting solely of Katoomba and the one containing the remainder of the measuring stations. Interestingly, there is more variance in the structure of the second cluster. This demonstrates the utility of our plurality of measurements: others may uncover structure that one alone cannot. Katoomba’s presence again provides another insight: even after normalizing by its extreme magnitude, it is still anomalous relative to the consistency between geographical distance and AQI. Spectral clustering indicates that there are two anomalous locations, Katoomba and Bargo, each of which is contained exclusively within its own cluster. Finally, the matrix norm is , larger than that of . This makes sense given less consistency overall in this matrix.

Finally, consider the consistency matrix associated to the alignment matrix, . Hierarchical clustering indicates that there are three clusters of locations. The first consists solely of Narrabri, the second cluster consists of Bargo and Katoomba, and the final cluster consists of the remaining measurement stations. These can be sub-categorised into those located closer to Sydney and those located farther from Sydney. Spectral clustering indicates that the two most anomalous locations are Katoomba and Bargo. The absence of Narrabri as an anomaly should be noted. Given the reasonable difference in inference generated from each consistency matrix, one can appreciate the importance of using the most appropriate distance measure. The matrix norm is , once again indicating broad similarity.

Returning to previous matrices, we notice that Narrabri is a truly interesting case. It was never identified as a significant anomaly with respect to geography or AQI, considering distances or alignment. And yet, it is anomalous in the consistency between geographical and AQI distance. It is one of very few locations where these two measures are not closely aligned relative to other sites.

The holistic analysis of various distance and consistency matrices yields several insights. First, there is broad consistency between AQI and geographical affinity matrices, which measure the relationship between sites with respect to these two aspects. Second, there is clearly sensitivity in the anomalies detected based on the distance measures that are used. Third, our experiments indicate that when analysing different distance and consistency matrices with various clustering methods, spectral clustering appears to provide more consistent results than hierarchical clustering - regardless of the matrix analysed.

## 5 Conclusion

We have proposed a new theoretical and practical framework for understanding equivalence and computing distances between time series. Mathematically, we aim to identify and quotient by the relation that two time series differ only up to noise, and computationally, we want to remove noise from calculations of distances and magnitudes. Our procedure builds on earlier work to measure distances between time series via sets of structural breaks, and our new method produces more reliable and continuous measures. We have applied this procedure and these metrics to analyse data with extraordinary peaks, determining piecewise constant functions designed to understand the true movement of the mean throughout the time series. This procedure is highly flexible: we could swap variance for mean, vary our change point algorithm parameters, or use completely different change point algorithms altogether. Analysing the NSW bushfires, we have determined anomalies not just in the air quality data itself, but also those sites that stand out in the otherwise strong consistency between air quality and geographical distances.

Several future implementation opportunities exist in this equivalence framework. For instance, two autoregressive processes should be declared equivalent if their respective noise terms are each “white noise” variables, namely Gaussian distributions with sufficiently small variance. Algorithms could be developed to empirically detect such similarity in the distributions from an observed sample set, define equivalence relations, and hence produce reliable distance measures on the induced quotient space. Developing further theoretical and algorithmic ways of determining quotient spaces equipped with metrics may have many applications in determining appropriately adjusted measures between time series. Within a machine learning context, algorithmic frameworks designed to learn appropriate notions of equivalence would also be of great interest to the community.

## References

- Wasserstein GAN. Proceedings of the 34th International Conference on Machine Learning 70, pp. 214–223. Cited by: §1.
- Mesh: measuring errors between surfaces using the Hausdorff distance. Proc. IEEE Int. Conf. Multim. Expo. Lausanne, pp. 705–708. Cited by: §1.
- Probability and measure. New York: John Wiley & Sons. Cited by: §1.
- A change-point model for a shift in variance. Journal of Quality Technology 42, pp. 355–366. Cited by: §1, §1.
- Statistical process control for shifts in mean or variance using a changepoint formulation. Technometrics 47, pp. 164–173. Cited by: §1.
- Testing a sequence of observations for a shift in location. Journal of the American Statistical Association 72, pp. 180–186. Cited by: §1, §1.
- Novel semi-metrics for multivariate change point analysis and anomaly detection. arXiv e-prints, pp. arXiv:1911.00995. External Links: 1911.00995 Cited by: §1, §1, §3.3, §3.3.
- Gromov Wasserstein distances and the metric approach to object matching. Foundations of Computational Mathematics, pp. 1–71. Cited by: §1.
- Geometrie der zahlen. Chelsea. Cited by: §2.
- Untersuchungen über systeme integrierbarer funktionen. Mathematische Annalen 69 (4), pp. 449–497. Cited by: §1.
- Sequential change detection in the presence of unknown parameters. Statistics and Computing 24, pp. 1017–1030. Cited by: §1, §1.
- Parametric and nonparametric sequential change detection in R: the cpm package. Journal of Statistical Software, Articles 66 (3), pp. 1–20. Cited by: §1, §2.
- A transportation distance for signal analysis. Journal of Mathematical Imaging and Vision 59, pp. 187–210. Cited by: §1.
- Heavenly mathematics: the forgotten art of spherical trigonometry. Princeton University Press. Cited by: §4.
- Distance metric learning with application to clustering with side-information. Advances in Neural Information Processing Systems 15. Cited by: §1.
- A new Hausdorff distance for image matching. Pattern Recog. Lett. 26, pp. 581–586. Cited by: §1.
