Novel semi-metrics for multivariate change point analysis and anomaly detection

11/04/2019
by Nick James, et al.

This paper proposes a new method for measuring similarity and detecting anomalies among time series. The method is most effective in large collections of (likely related) time series, and focuses on measuring distances between the sets of structural breaks within such a collection. We consolidate and generalise a class of semi-metric distance measures, which we term MJ distances. Experiments on simulated data demonstrate that our proposed family of distances uncovers similarity within collections of time series more effectively than measures such as the Hausdorff and Wasserstein metrics. Although our class of distances does not necessarily satisfy the triangle inequality required of a metric, we analyse the transitivity properties of the respective distance matrices in various contextual scenarios. There, we demonstrate a trade-off between robust performance in the presence of outliers and the triangle inequality property. We show in experiments using real data that the contrived scenarios that severely violate the transitivity property rarely arise in practice; instead, our family of measures satisfies all the properties of a metric most of the time. We illustrate three ways of analysing the distance and similarity matrices: eigenvalue analysis, hierarchical clustering, and spectral clustering. The results from our hierarchical and spectral clustering experiments on simulated data demonstrate that the Hausdorff and Wasserstein metrics may lead to erroneous inference as to which time series are most similar with respect to their structural breaks, while our semi-metrics provide an improvement.
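The trade-off between outlier robustness and the triangle inequality can be illustrated with a small sketch. The abstract does not state the MJ formula, so the function `averaged_semi_metric` below is an assumption: a Hausdorff-style distance in which the maximum over nearest-break distances is replaced by an average. The function names, the parameter `p`, and the toy break sets are all hypothetical and chosen only for illustration; the full text gives the authors' actual definitions.

```python
from itertools import permutations


def point_to_set(x, points):
    """Distance from one break location x to the nearest break in points."""
    return min(abs(x - y) for y in points)


def hausdorff(s, t):
    """Classical Hausdorff distance between two finite sets of break locations."""
    return max(
        max(point_to_set(a, t) for a in s),
        max(point_to_set(b, s) for b in t),
    )


def averaged_semi_metric(s, t, p=1):
    """ASSUMED averaged variant: mean nearest-break distance in each direction.

    This is an illustrative stand-in, not necessarily the paper's MJ distance.
    """
    from_t = sum(point_to_set(b, s) ** p for b in t) / (2 * len(t))
    from_s = sum(point_to_set(a, t) ** p for a in s) / (2 * len(s))
    return (from_t + from_s) ** (1 / p)


def triangle_violation_rate(sets, dist):
    """Fraction of ordered triples (i, j, k) with d(i, k) > d(i, j) + d(j, k)."""
    triples = list(permutations(range(len(sets)), 3))
    violations = sum(
        1
        for i, j, k in triples
        if dist(sets[i], sets[k]) > dist(sets[i], sets[j]) + dist(sets[j], sets[k]) + 1e-12
    )
    return violations / len(triples)


# Two series sharing three breaks; the second has one outlying extra break.
breaks_a = [100, 200, 300]
breaks_b = [100, 200, 300, 900]
print(hausdorff(breaks_a, breaks_b))             # 600: dominated by the outlier
print(averaged_semi_metric(breaks_a, breaks_b))  # 75.0: the outlier is averaged down

# A contrived triple on which the averaged variant breaks the triangle inequality.
contrived = [[0], [0, 10], [10]]
print(triangle_violation_rate(contrived, hausdorff))             # 0.0
print(triangle_violation_rate(contrived, averaged_semi_metric))  # 2 of 6 ordered triples
```

In this toy case the averaged distance is far less sensitive to the single outlying break than the Hausdorff distance, at the cost of violating the triangle inequality on the contrived triple. A matrix of such pairwise distances could then be passed to standard eigenvalue, hierarchical clustering, or spectral clustering routines.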

research
12/12/2019

A new method for similarity and anomaly detection in cryptocurrency markets

We propose a new approach using the MJ_1 semi-metric, from the more gene...
research
02/07/2020

Equivalence relations and L^p distances between time series

We introduce a general framework for defining equivalence and measuring ...
research
01/26/2020

Semi-metric portfolio optimisation: a new algorithm reducing simultaneous asset shocks

This paper proposes a new method for financial portfolio optimisation ba...
research
11/04/2019

Optimal Transport Based Change Point Detection and Time Series Segment Clustering

Two common problems in time series analysis are the decomposition of the...
research
02/14/2020

An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality

Distances are pervasive in machine learning. They serve as similarity me...
research
09/29/2021

Kernel distance measures for time series, random fields and other structured data

This paper introduces kdiff, a novel kernel-based measure for estimating...
research
08/07/2023

Merge Tree Geodesics and Barycenters with Path Mappings

Comparative visualization of scalar fields is often facilitated using si...
