Distribution Agnostic Symbolic Representations for Time Series Dimensionality Reduction and Online Anomaly Detection

Owing to the importance of lower-bounding distances and the attractiveness of symbolic representations, the family of symbolic aggregate approximations (SAX) has been used extensively for encoding time series data. However, typical SAX-based methods rely on two restrictive assumptions: a Gaussian data distribution and equiprobable symbols. This paper proposes two novel data-driven SAX-based symbolic representations, distinguished by their discretization steps. The first representation, intended for general data compaction and indexing scenarios, combines kernel density estimation with Lloyd-Max quantization to minimize the information loss and mean squared error in the discretization step. The second representation, intended for high-level mining tasks, employs the Mean-Shift clustering method and is shown to enhance anomaly detection in the lower-dimensional space. In addition, we verify theoretically a previously observed phenomenon, intrinsic to the process, whereby the intermediate piecewise aggregate approximation has a lower variance than expected. This phenomenon causes additional information loss but can be avoided with a simple modification. The proposed representations retain all the attractive properties of the conventional SAX method. Furthermore, experimental evaluation on real-world datasets demonstrates their superiority over the traditional SAX and an alternative data-driven SAX variant.
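To make the discretization step concrete, the following is a minimal sketch, not the authors' implementation, contrasting the conventional SAX breakpoints (equiprobable under a standard Gaussian) with a data-driven Lloyd-Max quantizer. As a simplifying assumption, the quantizer is fitted directly to the empirical samples rather than to a kernel density estimate as in the paper, and all function names (`paa`, `gaussian_breakpoints`, `lloyd_max`, `symbolize`) are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean


def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each equal-length frame.

    Assumes len(series) is divisible by n_segments (simplification).
    """
    n = len(series)
    if n % n_segments != 0:
        raise ValueError("series length must be divisible by n_segments")
    width = n // n_segments
    return [fmean(series[i:i + width]) for i in range(0, n, width)]


def gaussian_breakpoints(alphabet_size):
    """Conventional SAX breakpoints: equiprobable regions under N(0, 1)."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def lloyd_max(samples, n_levels, n_iter=50):
    """Empirical Lloyd-Max quantizer (assumption: fitted on raw samples, no KDE).

    Alternates between (1) boundaries at midpoints of adjacent levels and
    (2) levels at the mean of the samples in each cell, which reduces the
    mean squared quantization error at every iteration.
    """
    xs = sorted(samples)
    # Initialise the levels at evenly spaced empirical quantiles.
    levels = [xs[int((k + 0.5) * len(xs) / n_levels)] for k in range(n_levels)]
    for _ in range(n_iter):
        bounds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
        cells = [[] for _ in range(n_levels)]
        for x in xs:
            cells[bisect_right(bounds, x)].append(x)
        # An empty cell keeps its previous level.
        levels = [fmean(c) if c else lv for c, lv in zip(cells, levels)]
    bounds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
    return bounds, levels


def symbolize(values, breakpoints, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Map each value to the symbol of the region it falls into."""
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in values)


if __name__ == "__main__":
    # Toy, clearly non-Gaussian series (values cluster around 0 and 10).
    series = [0.0, 0.2, 0.1, 0.0, 9.8, 10.0, 10.1, 9.9, 0.1, 0.0, 0.2, 0.1]
    frames = paa(series, 6)
    data_bounds, _ = lloyd_max(frames, 2)
    print(symbolize(frames, data_bounds))
```

In this sketch, `gaussian_breakpoints` would normally be applied to a z-normalized series, whereas `lloyd_max` derives its breakpoints from the data itself, which is the kind of distribution-agnostic behaviour the abstract describes.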


