Creating a level playing field for all symbols in a discretization

10/18/2012
by Matthew Butler, et al.

In time series analysis research there is strong interest in discrete representations of real-valued data streams. One approach that emerged over a decade ago and is still considered state-of-the-art is the Symbolic Aggregate Approximation (SAX) algorithm. This discretization algorithm was the first symbolic approach to map a real-valued time series to a symbolic representation guaranteed to lower-bound the Euclidean distance. This paper concerns the SAX assumption that the data are highly Gaussian and its use of the standard normal curve to choose the partitions that discretize the data. In its canonical form, though not necessarily in every variant, SAX chooses partitions on the standard normal curve so that each symbol in a finite alphabet occurs with equal probability. This procedure is generally valid because a time series is normalized before the rest of the SAX algorithm is applied. However, there is a caveat to this assumption of equi-probability, which arises from the intermediate step of Piecewise Aggregate Approximation (PAA). We show that applying PAA does alter the distribution of the data: the standard deviation shrinks by an amount that depends on the number of points used to create each segment of the PAA representation and on the degree of auto-correlation within the series. Data that exhibit statistically significant auto-correlation are less affected by this shrinkage. As the standard deviation contracts, the mean remains the same, but the distribution is no longer standard normal, so partitions based on the standard normal curve no longer yield equal probability for each symbol.
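The effect described in the abstract can be reproduced with a small simulation. The sketch below is illustrative only and is not taken from the paper: the function names, the alphabet size of 4, the segment length of 8, and the AR(1) coefficient of 0.9 are all assumptions chosen for the demonstration. It z-normalizes a white-noise series and a positively auto-correlated series, applies PAA, and then counts how often each symbol occurs when the usual equiprobable standard normal breakpoints are used. For independent Gaussian data, the mean of w points has standard deviation 1/sqrt(w), so the outer symbols should become rare.

```python
import random
import statistics
from bisect import bisect_right


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    return [(x - mean) / std for x in series]


def paa(series, points_per_segment):
    """Piecewise Aggregate Approximation: mean of each non-overlapping segment."""
    w = points_per_segment
    n_segments = len(series) // w  # any trailing partial segment is dropped
    return [statistics.fmean(series[i * w:(i + 1) * w]) for i in range(n_segments)]


def equiprobable_breakpoints(alphabet_size):
    """Standard normal quantiles that split N(0, 1) into equal-probability bins."""
    normal = statistics.NormalDist()
    return [normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def symbol_frequencies(values, breakpoints):
    """Fraction of values that fall into each bin defined by the breakpoints."""
    counts = [0] * (len(breakpoints) + 1)
    for v in values:
        counts[bisect_right(breakpoints, v)] += 1
    return [c / len(values) for c in counts]


def ar1_series(n, phi, rng):
    """Hypothetical test signal: x[t] = phi * x[t-1] + Gaussian noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(0)  # fixed seed so the run is repeatable
n_points, segment_length, alphabet_size = 20000, 8, 4
breakpoints = equiprobable_breakpoints(alphabet_size)

# White noise: no auto-correlation, so PAA shrinks the spread the most.
white = z_normalize([rng.gauss(0.0, 1.0) for _ in range(n_points)])
paa_white = paa(white, segment_length)
std_white = statistics.pstdev(paa_white)
freq_white = symbol_frequencies(paa_white, breakpoints)

# Positively auto-correlated series: neighbouring points move together,
# so averaging them removes less of the variance.
ar1 = z_normalize(ar1_series(n_points, 0.9, rng))
paa_ar1 = paa(ar1, segment_length)
std_ar1 = statistics.pstdev(paa_ar1)
freq_ar1 = symbol_frequencies(paa_ar1, breakpoints)

print("white noise: std after PAA =", round(std_white, 3), "symbol freqs =", freq_white)
print("AR(1) 0.9:   std after PAA =", round(std_ar1, 3), "symbol freqs =", freq_ar1)
```

Under these assumptions, the white-noise frequencies concentrate on the two middle symbols, while the auto-correlated series stays closer to the intended equal split. One possible remedy, again only a suggestion here, is to re-normalize the PAA values or to compute breakpoints from the reduced standard deviation.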


research
05/01/2019

A Novel Trend Symbolic Aggregate Approximation for Time Series

Symbolic Aggregate approximation (SAX) is a classical symbolic approach ...
research
05/25/2022

Towards Symbolic Time Series Representation Improved by Kernel Density Estimators

This paper deals with symbolic time series representation. It builds up ...
research
01/10/2016

On Clustering Time Series Using Euclidean Distance and Pearson Correlation

For time series comparisons, it has often been observed that z-score nor...
research
05/20/2021

Distribution Agnostic Symbolic Representations for Time Series Dimensionality Reduction and Online Anomaly Detection

Due to the importance of the lower bounding distances and the attractive...
research
02/08/2023

ASTRIDE: Adaptive Symbolization for Time Series Databases

We introduce ASTRIDE (Adaptive Symbolization for Time seRIes DatabasEs),...
research
10/24/2016

Representation Learning with Deconvolution for Multivariate Time Series Classification and Visualization

We propose a new model based on the deconvolutional networks and SAX dis...
