Cats & Co: Categorical Time Series Coclustering

05/06/2015
by Dominique Gay, et al.

We propose a novel method for clustering and exploratory analysis of temporal event sequence data (also known as categorical time series), based on three-dimensional data grid models. A data set of temporal event sequences can be represented as a set of three-dimensional points, each defined by three variables: a sequence identifier, a time value and an event value. Instantiating data grid models on these 3D points turns the problem into 3D coclustering: the sequences are partitioned into clusters, the time variable is discretized into intervals, and the events are partitioned into clusters. The cross-product of the univariate partitions forms a multivariate partition of the representation space, i.e., a grid of cells, which also serves as a nonparametric estimator of the joint distribution of the sequence, time and event dimensions. Sequences are therefore grouped together when they have a similar joint distribution of time and events, i.e., a similar distribution of events along the time dimension. The best data grid is computed using a parameter-free Bayesian model selection approach. We also suggest several criteria for exploiting the resulting grid through agglomerative hierarchies, for interpreting the clusters of sequences, and for characterizing their components through insightful visualizations. Extensive experiments on both synthetic and real-world data sets demonstrate that data grid models are efficient and effective, and that they discover meaningful underlying patterns in categorical time series data.
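
The representation described in the abstract can be illustrated with a minimal Python sketch. It flattens a few toy sequences into (sequence identifier, time, event) points, assigns each point to a grid cell, and normalizes the cell counts into a piecewise-constant estimate of the joint distribution. The data values, the names `seq_cluster`, `time_cuts`, `event_cluster` and `cell_of`, and the partitions themselves are assumptions chosen by hand for illustration. The sketch does not perform the Bayesian model selection that the paper uses to find the best grid.

```python
from bisect import bisect_right
from collections import Counter

# Toy data (hypothetical): each sequence is a list of (time, event) pairs.
sequences = {
    "s1": [(0.5, "a"), (1.5, "a"), (6.0, "b")],
    "s2": [(0.7, "a"), (5.5, "b"), (7.2, "c")],
    "s3": [(1.0, "c"), (2.0, "c"), (8.0, "a")],
}

# Flatten the sequences into 3D points: (sequence id, time, event).
points = [(sid, t, e) for sid, seq in sequences.items() for t, e in seq]

# Hand-picked univariate partitions (assumptions, not learned from data).
seq_cluster = {"s1": 0, "s2": 0, "s3": 1}   # clusters of sequences
time_cuts = [4.0]                           # intervals: t < 4.0 and t >= 4.0
event_cluster = {"a": 0, "b": 1, "c": 1}    # clusters of events

def cell_of(point):
    """Map a 3D point to its cell in the cross-product of the partitions."""
    sid, t, e = point
    return (seq_cluster[sid], bisect_right(time_cuts, t), event_cluster[e])

# Count the points falling in each cell of the grid.
counts = Counter(cell_of(p) for p in points)

# Normalized cell counts give an estimate of the joint distribution
# over (sequence cluster, time interval, event cluster).
n = len(points)
joint = {cell: c / n for cell, c in counts.items()}

for cell in sorted(joint):
    print(cell, counts[cell], round(joint[cell], 3))
```

In this toy grid, sequences s1 and s2 share a cluster because their events fall into the same (time interval, event cluster) cells, which mirrors the idea that sequences are grouped when their events are similarly distributed over time.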

research
11/28/2019

Analysis of Hydrological and Suspended Sediment Events from Mad River Wastershed using Multivariate Time Series Clustering

Hydrological storm events are a primary driver for transporting water qu...
research
07/02/2014

Nonparametric Hierarchical Clustering of Functional Data

In this paper, we deal with the problem of curves clustering. We propose...
research
04/28/2022

COSTI: a New Classifier for Sequences of Temporal Intervals

Classification of sequences of temporal intervals is a part of time seri...
research
12/06/2018

Time-Discounting Convolution for Event Sequences with Ambiguous Timestamps

This paper proposes a method for modeling event sequences with ambiguous...
research
04/05/2020

Event Clustering Event Series Characterization on Expected Frequency

We present an efficient clustering algorithm applicable to one-dimension...
research
03/20/2015

Country-scale Exploratory Analysis of Call Detail Records through the Lens of Data Grid Models

Call Detail Records (CDRs) are data recorded by telecommunications compa...
research
03/08/2021

Discovering Multiple Phases of Dynamics by Dissecting Multivariate Time Series

We proposed a data-driven approach to dissect multivariate time series i...
