Coresets for Time Series Clustering

10/28/2021
by Lingxiao Huang, et al.

We study the problem of constructing coresets for clustering time series data. This problem has gained importance across many fields, including biology, medicine, and economics, due to the proliferation of sensors that facilitate real-time measurement and a rapid drop in storage costs. In particular, we consider the setting where the time series data on N entities are generated from a Gaussian mixture model with autocorrelations over k clusters in ℝ^d. Our main contribution is an algorithm that constructs coresets for the maximum likelihood objective of this mixture model. The algorithm is efficient and, under a mild boundedness assumption on the covariance matrices of the underlying Gaussians, the size of the coreset is independent of both the number of entities N and the number of observations per entity, and depends only polynomially on k, d, and 1/ε, where ε is the error parameter. We empirically assess the performance of our coresets on synthetic data.
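
To make the notion of a coreset concrete, the following is a minimal sketch of the property a coreset is meant to satisfy: a small weighted subset of entities whose weighted cost stays close to the cost over all N entities for a given set of cluster parameters. It is not the algorithm from the paper. Uniform sampling is used as a stand-in for the paper's construction, the cost is a simplified sum of squared distances to the nearest center (it ignores covariances and autocorrelations), and all function names (`entity_costs`, `uniform_weighted_sample`, `relative_error`) are hypothetical.

```python
import numpy as np


def entity_costs(X, centers):
    """Simplified per-entity clustering cost (an assumption, not the paper's likelihood).

    X has shape (N, T, d): N entities, T observations each, in R^d.
    centers has shape (k, d). Each entity is charged the sum over time of
    squared distances to its single best center.
    """
    diff = X[:, :, None, :] - centers[None, None, :, :]  # (N, T, k, d)
    per_cluster = (diff ** 2).sum(axis=(1, 3))            # (N, k)
    return per_cluster.min(axis=1)                        # (N,)


def uniform_weighted_sample(N, m, rng):
    """Pick m of N entities uniformly without replacement, each with weight N / m."""
    idx = rng.choice(N, size=m, replace=False)
    weights = np.full(m, N / m)
    return idx, weights


def relative_error(X, idx, weights, centers):
    """Relative gap between the weighted subset cost and the full cost."""
    full = entity_costs(X, centers).sum()
    approx = (weights * entity_costs(X[idx], centers)).sum()
    return abs(approx - full) / full


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, T, d, k = 2000, 20, 2, 3
    true_centers = rng.normal(scale=5.0, size=(k, d))
    labels = rng.integers(k, size=N)
    # Synthetic entities: i.i.d. noise around a cluster mean (no autocorrelation here).
    X = true_centers[labels][:, None, :] + rng.normal(size=(N, T, d))

    idx, weights = uniform_weighted_sample(N, 200, rng)
    # A coreset would keep this error below epsilon for every choice of centers;
    # here it is only checked for one randomly drawn set of centers.
    query_centers = rng.normal(scale=5.0, size=(k, d))
    print(relative_error(X, idx, weights, query_centers))
```

A uniform sample carries no guarantee that holds for all parameter choices at once, which is the gap that a coreset construction with a provable size bound is meant to close.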


