Time series cluster kernels to exploit informative missingness and incomplete label information

07/10/2019
by Karl Øyvind Mikalsen, et al.

The time series cluster kernel (TCK) is a powerful tool for analysing multivariate time series subject to missing data. TCK is built with an ensemble learning approach in which Bayesian mixture models form the base models. Because of the Bayesian approach, TCK can naturally deal with missing values without resorting to imputation, and the ensemble strategy ensures robustness to hyperparameters, making it particularly well suited for unsupervised learning. However, TCK assumes that data are missing at random and that the underlying missingness mechanism is ignorable, i.e. uninformative. This assumption does not hold in many real-world applications, such as medicine. To overcome this limitation, we present a kernel capable of exploiting the potentially rich information in the missing values and patterns, as well as the information from the observed data. In our approach, we create a representation of the missing pattern and incorporate it into mixed mode mixture models so that the information provided by the missing patterns is effectively exploited. We also propose a semi-supervised kernel that takes advantage of incomplete label information to learn more accurate similarities. Experiments on benchmark data, as well as a real-world case study of patients described by longitudinal electronic health record data who potentially suffer from hospital-acquired infections, demonstrate the effectiveness of the proposed methods.
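The general idea described above (a binary representation of the missing pattern, modelled jointly with the observed values in an ensemble of mixture models) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `missingness_mask`, `soft_assignments`, and `ensemble_kernel` are hypothetical, the mixture is fitted with plain EM rather than the Bayesian treatment the abstract mentions, and every default value is an arbitrary assumption. The kernel is formed here by summing inner products of cluster posteriors across ensemble members, which is one simple way to turn an ensemble of clusterings into a similarity matrix.

```python
import numpy as np


def missingness_mask(X):
    """Return 1.0 where a value is observed and 0.0 where it is missing (NaN)."""
    return (~np.isnan(X)).astype(float)


def soft_assignments(X, R, n_clusters, rng, n_iter=20):
    """Fit a toy mixed-mode mixture with EM; return posteriors of shape (N, n_clusters).

    Observed values get a diagonal Gaussian likelihood (missing entries are
    skipped, i.e. marginalised out). The mask R gets a Bernoulli likelihood,
    so the missing pattern itself influences the cluster posteriors.
    """
    N, _ = X.shape
    X0 = np.where(R == 1, X, 0.0)  # placeholder zeros, always weighted out by R
    post = rng.dirichlet(np.ones(n_clusters), size=N)  # random initial posteriors
    for _ in range(n_iter):
        # M-step: weighted estimates that use observed entries only.
        w = post.T                                   # (G, N)
        obs_weight = w @ R + 1e-6                    # (G, D)
        mu = (w @ X0) / obs_weight
        diff = X0[None, :, :] - mu[:, None, :]       # (G, N, D)
        var = np.einsum("gn,nd,gnd->gd", w, R, diff ** 2) / obs_weight + 1e-3
        # Smoothed Bernoulli parameter: probability that an entry is observed.
        beta = (w @ R + 1.0) / (w.sum(axis=1, keepdims=True) + 2.0)
        pi = w.sum(axis=1) / N
        # E-step: combine the Gaussian (observed values) and Bernoulli (mask) terms.
        log_gauss = -0.5 * (np.log(2 * np.pi * var)[:, None, :]
                            + diff ** 2 / var[:, None, :])
        log_p = (np.log(pi + 1e-12)[None, :]
                 + (R[None, :, :] * log_gauss).sum(axis=-1).T
                 + R @ np.log(beta).T
                 + (1.0 - R) @ np.log(1.0 - beta).T)
        log_p -= log_p.max(axis=1, keepdims=True)    # stabilise the softmax
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)
    return post


def ensemble_kernel(X, n_members=10, max_clusters=4, seed=0):
    """Sum inner products of cluster posteriors over an ensemble of mixtures.

    X has shape (N, V, T) with NaN marking missing values.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    flat = X.reshape(N, -1)
    R = missingness_mask(flat)
    K = np.zeros((N, N))
    for _ in range(n_members):
        n_clusters = int(rng.integers(2, max_clusters + 1))  # vary the hyperparameter
        P = soft_assignments(flat, R, n_clusters, rng)
        K += P @ P.T
    return K
```

Calling `ensemble_kernel(X)` on an array of shape `(N, V, T)` with `NaN` at missing positions returns an `N x N` similarity matrix. Because each term `P @ P.T` is positive semidefinite, so is their sum, which makes the result usable as a kernel. Two series with similar missing patterns tend to receive higher similarity because the Bernoulli term pushes them toward the same clusters.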

research
02/27/2020

A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs

A large fraction of the electronic health records (EHRs) consists of cli...
research
04/03/2017

Time Series Cluster Kernel for Learning Similarities between Multivariate Time Series with Missing Data

Similarity-based approaches represent a promising direction for time ser...
research
10/20/2017

Learning compressed representations of blood samples time series with missing data

Clinical measurements collected over time are naturally represented as m...
research
03/05/2021

Deep Generative Pattern-Set Mixture Models for Nonignorable Missingness

We propose a variational autoencoder architecture to model both ignorabl...
research
03/21/2018

An Unsupervised Multivariate Time Series Kernel Approach for Identifying Patients with Surgical Site Infection from Blood Samples

A large fraction of the electronic health records consists of clinical m...
research
02/23/2017

Spectral Clustering using PCKID - A Probabilistic Cluster Kernel for Incomplete Data

In this paper, we propose PCKID, a novel, robust, kernel function for sp...
research
07/07/2022

Semi-unsupervised Learning for Time Series Classification

Time series are ubiquitous and therefore inherently hard to analyze and ...
