
SPARTan: Scalable PARAFAC2 for Large & Sparse Data

by Ioakeim Perros, et al.

In exploratory tensor mining, a common problem is how to analyze a set of variables across a set of subjects whose observations do not align naturally. For example, when modeling medical features across a set of patients, the number and duration of treatments may vary widely, so there is no meaningful way to align the patients' clinical records across time points for analysis. The state-of-the-art tensor model for such data is PARAFAC2, which yields interpretable and robust output and can naturally handle sparse data. Its main limitation so far has been the lack of efficient algorithms that can handle large-scale datasets. In this work, we fill this gap by developing SPARTan, a scalable method to compute the PARAFAC2 decomposition of large and sparse datasets. Our method exploits special structure within PARAFAC2, leading to a novel algorithmic reformulation that is both faster (in absolute time) and more memory-efficient than prior work. We evaluate SPARTan on both synthetic and real datasets, showing 22× performance gains over the best previous implementation and handling larger problem instances for which the baseline fails. Furthermore, we apply SPARTan to the mining of temporally-evolving phenotypes on data taken from real and medically complex pediatric patients. The clinical meaningfulness of the phenotypes identified in this process, as well as their temporal evolution for several patients, has been endorsed by clinical experts.
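To make the "observations do not align" setting concrete, here is a minimal NumPy sketch of the standard PARAFAC2 model form, in which each subject k has its own matrix X_k with a different number of rows (for example, time points) but the same J columns (variables), and X_k ≈ Q_k H S_k Vᵀ with Q_k having orthonormal columns. It is not SPARTan's algorithm and is not taken from the paper's repository. It only builds small dense synthetic slices and shows the classical orthogonal Procrustes update for Q_k that textbook PARAFAC2-ALS uses. All names (`procrustes_Q`, `row_counts`, the factor sizes) are hypothetical and chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

J, R = 6, 3                # J shared variables, target rank R (illustrative sizes)
row_counts = [5, 8, 4]     # I_k: number of observations per subject, deliberately unequal

# Hypothetical ground-truth factors, kept well-conditioned for a stable demo.
H = np.eye(R) + 0.1 * rng.standard_normal((R, R))      # shared R x R matrix
V = np.linalg.qr(rng.standard_normal((J, R)))[0]       # shared variable factor (J x R)
S = [np.diag(rng.uniform(0.5, 2.0, R)) for _ in row_counts]   # per-subject weights
Q_true = [np.linalg.qr(rng.standard_normal((n, R)))[0] for n in row_counts]

# Irregular "tensor": a list of slices with different row counts, same J columns.
X = [Q @ H @ Sk @ V.T for Q, Sk in zip(Q_true, S)]


def procrustes_Q(Xk, H, Sk, V):
    """Return the Q_k with orthonormal columns minimizing ||Xk - Q_k H Sk V^T||_F.

    Classical orthogonal Procrustes solution: take the thin SVD of
    Xk V Sk H^T = P Sigma Z^T and set Q_k = P Z^T.
    """
    P, _, Zt = np.linalg.svd(Xk @ V @ Sk @ H.T, full_matrices=False)
    return P @ Zt


Q_est = [procrustes_Q(Xk, H, Sk, V) for Xk, Sk in zip(X, S)]

# Relative reconstruction error per subject (data is noiseless, so it should be tiny).
rel_err = [
    np.linalg.norm(Xk - Q @ H @ Sk @ V.T) / np.linalg.norm(Xk)
    for Xk, Q, Sk in zip(X, Q_est, S)
]

for k, (Xk, err) in enumerate(zip(X, rel_err)):
    print(f"subject {k}: slice shape {Xk.shape}, relative error {err:.2e}")
```

Because every slice keeps its own row dimension, no alignment across subjects is needed; only the variable mode (the J columns) is shared. A practical implementation for large sparse data would store each X_k in a sparse format and alternate this step with updates of H, S_k and V, which is the part the paper's reformulation targets.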



COPA: Constrained PARAFAC2 for Sparse & Large Datasets

PARAFAC2 has demonstrated success in modeling irregular tensors, where t...

SNeCT: Scalable network constrained Tucker decomposition for integrative multi-platform data analysis

Motivation: How do we integratively analyze large-scale multi-platform g...

tSPM+; a high-performance algorithm for mining transitive sequential patterns from clinical data

The increasing availability of large clinical datasets collected from pa...

SamBaTen: Sampling-based Batch Incremental Tensor Decomposition

Tensor decompositions are invaluable tools in analyzing multimodal datas...

Clustering Patients with Tensor Decomposition

In this paper we present a method for the unsupervised clustering of hig...

Fast and Scalable Estimator for Sparse and Unit-Rank Higher-Order Regression Models

Because tensor data appear more and more frequently in various scientifi...

A Graph-based Imputation Method for Sparse Medical Records

Electronic Medical Records (EHR) are extremely sparse. Only a small prop...

Code Repositories


This repository provides the code accompanying the KDD 2017 paper "SPARTan: Scalable PARAFAC2 for Large & Sparse Data" by Ioakeim Perros, Evangelos E. Papalexakis, Fei Wang, Richard Vuduc, Elizabeth Searles, Michael Thompson and Jimeng Sun.
