Discriminating sample groups with multi-way data

06/26/2016
by Tianmeng Lyu, et al.

High-dimensional linear classifiers, such as the support vector machine (SVM) and distance weighted discrimination (DWD), are commonly used in biomedical research to distinguish groups of subjects based on a large number of features. However, their use is limited to applications where a single vector of features is measured for each subject. In practice, data are often multi-way, that is, measured over multiple dimensions. For example, metabolite abundance may be measured over multiple regions or tissues, or gene expression may be measured over multiple time points, for the same subjects. We propose a framework for linear classification of high-dimensional multi-way data, in which coefficients can be factorized into weights that are specific to each dimension. More generally, the coefficients for each measurement in a multi-way dataset are assumed to have low-rank structure. This framework extends existing classification techniques, and we have implemented multi-way versions of SVM and DWD. We describe informative simulation results and apply multi-way DWD to data from two very different clinical research studies. The first study uses metabolite magnetic resonance spectroscopy data over multiple brain regions to compare patients with and without spinocerebellar ataxia; the second uses publicly available gene expression time-course data to compare treatment responses for patients with multiple sclerosis. Our method improves performance and simplifies interpretation relative to naive applications of full-rank linear classification to multi-way data. An R package is available at https://github.com/lockEF/MultiwayClassification.
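To make the factorization idea concrete, here is a minimal sketch in plain Python. It assumes the simplest case: each subject is a p x q matrix X (for example, features by time points) and the coefficient matrix has rank 1, B = w v^T, so the linear score <X, B> equals w^T X v. The function names and the random data are hypothetical and chosen only for illustration. This is not the API of the MultiwayClassification R package, and the sketch does not fit an SVM or DWD model; it only shows the coefficient structure and the reduction in parameter count.

```python
import random

def outer(w, v):
    """Rank-1 coefficient matrix B = w v^T as a list of rows."""
    return [[wi * vj for vj in v] for wi in w]

def score_full(X, B, b0=0.0):
    """Linear score b0 + sum_ij X[i][j] * B[i][j], treating B as unstructured."""
    return b0 + sum(
        X[i][j] * B[i][j] for i in range(len(B)) for j in range(len(B[0]))
    )

def score_rank1(X, w, v, b0=0.0):
    """Same score computed from the factors: b0 + w^T X v."""
    return b0 + sum(
        w[i] * sum(X[i][j] * v[j] for j in range(len(v))) for i in range(len(w))
    )

def classify(score):
    """Assign a class label from the sign of the linear score."""
    return 1 if score >= 0 else -1

# Hypothetical sizes: p features measured at q time points.
rng = random.Random(0)
p, q = 5, 3
w = [rng.gauss(0, 1) for _ in range(p)]  # weights specific to the feature dimension
v = [rng.gauss(0, 1) for _ in range(q)]  # weights specific to the time dimension
X = [[rng.gauss(0, 1) for _ in range(q)] for _ in range(p)]  # one simulated subject

B = outer(w, v)
full = score_full(X, B)
fact = score_rank1(X, w, v)

# An unstructured classifier has p*q coefficients; the rank-1 form has p+q.
n_params_full = p * q
n_params_rank1 = p + q
```

The two scores agree up to floating-point error, so the factorized form describes the same kind of linear classifier with far fewer free parameters. A rank-R version under the same assumptions would sum R such outer products.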

research
10/11/2021

Multiway sparse distance weighted discrimination

Modern data often take the form of a multiway array. However, most class...
research
08/25/2022

PRIME: Uncovering Circadian Oscillation Patterns and Associations with AD in Untimed Genome-wide Gene Expression across Multiple Brain Regions

The disruption of circadian rhythm is a cardinal symptom for Alzheimer's...
research
08/05/2022

Bayesian predictive modeling of multi-source multi-way data

We develop a Bayesian approach to predict a continuous or binary outcome...
research
07/12/2012

Biogeography-Based Informative Gene Selection and Cancer Classification Using SVM and Random Forests

Microarray cancer gene expression data comprise of very high dimensions....
research
01/23/2020

A covariance-enhanced approach to multi-tissue joint eQTL mapping with application to transcriptome-wide association studies

Transcriptome-wide association studies based on genetically predicted ge...
research
05/16/2020

FiberStars: Visual Comparison of Diffusion Tractography Data between Multiple Subjects

Tractography from high-dimensional diffusion magnetic resonance imaging ...
research
09/05/2017

Linear Optimal Low Rank Projection for High-Dimensional Multi-Class Data

Classification of individual samples into one or more categories is crit...
