Integrative Generalized Convex Clustering Optimization and Feature Selection for Mixed Multi-View Data

12/11/2019
by   Minjie Wang, et al.

In mixed multi-view data, multiple sets of diverse features are measured on the same set of samples. By integrating all available data sources, we seek to discover common group structure among the samples that may be hidden when each data view is clustered separately. While several techniques for such integrative clustering have been explored, we propose and develop a convex formalization that inherits the strong statistical, mathematical, and empirical properties of increasingly popular convex clustering methods. Specifically, our Integrative Generalized Convex Clustering Optimization (iGecco) method employs a different convex distance, loss, or divergence for each data view, together with a joint convex fusion penalty that leads to common groups. Integrating mixed multi-view data is also challenging when each data source is high-dimensional. To perform feature selection in such scenarios, we develop an adaptive shifted group-lasso penalty that selects features by shrinking them towards their loss-specific centers. The resulting approach, iGecco+, selects the features from each data view that are best for determining the groups, often leading to improved integrative clustering. To fit our model, we develop a new type of generalized multi-block ADMM algorithm that uses sub-problem approximations to fit the model more efficiently on big data sets. Through a series of numerical experiments and real data examples on text mining and genomics, we show that iGecco+ achieves superior empirical performance for high-dimensional mixed multi-view data.
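
The abstract describes the objective only in words: a view-specific convex loss for each data view plus a joint fusion penalty that pulls the samples' centroids together across all views. The Python sketch below evaluates one objective of that general shape and may help make the idea concrete. It is an illustration under stated assumptions, not the authors' formulation: the function names, the choice of a squared-error loss and an absolute-error loss, the uniform pairwise weights, and the exact form of the penalty are all hypothetical, and neither the feature-selection penalty nor the ADMM solver is included.

```python
import numpy as np

def squared_error(X, U):
    # Assumed loss for a continuous view (hypothetical choice).
    return 0.5 * np.sum((X - U) ** 2)

def absolute_error(X, U):
    # Assumed robust loss for a second view (hypothetical choice).
    return np.sum(np.abs(X - U))

def joint_fusion_penalty(centroids, weights):
    # Stack the centroid matrices of all views side by side, so each row
    # holds one sample's centroid across every view.
    U = np.hstack(centroids)
    n = U.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # One Euclidean norm per sample pair, taken jointly over all
            # views; it is zero only when the rows agree in every view.
            total += weights[i, j] * np.linalg.norm(U[i] - U[j])
    return total

def integrative_objective(views, centroids, losses, weights, gamma):
    # Sum of view-specific losses plus gamma times the joint penalty.
    fit = sum(loss(X, U) for X, U, loss in zip(views, centroids, losses))
    return fit + gamma * joint_fusion_penalty(centroids, weights)

# Toy data: 3 samples, two views with 2 features and 1 feature.
X1 = np.array([[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]])
X2 = np.array([[1.0], [1.0], [1.0]])
W = np.ones((3, 3))  # uniform pairwise weights (assumption)
losses = [squared_error, absolute_error]

# Centroids equal to the data: perfect fit, but the penalty is nonzero
# because sample 3 differs from samples 1 and 2.
obj_unfused = integrative_objective([X1, X2], [X1, X2], losses, W, gamma=0.5)

# Fully fused centroids: every sample shares the per-view column means,
# so the penalty vanishes and only the loss terms remain.
fused = [np.tile(X.mean(axis=0), (X.shape[0], 1)) for X in (X1, X2)]
penalty_fused = joint_fusion_penalty(fused, W)
```

In this sketch, increasing `gamma` makes the fused solution more attractive relative to the unfused one, which is how a fusion penalty of this kind trades data fit against agreement on common groups.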


