Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction

10/12/2021
by Marc-Andre Schulz, et al.

The accumulation of high-quality data is becoming ubiquitous in the health domain. There is increasing opportunity to exploit rich data from normal subjects to improve supervised estimators in specific diseases where data are notoriously scarce. We demonstrate that low-dimensional embedding spaces can be derived from the UK Biobank population dataset and used to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics. Phenotype predictions facilitated by Variational Autoencoder manifolds typically scaled better with increasing unlabeled data than dimensionality reduction by PCA or Isomap. Performance gains from semi-supervised approaches will probably become an important ingredient for various medical data science applications.
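The two-stage idea in the abstract (learn an embedding from plentiful unlabeled data, then fit a predictor on a few labeled samples in that embedding) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it uses PCA, one of the baselines the abstract names, rather than a Variational Autoencoder, and it runs on synthetic data rather than UK Biobank. The sample sizes, noise levels, the linear generative model, and all function names (`sample`, `fit_pca`, `embed`, `fit_linear`, `predict_linear`, `r2_score`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for population data (assumption, not UK Biobank):
# a few latent factors drive both the observed features and the phenotype.
n_latent, n_features = 3, 50
mixing = rng.normal(size=(n_latent, n_features))
beta = np.array([1.0, -2.0, 0.5])


def sample(n):
    """Draw n subjects: observed features x and phenotype y."""
    z = rng.normal(size=(n, n_latent))
    x = z @ mixing + 0.5 * rng.normal(size=(n, n_features))
    y = z @ beta + 0.1 * rng.normal(size=n)
    return x, y


x_unlabeled, _ = sample(5000)   # large pool, labels discarded
x_train, y_train = sample(30)   # scarce labeled data
x_test, y_test = sample(1000)   # held-out evaluation set


def fit_pca(x, n_components):
    """Return the feature mean and the top principal axes (as columns)."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components].T


def embed(x, mean, components):
    """Project features onto the learned low-dimensional space."""
    return (x - mean) @ components


def fit_linear(e, y):
    """Ordinary least squares with an intercept column."""
    design = np.column_stack([e, np.ones(len(e))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(e, coef):
    return np.column_stack([e, np.ones(len(e))]) @ coef


def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Stage 1: unsupervised embedding learned from the unlabeled pool only.
mean, components = fit_pca(x_unlabeled, n_latent)

# Stage 2: supervised predictor fitted on the few labeled samples.
coef = fit_linear(embed(x_train, mean, components), y_train)
r2 = r2_score(y_test, predict_linear(embed(x_test, mean, components), coef))
print(f"test R^2 with PCA embedding: {r2:.3f}")
```

Swapping `fit_pca` and `embed` for a nonlinear encoder would give the VAE variant the abstract describes; the second stage would stay unchanged.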


research
02/27/2020

Supervised Dimensionality Reduction and Visualization using Centroid-encoder

Visualizing high-dimensional data is an essential task in Data Science a...
research
03/08/2021

Empirical comparison between autoencoders and traditional dimensionality reduction methods

In order to process efficiently ever-higher dimensional data such as ima...
research
12/18/2018

Deep Variational Sufficient Dimensionality Reduction

We consider the problem of sufficient dimensionality reduction (SDR), wh...
research
07/15/2022

Subgroup Discovery in Unstructured Data

Subgroup discovery is a descriptive and exploratory data mining techniqu...
research
09/27/2022

Linear Dimensionality Reduction

These notes are an overview of some classical linear methods in Multivar...
research
08/01/2017

DROP: Dimensionality Reduction Optimization for Time Series

Dimensionality reduction is critical in analyzing increasingly high-volu...
