Graph-Conditioned MLP for High-Dimensional Tabular Biomedical Data

11/11/2022
by Andrei Margeloiu, et al.

Genome-wide studies leveraging recent high-throughput sequencing technologies collect high-dimensional data. However, they usually include small cohorts of patients, and the resulting tabular datasets suffer from the "curse of dimensionality". Training neural networks on such datasets is typically unstable, and the models overfit. One problem is that modern weight initialisation strategies make simplistic assumptions that are unsuitable for small datasets. We propose Graph-Conditioned MLP (GC-MLP), a novel method to introduce priors on the parameters of an MLP. Instead of randomly initialising the first layer, we condition it directly on the training data. More specifically, we create a graph for each feature in the dataset (e.g., a gene), where each node represents a sample from the same dataset (e.g., a patient). We then use Graph Neural Networks (GNNs) to learn embeddings from these graphs and use the embeddings to initialise the MLP's parameters. Our approach opens the prospect of introducing additional biological knowledge when constructing the graphs. We present early results on seven classification tasks from gene expression data and show that GC-MLP outperforms an MLP.
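
The pipeline described in the abstract (one graph per feature, samples as nodes, a graph embedding per feature, embeddings used as first-layer weights) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify how edges are built, what the node features are, or which GNN is used, so everything below is an assumption made for illustration: edges come from a k-nearest-neighbour rule on the feature's values, node features are the raw values, and a single mean-aggregation step with fixed random weights stands in for a GNN that would be learned in practice. The names `knn_edges`, `graph_embedding`, and `init_first_layer` are hypothetical.

```python
import math
import random


def knn_edges(values, k):
    """Neighbour lists for one feature's graph (assumed rule, not from the paper).

    Each node is a sample; it is linked to the k samples whose value of this
    feature is closest to its own.
    """
    n = len(values)
    neighbours = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: abs(values[i] - values[j]),
        )
        neighbours.append(others[:k])
    return neighbours


def graph_embedding(values, neighbours, proj):
    """One mean-aggregation message-passing step followed by mean pooling.

    proj holds one (w_self, w_neigh, bias) triple per embedding dimension.
    Here the triples are random and untrained; a real GNN would learn them.
    """
    n = len(values)
    emb = [0.0] * len(proj)
    for i in range(n):
        neigh_mean = sum(values[j] for j in neighbours[i]) / len(neighbours[i])
        for d, (w_self, w_neigh, bias) in enumerate(proj):
            emb[d] += math.tanh(w_self * values[i] + w_neigh * neigh_mean + bias) / n
    return emb


def init_first_layer(X, hidden_dim, k=2, seed=0):
    """Build a first-layer weight matrix W with shape (hidden_dim, n_features).

    Column f of W is the embedding of the graph built for feature f.
    Requires at least k + 1 samples and k >= 1.
    """
    rng = random.Random(seed)
    proj = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(hidden_dim)]
    n_features = len(X[0])
    columns = []
    for f in range(n_features):
        values = [row[f] for row in X]  # this feature across all samples
        columns.append(graph_embedding(values, knn_edges(values, k), proj))
    return [[columns[f][h] for f in range(n_features)] for h in range(hidden_dim)]


# Toy data: 4 samples (e.g., patients) x 3 features (e.g., genes).
X = [
    [0.1, 2.0, -1.0],
    [0.4, 1.5, 0.0],
    [0.3, 0.2, 1.0],
    [0.9, 0.1, -0.5],
]
W = init_first_layer(X, hidden_dim=5)
```

In this sketch the embedding dimension equals the hidden width of the first layer, so each feature contributes exactly one column of weights. Because the weights are a function of the training data rather than of a random draw per entry, two features with identical values across samples receive identical weight columns.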
