Redundancy-aware unsupervised rankings for collections of gene sets

07/30/2023
by Chiara Balestra, et al.

The biological roles of gene sets are used to group them into collections. These collections are often high-dimensional, overlapping, and redundant families of sets, which precludes a straightforward interpretation and study of their content. Bioinformatics has looked for solutions to reduce their dimension or increase their interpretability. One possibility lies in aggregating overlapping gene sets to create larger pathways, but the modified pathways are hardly biologically justifiable. We propose to use importance scores to rank the pathways in the collections, studying the context from a set covering perspective. The proposed Shapley value-based scores consider the distribution of the singletons and the size of the sets in the families; furthermore, a trick allows us to circumvent the usual exponential complexity of computing Shapley values. Finally, we address the challenge of including redundancy awareness in the obtained rankings where, in our case, sets are redundant if they show prominent intersections. The rankings can be used to reduce the dimension of collections of gene sets, such that they show lower redundancy while still covering a high share of the genes. We further investigate the impact of our selection on Gene Set Enrichment Analysis. The proposed method shows practical utility in bioinformatics for increasing the interpretability of collections of gene sets and is a step forward toward including redundancy in Shapley value computations.
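The abstract does not state the value function or the trick used to avoid exponential cost, so the following is only a minimal sketch under a stated assumption: the worth of a group of gene sets is taken to be the number of distinct genes it covers. For that particular coverage game, the Shapley value of a set has a well-known closed form (each gene's unit of worth is split equally among the sets containing it), which can be computed in polynomial time. The function names and the toy family below are hypothetical, and the sketch does not include the redundancy-aware step.

```python
from fractions import Fraction
from itertools import permutations


def coverage_shapley(family):
    """Closed-form Shapley values for the ASSUMED coverage game
    v(S) = number of distinct genes covered by the sets in S.
    Each gene contributes 1 / (number of sets containing it) to every set it is in."""
    counts = {}
    for genes in family.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    return {
        name: sum(Fraction(1, counts[g]) for g in genes)
        for name, genes in family.items()
    }


def brute_force_shapley(family):
    """Reference computation: average marginal coverage over all orderings.
    Factorial cost, so only usable on tiny families; kept as a sanity check."""
    names = list(family)
    totals = {n: Fraction(0) for n in names}
    orders = list(permutations(names))
    for order in orders:
        covered = set()
        for n in order:
            totals[n] += len(family[n] - covered)  # genes newly covered by n
            covered |= family[n]
    return {n: t / len(orders) for n, t in totals.items()}


# Hypothetical toy collection: pathway_B is fully contained in pathway_A.
family = {
    "pathway_A": {"g1", "g2", "g3"},
    "pathway_B": {"g2", "g3"},
    "pathway_C": {"g4"},
}

scores = coverage_shapley(family)
assert scores == brute_force_shapley(family)

# Rank gene sets by decreasing score.
ranking = sorted(family, key=scores.get, reverse=True)
print(ranking[0], scores)
```

In this toy family, genes shared between overlapping sets are worth less to each set than genes covered by a single set, which is the kind of effect a set covering perspective is meant to capture. How the authors penalize redundancy when building the final ranking is not described in the abstract and is not modeled here.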


