Redundancy-aware unsupervised ranking based on game theory – application to gene enrichment analysis

by Chiara Balestra, et al.

Gene set collections are a common ground for studying the enrichment of genes for specific phenotypic traits. Gene set enrichment analysis aims to identify genes that are over-represented in gene set collections and might be associated with a specific phenotypic trait. However, because this involves a massive number of hypothesis tests, it is often questionable whether a pre-processing step that reduces the size of gene set collections is helpful. Moreover, the often highly overlapping gene sets, and the resulting low interpretability of gene set collections, call for a reduction of the included gene sets. Inspired by this bioinformatics context, we propose a method to rank sets within a family of sets based on the distribution of the singletons and their size. We obtain importance scores for the sets by computing Shapley values without incurring the usual exponential number of evaluations of the value function. Moreover, we address the challenge of making the resulting rankings redundancy-aware, where, in our case, sets are redundant if they show prominent intersections. We finally evaluate our approach on gene set collections; the rankings obtained show low redundancy and high coverage of the genes. The unsupervised nature of the proposed ranking does not allow for an evident increase in the number of significant gene sets for specific phenotypic traits when reducing the size of the collections. However, we believe that the proposed rankings are of use in bioinformatics to increase the interpretability of gene set collections and are a step forward toward including redundancy in Shapley value computations.
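The abstract does not spell out the value function or the redundancy criterion, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a simple coverage game, v(S) = number of distinct genes covered by the sets in S, for which the Shapley value of a set has a known closed form: each gene contributes 1/c to every set containing it, where c is the number of sets that contain the gene. This avoids the exponential enumeration of coalitions. The Jaccard-based penalty in `redundancy_aware_ranking` is likewise a hypothetical stand-in for "prominent intersections", and all function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def shapley_coverage(family):
    """Closed-form Shapley values for the ASSUMED coverage game
    v(S) = |union of the sets in S|. Each element is shared equally
    among the sets that contain it, so no coalition enumeration is needed."""
    counts = Counter(g for members in family.values() for g in set(members))
    return {
        name: sum(Fraction(1, counts[g]) for g in set(members))
        for name, members in family.items()
    }


def shapley_brute_force(family):
    """Reference computation: average marginal coverage over all orderings.
    Factorial cost; only usable as a sanity check on tiny families."""
    names = list(family)
    totals = {name: Fraction(0) for name in names}
    n_orders = 0
    for order in permutations(names):
        covered = set()
        for name in order:
            new = set(family[name]) - covered
            totals[name] += len(new)
            covered |= new
        n_orders += 1
    return {name: total / n_orders for name, total in totals.items()}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def redundancy_aware_ranking(family):
    """HYPOTHETICAL greedy re-ranking: at each step pick the set whose
    Shapley score, discounted by its largest Jaccard overlap with the
    sets already ranked, is highest. Ties are broken by name."""
    scores = shapley_coverage(family)
    ranked, remaining = [], set(family)
    while remaining:
        def penalized(name):
            overlap = max(
                (jaccard(family[name], family[r]) for r in ranked), default=0.0
            )
            return float(scores[name]) * (1.0 - overlap)

        best = max(sorted(remaining), key=penalized)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy family of gene sets (made-up identifiers, for illustration only).
toy = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g3", "g5"},
    "C": {"g6", "g7"},
}

if __name__ == "__main__":
    print(shapley_coverage(toy))
    print(redundancy_aware_ranking(toy))
```

In this toy family, sets A and B overlap heavily and both receive a higher raw score than C, yet the penalized ranking places C before B because B adds little beyond what A already covers. The brute-force function is included only to check the closed form on small inputs.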




