Topological Information Data Analysis

07/06/2019
by Pierre Baudot, et al.

This paper presents methods that quantify the structure of statistical interactions within a given data set; these methods were first used in Tapia2018. It establishes new results on the k-multivariate mutual informations (I_k), inspired by the topological formulation of information introduced in earlier work. In particular, we show that the vanishing of all I_k for 2 ≤ k ≤ n of n random variables is equivalent to their statistical independence. Pursuing the work of Hu Kuo Ting and Te Sun Han, we show that information functions provide coordinates for binary variables, and that they are analytically independent on the probability simplex for any set of finite variables. The maximal positive I_k identifies the variables that co-vary the most in the population, whereas the minimal negative I_k identifies synergistic clusters and the variables that most strongly differentiate and segregate the population. Finite data-size effects and estimation biases severely constrain the effective computation of the information topology on data, and we provide simple statistical tests for the undersampling bias and for the k-dependences. We give an example of application of these methods to gene expression and unsupervised cell-type classification. The methods unravel biologically relevant subtypes using a sample of 41 genes, with few errors. This work establishes generic basic methods to quantify epigenetic information storage and a unified formalism for epigenetic unsupervised learning. We propose that higher-order statistical interactions and non-identically distributed variables are constitutive characteristics of biological systems that should be estimated in order to unravel their significant statistical structure and diversity. The topological information data analysis presented here allows precise estimation of this higher-order structure characteristic of biological systems.
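As a minimal illustration of the I_k quantities discussed in the abstract (a sketch, not the authors' implementation or estimator), the code below computes I_k for discrete samples as the alternating sum of joint entropies in the Hu Kuo Ting form, I_k = Σ_{i=1..k} (-1)^(i-1) Σ_{|S|=i} H(X_S). The function names `joint_entropy` and `multivariate_mutual_information` are hypothetical, and the naive plug-in (empirical-frequency) entropy estimator is an assumption. That estimator is exactly the kind that suffers from the undersampling bias the abstract mentions, and the sketch does not implement the paper's statistical tests.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(samples, subset):
    """Plug-in Shannon entropy (bits) of the variables indexed by `subset`.

    Assumption: `samples` is a list of tuples, one tuple of discrete values
    per observation; probabilities are estimated by empirical frequencies.
    """
    counts = Counter(tuple(row[i] for i in subset) for row in samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def multivariate_mutual_information(samples, variables):
    """I_k of the given variable indices as an alternating sum of joint entropies.

    With this sign convention I_2 is the usual mutual information, and
    I_k < 0 signals a synergistic interaction.
    """
    total = 0.0
    for size in range(1, len(variables) + 1):
        sign = (-1) ** (size - 1)
        for subset in combinations(variables, size):
            total += sign * joint_entropy(samples, subset)
    return total


if __name__ == "__main__":
    # XOR triple: X3 = X1 xor X2, all four (X1, X2) pairs equally likely.
    xor = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
    print(multivariate_mutual_information(xor, (0, 1)))     # 0.0
    print(multivariate_mutual_information(xor, (0, 1, 2)))  # -1.0

    # Three independent fair bits: every combination equally likely.
    indep = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    print(multivariate_mutual_information(indep, (0, 1, 2)))  # 0.0
```

The XOR triple shows why higher orders matter: every pair is independent (I_2 = 0), yet I_3 = -1 bit, a negative, synergistic interaction. For three independent fair bits, every I_k with k ≥ 2 vanishes, consistent with the independence result stated above.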


research
12/04/2018

Topological Data Analysis of Single-cell Hi-C Contact Maps

In this article, we show how the recent statistical techniques developed...
research
06/14/2023

System Information Decomposition

In order to characterize complex higher-order interactions among variabl...
research
08/12/2022

Communication network model of the immune system identifies the impact of interactions with SARS-CoV-2 proteins

Interactions between SARS-CoV-2 and human proteins (SARS-CoV-2 PPIs) cau...
research
09/29/2022

Dimensions of Higher Order Factor Analysis Models

The factor analysis model is a statistical model where a certain number ...
research
01/24/2023

Neuronal architecture extracts statistical temporal patterns

Neuronal systems need to process temporal signals. We here show how high...
research
06/10/2020

Higher-order interactions in statistical physics and machine learning: A non-parametric solution to the inverse problem

We propose a model-independent definition of n-point interaction within ...
research
09/17/2021

Cross-Leverage Scores for Selecting Subsets of Explanatory Variables

In a standard regression problem, we have a set of explanatory variables...