HNet: Graphical Hypergeometric Networks

05/10/2020
by Erdogan Taskesen, et al.

Motivation: Real-world data often contain measurements with both continuous and discrete values. Despite the availability of many libraries, data sets with mixed data types require intensive pre-processing, and describing the relationships between variables remains a challenge. Data understanding is an important step in the data mining process; however, without any assumptions about the data, the search space is super-exponential in the number of variables.

Methods: We propose graphical hypergeometric networks (HNet), a method that tests associations across variables for significance using statistical inference. The aim is to build a network from only the significant associations in order to shed light on the complex relationships across variables. HNet processes raw unstructured data sets and outputs a network of (partially) directed or undirected edges between the nodes (i.e., variables). To evaluate the accuracy of HNet, we used well-known data sets and additionally generated data sets with known ground truth. The performance of HNet is compared to Bayesian structure learning.

Results: We demonstrate that HNet showed high accuracy and performance in the detection of node links. On the Alarm data set, HNet achieved an average Matthews correlation coefficient (MCC) score of 0.33 ± 0.0002 (P < 1×10^-6), whereas Bayesian structure learning resulted in an average MCC score of 0.52 ± 0.006 (P < 1×10^-11), and randomly assigning edges resulted in an MCC score of 0.004 ± 0.0003 (P = 0.49).

Conclusions: HNet can process raw unstructured data sets, allows analysis of mixed data types, scales easily with the number of variables, and allows detailed examination of the detected associations.

Availability: https://erdogant.github.io/hnet/
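The association test described under Methods can be illustrated with a minimal sketch. This is not the HNet implementation and does not use the hnet library's API; it only shows the general idea of testing pairs of binary (one-hot) variables with a one-sided hypergeometric test and keeping the significant pairs as edges. The toy data, the function names, the 0.05 threshold, and the use of a Bonferroni correction are all assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws)."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def association_pvalue(x, y):
    """One-sided test: do the 1s in x and y co-occur more often than chance?"""
    M = len(x)                                   # number of samples
    n = sum(x)                                   # samples where x is 1
    N = sum(y)                                   # samples where y is 1
    k = sum(1 for a, b in zip(x, y) if a and b)  # samples where both are 1
    return hypergeom_sf(k, M, n, N)


# Hypothetical toy data: three binary variables over 12 samples.
data = {
    "smoker":      [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "cough":       [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "left_handed": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}

pairs = list(combinations(data, 2))
alpha = 0.05 / len(pairs)  # Bonferroni correction (an assumption of this sketch)

# Keep only the pairs whose association is significant after correction.
edges = []
for a, b in pairs:
    p = association_pvalue(data[a], data[b])
    if p < alpha:
        edges.append((a, b, p))

for a, b, p in edges:
    print(f"{a} -- {b} (p = {p:.5f})")
```

The resulting `edges` list is an undirected edge list; a real analysis would first discretize continuous variables and one-hot encode categorical ones before running pairwise tests of this kind.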
