Unraveling the graph structure of tabular datasets through Bayesian and spectral analysis

In the big-data age, tabular datasets are generated and analyzed everywhere. As a consequence, finding and understanding the relationships between the features of these datasets is of great relevance. Here, to capture these relationships, we propose a methodology that maps an entire tabular dataset, or just a single observation, into a weighted directed graph using the Shapley additive explanations (SHAP) technique. With this graph of relationships, we show that inferring the hierarchical modular structure with the nested stochastic block model (nSBM), together with studying the spectral space of the magnetic Laplacian, can help us identify the classes of features and unravel non-trivial relationships. As a case study, we analyzed a socioeconomic survey conducted with students in Brazil: the PeNSE survey. The spectral embedding of the columns suggested that questions related to physical activities form a separate group. The nSBM approach corroborated this finding and allowed complementary findings about the modular structure: some groups of questions showed high agreement with the divisions qualitatively defined by the designers of the survey. However, some of the questions from the class Safety were grouped by our method into the class Drugs. Surprisingly, by inspecting these questions, we observed that they were related to both topics, suggesting an alternative interpretation of these questions. Our method can provide guidance for tabular data analysis as well as for the design of future surveys.
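To make the spectral step more concrete, the following is a minimal sketch of one common definition of the magnetic Laplacian for a weighted directed graph, L = D_s - W_s ⊙ exp(iΘ), where W_s is the symmetrized weight matrix, D_s its degree matrix, and Θ = 2πq(W - Wᵀ). It is an illustration under assumptions, not the authors' implementation: the function name `magnetic_laplacian`, the charge parameter value `q=0.25`, and the toy weight matrix are all hypothetical, and the weights here are made up rather than derived from SHAP values.

```python
import numpy as np


def magnetic_laplacian(W, q=0.25):
    """Return a Hermitian magnetic Laplacian for a weighted directed graph.

    W[u, v] is the weight of the edge u -> v. The charge parameter q and
    this particular definition are assumptions made for illustration.
    """
    W = np.asarray(W, dtype=float)
    W_sym = (W + W.T) / 2.0               # symmetrized edge weights
    theta = 2.0 * np.pi * q * (W - W.T)   # antisymmetric phase encodes direction
    H = W_sym * np.exp(1j * theta)        # Hermitian "magnetic" adjacency
    D = np.diag(W_sym.sum(axis=1))        # degree matrix of the symmetrized graph
    return D - H


# Toy directed 4-cycle between four hypothetical features (made-up weights).
W = np.array(
    [
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [1.0, 0.0, 0.0, 0.0],
    ]
)

L = magnetic_laplacian(W, q=0.25)

# L is Hermitian, so eigh returns real eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(L)

# Embed each node using the eigenvector of the smallest eigenvalue:
# the real and imaginary parts give a 2-D coordinate per feature.
v = eigvecs[:, 0]
embedding = np.column_stack([v.real, v.imag])
```

Because the phase of each entry carries the edge direction, nodes that play similar roles in the directed flow tend to land close together in this complex-valued embedding, which is the kind of structure a spectral analysis of feature graphs looks for.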




MSGNN: A Spectral Graph Neural Network Based on a Novel Magnetic Signed Laplacian

Signed and directed networks are ubiquitous in real-world applications. ...

Localized Spectral Graph Filter Frames: A Unifying Framework, Survey of Design Considerations, and Numerical Comparison

Representing data residing on a graph as a linear combination of buildin...

Hierarchical community structure in networks

Modular and hierarchical structures are pervasive in real-world complex ...

Big Data meets Causal Survey Research: Understanding Nonresponse in the Recruitment of a Mixed-mode Online Panel

Survey scientists increasingly face the problem of high-dimensionality i...

A comparison of cluster algorithms as applied to unsupervised surveys

When considering answering important questions with data, unsupervised d...

Graph connection Laplacian and random matrices with random blocks

Graph connection Laplacian (GCL) is a modern data analysis technique tha...

Using Bayesian Network Analysis to Reveal Complex Natures of Relationships

Relationships are vital for mankind in many aspects. According to Maslow...
