Finding Statistically Significant Attribute Interactions

12/22/2016
by Andreas Henelius, et al.

In many data exploration tasks it is meaningful to identify groups of attribute interactions that are specific to a variable of interest. For instance, in a data set where the attributes are medical markers and the variable of interest (the class variable) is a binary indicator of the presence or absence of disease, we would like to know which medical markers interact with respect to the class label. Knowledge of these interactions is useful in several practical applications, for example in gaining insight into the structure of the data, in feature selection, and in data anonymisation. We present a novel method, based on statistical significance testing, for testing whether a data set is consistent with having been generated by a given factorised class-conditional joint distribution, where the distribution is parametrised by a partition of its attributes. Furthermore, we provide a method, named ASTRID, for automatically finding a partition of the attributes that describes the distribution that generated the data. The approach uses state-of-the-art classifiers to capture the interactions present in the data: attribute interactions are broken systematically, and the effect of this breaking on classifier performance is observed. We empirically demonstrate the utility of the proposed method with examples using real and synthetic data.
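The idea of breaking attribute interactions and watching classifier performance can be illustrated with a minimal sketch. This is not the authors' ASTRID implementation and it omits the significance test. It assumes that interactions are broken by shuffling each attribute group independently among rows of the same class, that the classifier is trained on the shuffled data and scored on untouched test data, and it uses a small k-nearest-neighbour classifier as a stand-in for a state-of-the-art one. The function names, the synthetic data, and the two example partitions are all hypothetical.

```python
import numpy as np


def permute_within_class(X, y, partition, rng):
    """Shuffle each attribute group independently among rows of the same class.

    Attributes inside one group move together, so their joint class-conditional
    distribution is kept; dependencies between different groups are destroyed.
    """
    Xp = X.copy()
    for label in np.unique(y):
        rows = np.flatnonzero(y == label)
        for group in partition:
            source = rng.permutation(rows)  # fresh shuffle for every group
            Xp[np.ix_(rows, group)] = X[np.ix_(source, group)]
    return Xp


def knn_accuracy(X_train, y_train, X_test, y_test, k=5):
    """Accuracy of a plain k-NN classifier (stand-in for any classifier)."""
    dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    predicted = (y_train[nearest].mean(axis=1) > 0.5).astype(int)
    return float((predicted == y_test).mean())


rng = np.random.default_rng(0)


def make_data(n):
    # Hypothetical data: the class depends on attributes 0 and 1 only through
    # their interaction (sign of the product); attribute 2 is pure noise.
    X = rng.uniform(-1.0, 1.0, size=(n, 3))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)
    return X, y


X_train, y_train = make_data(600)
X_test, y_test = make_data(200)

intact = [[0, 1], [2]]    # keeps the interacting attributes together
broken = [[0], [1], [2]]  # separates them, breaking the interaction

acc_intact = knn_accuracy(
    permute_within_class(X_train, y_train, intact, rng), y_train, X_test, y_test
)
acc_broken = knn_accuracy(
    permute_within_class(X_train, y_train, broken, rng), y_train, X_test, y_test
)
print(f"accuracy, grouping {intact}: {acc_intact:.2f}")
print(f"accuracy, grouping {broken}: {acc_broken:.2f}")
```

In this toy setting, a grouping that keeps the interacting attributes together should leave accuracy largely unchanged, while the fully separated grouping should push it towards chance level. A drop of that kind is the signal that a partition has broken an interaction the classifier relies on.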

