Analogical Relevance Index

by Suryani Lim, et al.

Focusing on the most significant features of a dataset is useful in both machine learning (ML) and data mining. In ML, it can lead to higher accuracy, a faster learning process, and ultimately a simpler, more understandable model. In data mining, identifying significant features is essential not only for a better understanding of the data but also for visualization. In this paper, we present a new way of identifying significant features, inspired by analogical proportions. Such a proportion has the form "a is to b as c is to d", comparing two pairs of items (a, b) and (c, d) in terms of their similarities and dissimilarities. In a classification context, if the similarities/dissimilarities between a and b correlate with the fact that a and b have different labels, this knowledge can be transferred to c and d, suggesting that c and d also have different labels. From a feature selection perspective, observing a large number of pairs (a, b) where a and b have different labels provides a hint about the importance of the features on which a and b differ. Following this idea, we introduce the Analogical Relevance Index (ARI), a new statistical test of the significance of a given feature with respect to the label. ARI is a filter-based method. Filter-based methods are ML-agnostic but generally unable to handle feature redundancy; ARI, however, can detect it. Our experiments show that ARI is effective and outperforms well-known methods on a variety of artificial datasets and on some real ones.
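The pair-based intuition above can be sketched in a few lines. The following is a toy illustration of the idea, not the paper's actual ARI statistic (whose exact formula is not given here): for every pair of items with different labels, record which features take different values; a feature that differs in a large fraction of such pairs is a candidate for relevance. The function name and scoring rule are hypothetical.

```python
# Toy sketch of the pair-based intuition behind ARI (hypothetical scoring,
# NOT the paper's exact statistical test): features that frequently differ
# across pairs of items with different labels are likely label-relevant.
from itertools import combinations

def pairwise_relevance(X, y):
    """For each feature, return the fraction of different-label pairs
    (a, b) in which that feature takes different values on a and b."""
    n_features = len(X[0])
    diff_counts = [0] * n_features
    pair_total = 0
    for i, j in combinations(range(len(X)), 2):
        if y[i] != y[j]:                     # only pairs with different labels
            pair_total += 1
            for f in range(n_features):
                if X[i][f] != X[j][f]:       # feature f distinguishes a from b
                    diff_counts[f] += 1
    return [c / pair_total for c in diff_counts]

# Binary toy data: feature 0 tracks the label exactly, feature 1 is noise
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
scores = pairwise_relevance(X, y)  # feature 0 scores higher than feature 1
```

On this toy data, feature 0 differs in every different-label pair while the noise feature differs only by chance, so the score cleanly separates them; a real filter method would additionally turn such counts into a significance test, as ARI does.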


