DIWIFT: Discovering Instance-wise Influential Features for Tabular Data

07/06/2022
by Pengxiang Cheng, et al.

Tabular data is one of the most common data storage formats in business applications such as retail, banking, and e-commerce. These applications rely heavily on machine learning models to achieve business success. A critical problem in learning from tabular data is distinguishing influential features from the full set of predetermined features. Global feature selection has been well studied, but it assumes that all instances share the same influential feature subset. In practice, different instances rely on different feature subsets, which is why instance-wise feature selection has received increasing attention in recent studies. In this paper, we propose a novel method for discovering instance-wise influential features for tabular data (DIWIFT), the core of which is to use the influence function to measure the importance of an instance-wise feature. Unlike global feature selection, which assigns every instance the same influential feature subset, DIWIFT automatically discovers influential feature subsets of different sizes for different instances. Unlike previous instance-wise feature selection methods, DIWIFT minimizes the loss on a validation set and is thus more robust to distribution shift between the training and test datasets, which is important for tabular data. Finally, we conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness of DIWIFT, comparing it with baseline methods. We also demonstrate the robustness of our method through ablation experiments.
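To make the idea of scoring an instance-wise feature with an influence function more concrete, the sketch below applies the classic perturbation-influence formula to an L2-regularised logistic regression. This is an illustration only and is not taken from the paper: the model, the regulariser `lam`, the synthetic data, the helper names (`fit_logreg`, `feature_influence`), and the row-mean threshold used to pick features are all assumptions. Each entry (i, j) of the result estimates, up to a constant factor 1/n, how the validation loss would change if feature j of training instance i were nudged and the model refit.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg(X, y, lam=1e-2, iters=50):
    """Newton's method for L2-regularised logistic regression (assumed model)."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) / n + lam * theta
        hess = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)
        theta -= np.linalg.solve(hess, grad)
    return theta


def feature_influence(X_tr, y_tr, X_val, y_val, theta, lam=1e-2):
    """Influence of perturbing each training feature on the validation loss.

    Entry (i, j) equals -g_val^T H^{-1} d/dx_ij [grad_theta loss(z_i, theta)],
    where H is the training Hessian and g_val the validation-loss gradient.
    """
    n, d = X_tr.shape
    p_tr = sigmoid(X_tr @ theta)
    hess = (X_tr.T * (p_tr * (1 - p_tr))) @ X_tr / n + lam * np.eye(d)

    p_val = sigmoid(X_val @ theta)
    g_val = X_val.T @ (p_val - y_val) / len(y_val)
    v = np.linalg.solve(hess, g_val)  # H^{-1} g_val (H is symmetric)

    # For logistic loss, d/dx [ (p - y) x ] = p(1-p) x theta^T + (p - y) I.
    curvature = p_tr * (1 - p_tr) * (X_tr @ v)  # shape (n,)
    residual = p_tr - y_tr                      # shape (n,)
    return -(np.outer(curvature, theta) + np.outer(residual, v))


# Synthetic data (assumption): only the first two of five features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([2.0, -2.0, 0.0, 0.0, 0.0])
y = (rng.uniform(size=200) < sigmoid(X @ w_true)).astype(float)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

theta = fit_logreg(X_tr, y_tr)
infl = feature_influence(X_tr, y_tr, X_val, y_val, theta)

# Hypothetical selection rule: keep features whose absolute influence exceeds
# the instance's own mean, so subset sizes can differ between instances.
mask = np.abs(infl) > np.abs(infl).mean(axis=1, keepdims=True)
print(mask.sum(axis=1)[:10])  # number of selected features for ten instances
```

Because the score is computed against a held-out validation loss and not the training loss, a feature that only helps fit the training set receives little credit, which is the intuition behind the robustness claim above. The thresholding rule here is a stand-in heuristic and should not be read as the selection procedure used by DIWIFT.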

