ABM: an automatic supervised feature engineering method for loss based models based on group and fused lasso

09/22/2020
by   Weijian Luo, et al.

A vital step in solving a classification or regression problem is to apply feature engineering and variable selection to the data before it is fed into a model. One of the most popular feature engineering methods is to discretize a continuous variable at a set of cutting points, a process referred to as binning. Good cutting points are important for improving a model's performance, because good binning can discard noisy variation within a continuous variable's range while keeping useful level information in a well-ordered encoding. However, to the best of our knowledge, most cutting point selection is done using researchers' domain knowledge or naive methods such as equal-width or equal-frequency cutting. In this paper we propose an end-to-end supervised cutting point selection method based on group and fused lasso, which also performs variable selection automatically. We name our method ABM (automatic binning machine). We first cut each variable's range into fine grid bins and then train the model with our group and group fused lasso regularization on successive bins. The method thus integrates feature engineering, variable selection, and model training simultaneously. A further advantage is that the method is flexible: it can be plugged into a wide range of loss-function-based models, including deep neural networks. We have also implemented the method in R and released the source code to other researchers. A Python version will follow in the coming days.
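The idea of fine grid bins with a penalty on successive bins can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal-frequency starting grid, the square-root group weighting, and the exact penalty form are all assumptions made for illustration, since the abstract does not spell them out. Each variable is one-hot encoded over a fine grid, a group lasso term can shrink a whole variable's bin coefficients to zero (variable selection), and a fused term encourages adjacent bin coefficients to become equal (merging bins, so only some cutting points survive).

```python
import numpy as np

def fine_grid_bins(x, n_bins=20):
    """One-hot encode x over a fine equal-frequency grid (hypothetical helper)."""
    # Interior quantile edges: n_bins - 1 candidate cutting points.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Bin index in 0..n_bins-1; edges[i] separates bin i from bin i + 1.
    idx = np.searchsorted(edges, x, side="right")
    onehot = np.zeros((x.shape[0], n_bins))
    onehot[np.arange(x.shape[0]), idx] = 1.0
    return onehot, edges

def group_fused_penalty(beta_groups, lam_group=1.0, lam_fused=1.0):
    """Assumed penalty: group lasso per variable plus fused lasso on successive bins."""
    total = 0.0
    for beta in beta_groups:  # one coefficient vector per original variable
        # Group term: can zero out every bin of a variable at once.
        total += lam_group * np.sqrt(beta.size) * np.linalg.norm(beta)
        # Fused term: penalizes jumps between neighbouring bin coefficients.
        total += lam_fused * np.abs(np.diff(beta)).sum()
    return total

def surviving_cut_points(beta, edges, tol=1e-8):
    """Candidate edges where adjacent bin coefficients still differ after fitting."""
    return edges[np.abs(np.diff(beta)) > tol]
```

In a full model, this penalty would be added to the chosen loss (for example logistic or squared error) and minimized over all bin coefficients; the cutting points that remain are those where neighbouring coefficients were not fused together.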


