Is feature selection secure against training data poisoning?

04/21/2018
by Huang Xiao, et al.

Learning in adversarial settings is becoming an important task in application domains where attackers may inject malicious data into the training set to subvert the normal operation of data-driven technologies. Feature selection is widely used in machine learning for security applications to improve generalization and computational efficiency, yet it is unclear whether it helps or is even counterproductive when the training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework for investigating the robustness of popular feature selection methods, including LASSO, ridge regression, and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by carefully inserting fewer than 5 samples), highlighting the need for specific countermeasures.
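To make the setting concrete, the sketch below shows one way to measure how much a LASSO-selected feature set changes when a handful of points are appended to the training set. It is an illustration only and not the attack or framework from the paper: the coordinate-descent solver, the synthetic data, the naive "poison" points (large-magnitude inputs with flipped-sign targets), the regularization value, and all function names are assumptions introduced here.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    No intercept is fitted; this is a teaching sketch, not a tuned solver.
    """
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ w                      # current residual
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * w[j]        # remove feature j's contribution
            rho = X[:, j] @ r / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * w[j]        # add the updated contribution back
    return w

def selected_features(X, y, lam):
    """Indices of features with a non-zero LASSO weight."""
    w = lasso_cd(X, y, lam)
    return {int(j) for j in np.flatnonzero(np.abs(w) > 1e-8)}

def jaccard(a, b):
    """Overlap between two feature sets (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def demo(seed=0):
    rng = np.random.default_rng(seed)
    n, d = 60, 20
    X = rng.normal(size=(n, d))
    w_true = np.zeros(d)
    w_true[:3] = [2.0, -1.5, 1.0]      # only three informative features
    y = X @ w_true + 0.1 * rng.normal(size=n)
    clean = selected_features(X, y, lam=0.1)

    # Hypothetical, naive poisoning: four large-magnitude points whose
    # targets have the opposite sign of the true model's prediction.
    X_p = rng.normal(scale=5.0, size=(4, d))
    y_p = -(X_p @ w_true)
    poisoned = selected_features(np.vstack([X, X_p]),
                                 np.concatenate([y, y_p]), lam=0.1)
    return clean, poisoned, jaccard(clean, poisoned)

if __name__ == "__main__":
    clean, poisoned, overlap = demo()
    print("clean selection:   ", sorted(clean))
    print("poisoned selection:", sorted(poisoned))
    print("Jaccard overlap:   ", overlap)
```

A Jaccard overlap near 1 means the few added points barely changed the selected features; lower values mean the selection shifted. How far it shifts depends on the data, the regularization strength, and how the added points are crafted, so this toy run says nothing about the paper's reported results.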
