Privacy-preserving feature selection: A survey and proposing a new set of protocols

08/17/2020
by Javad Rahimipour Anaraki, et al.

Feature selection is the process of sieving features, in which informative features are separated from redundant and irrelevant ones. This process plays an important role in machine learning, data mining, and bioinformatics. However, traditional feature selection methods can only process centralized datasets and cannot satisfy today's distributed data processing needs. These needs call for a new category of data processing algorithms, called privacy-preserving feature selection, which protects users' data by not revealing any part of it, either during intermediate processing or in the final results. This is vital for datasets that contain individuals' data, such as medical datasets. It is therefore reasonable to either modify existing algorithms or propose new ones that not only can be applied to distributed datasets but also handle users' data responsibly by protecting their privacy. In this paper, we review three privacy-preserving feature selection methods and provide suggestions to improve their performance wherever a gap is identified. We also propose a privacy-preserving feature selection method based on rough set feature selection. The proposed method can process both horizontally and vertically partitioned datasets in two-party and multi-party scenarios.

research
10/11/2021

Privacy-Preserving Multiparty Protocol for Feature Selection Problem

In this paper, we propose a secure multiparty protocol for the feature s...
research
06/18/2018

Privacy Preserving Analytics on Distributed Medical Data

Objective: To enable privacy-preserving learning of high quality generat...
research
11/01/2018

Distributed ReliefF based Feature Selection in Spark

Feature selection (FS) is a key research area in the machine learning an...
research
10/14/2022

Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models

Various privacy-preserving frameworks that respect the individual's priv...
research
07/21/2020

ADAGES: adaptive aggregation with stability for distributed feature selection

In this era of "big" data, not only the large amount of data keeps motiv...
research
09/23/2021

Federated Feature Selection for Cyber-Physical Systems of Systems

Autonomous systems generate a huge amount of multimodal data that are co...
research
08/26/2022

Another Use of SMOTE for Interpretable Data Collaboration Analysis

Recently, data collaboration (DC) analysis has been developed for privac...
