Another Use of SMOTE for Interpretable Data Collaboration Analysis

08/26/2022
by Akira Imakura, et al.

Recently, data collaboration (DC) analysis has been developed for privacy-preserving integrated analysis across multiple institutions. DC analysis centralizes individually constructed, dimensionality-reduced intermediate representations and realizes integrated analysis via collaboration representations without sharing the original data. To construct the collaboration representations, each institution generates and shares a shareable anchor dataset and centralizes its intermediate representation. Although a random anchor dataset generally works well for DC analysis, an anchor dataset whose distribution is close to that of the raw dataset is expected to improve recognition performance, particularly for interpretable DC analysis. Based on an extension of the synthetic minority over-sampling technique (SMOTE), this study proposes an anchor data construction technique that improves recognition performance without increasing the risk of data leakage. Numerical results on artificial and real-world datasets demonstrate the effectiveness of the proposed SMOTE-based method over existing anchor data constructions. Specifically, for an income dataset, the proposed method improves accuracy by 9 percentage points and essential feature selection by 38 percentage points over existing methods. The proposed method thus provides another use of SMOTE: not for imbalanced data classification, but as a key technology for privacy-preserving integrated analysis.
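To make the pipeline described in the abstract concrete, here is a minimal sketch under several stated assumptions; it is not the authors' implementation. The anchor is built with the plain SMOTE interpolation step x_new = x + λ(x_nn − x) (not the paper's extension of SMOTE), applied to a hypothetical reference sample `X_ref`; where such a sample comes from and how leakage is controlled is exactly what the paper addresses and is not reproduced here. Each institution's private dimensionality reduction is assumed to be a random linear projection `F_i`. The collaboration maps `G_i` follow an SVD-plus-least-squares construction modelled on earlier DC work: the top `dim` left singular vectors of the stacked anchor representations serve as a common target, and each `G_i` is fitted by least squares. The helper names `smote_like_anchor` and `collaboration_maps` are hypothetical, and the toy data are noise-free and low-rank so that the alignment is exact.

```python
import numpy as np


def smote_like_anchor(X, n_anchor, k=5, rng=None):
    """Hypothetical helper: SMOTE-style interpolation x_new = x + lam * (x_nn - x).

    This is the plain SMOTE step, not the paper's extension of it.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")
    # Pairwise squared Euclidean distances; exclude each point from its own neighbours.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    knn = np.argsort(d2, axis=1)[:, :k]                   # k nearest neighbours of every sample
    base = rng.integers(0, n, size=n_anchor)              # random seed sample for each anchor
    neigh = knn[base, rng.integers(0, k, size=n_anchor)]  # one random neighbour per anchor
    lam = rng.random((n_anchor, 1))                       # interpolation weight in [0, 1)
    return X[base] + lam * (X[neigh] - X[base])


def collaboration_maps(anchor_reps, dim, rcond=1e-10):
    """Hypothetical helper: map each institution's anchor representation to a common target.

    anchor_reps[i] is f_i(A) with shape (n_anchor, m_i). The target is the top `dim`
    left singular vectors of [f_1(A), ..., f_c(A)]; G_i minimises ||f_i(A) G_i - target||_F.
    """
    stacked = np.hstack(anchor_reps)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    target = U[:, :dim]
    # rcond treats numerically zero singular values of a rank-deficient f_i(A) as zero.
    return [np.linalg.lstsq(Ai, target, rcond=rcond)[0] for Ai in anchor_reps]


# Toy demo (assumption): two institutions share 6 features; all records lie in a 3-D subspace.
rng = np.random.default_rng(0)
basis = rng.standard_normal((3, 6))
X_ref = rng.standard_normal((40, 3)) @ basis  # hypothetical source sample for the anchor
X1 = rng.standard_normal((30, 3)) @ basis     # institution 1's private data
X2 = rng.standard_normal((25, 3)) @ basis     # institution 2's private data

# Each institution keeps its own random linear reduction (6 -> 4 features) private.
F1 = rng.standard_normal((6, 4))
F2 = rng.standard_normal((6, 4))

A = smote_like_anchor(X_ref, n_anchor=50, k=5, rng=1)
G1, G2 = collaboration_maps([A @ F1, A @ F2], dim=2)

# Only X_i @ F_i and A @ F_i leave each institution; the central server applies G_i.
Z1 = (X1 @ F1) @ G1
Z2 = (X2 @ F2) @ G2
print(Z1.shape, Z2.shape)  # prints (30, 2) (25, 2)
```

In this noise-free toy setting, a given record receives the same collaboration representation whichever institution's private projection it passes through, because both maps agree on the anchor and the anchor spans the data subspace. With real, noisy data the agreement is only approximate, which is why the choice of anchor distribution matters in the paper's argument.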



research
08/31/2022

Non-readily identifiable data collaboration analysis for multiple datasets including personal information

Multi-source data fusion, in which multiple data sources are jointly ana...
research
02/20/2019

Data collaboration analysis for distributed datasets

In this paper, we propose a data collaboration analysis method for distr...
research
08/16/2022

Collaborative causal inference on distributed data

The development of technologies for causal inference with the privacy pr...
research
06/01/2022

Privacy for Free: How does Dataset Condensation Help Privacy?

To prevent unintentional data leakage, research community has resorted t...
research
08/17/2020

Privacy-preserving feature selection: A survey and proposing a new set of protocols

Feature selection is the process of sieving features, in which informati...
research
08/01/2023

Data Collaboration Analysis applied to Compound Datasets and the Introduction of Projection data to Non-IID settings

Given the time and expense associated with bringing a drug to market, nu...
research
05/05/2023

Is dataset condensation a silver bullet for healthcare data sharing?

Safeguarding personal information is paramount for healthcare data shari...
