Preservation of Anomalous Subgroups On Machine Learning Transformed Data

11/09/2019
by Samuel C. Maina, et al.

In this paper, we investigate the effect of machine-learning-based anonymization on the preservation of anomalous subgroups. In particular, we train a binary classifier and use it to discover the most anomalous subgroup in a dataset by maximizing the bias between the group's odds ratio as predicted by the model and its odds ratio as observed in the data. We then anonymize the data using a variational autoencoder (VAE), synthesizing an entirely new dataset that would ideally be drawn from the distribution of the original data. We repeat the anomalous subgroup discovery task on the new data and compare the result with the subgroup identified before anonymization. We evaluated our approach on publicly available datasets from the financial industry. The evaluation confirmed that the approach produces synthetic datasets that preserve a high level of the subgroup differentiation initially identified in the original dataset, while the synthetic records remain distinctly different from the original ones. Finally, we packaged this end-to-end process into a system we call Utility Guaranteed Deep Privacy (UGDP). UGDP can be easily extended to onboard alternative generative approaches, such as GANs, for synthesizing tabular data.
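
To make the subgroup discovery step concrete, the following is a minimal sketch of how a gap between predicted and observed odds might be scored per subgroup. It is not the authors' implementation: the function names, the restriction to single-feature subgroups, and the absolute log odds ratio used as the score are all assumptions made for illustration, and the abstract does not specify the search procedure or scoring function actually used.

```python
import math
from collections import defaultdict


def odds(positives, n, eps=1e-9):
    """Odds of a positive outcome; eps (an assumption) avoids division by zero."""
    return (positives + eps) / (n - positives + eps)


def most_anomalous_subgroup(rows, outcomes, predicted):
    """Return the (feature, value) subgroup with the largest odds gap.

    rows      : list of dicts mapping feature name -> categorical value
    outcomes  : observed binary labels (0 or 1), one per row
    predicted : classifier probabilities of the positive class, one per row

    Hypothetical simplification: only subgroups defined by a single
    feature value are considered, and the score is |log(observed odds /
    predicted odds)|.
    """
    # For each subgroup, accumulate [count, observed positives, expected positives].
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for row, y, p in zip(rows, outcomes, predicted):
        for feature, value in row.items():
            entry = stats[(feature, value)]
            entry[0] += 1
            entry[1] += y
            entry[2] += p

    best, best_score = None, -1.0
    for key, (n, observed, expected) in stats.items():
        score = abs(math.log(odds(observed, n) / odds(expected, n)))
        if score > best_score:
            best, best_score = key, score
    return best, best_score
```

Under these assumptions, running the same function on the original data and on the VAE-synthesized data, and then comparing the two returned subgroups, would mirror the before-and-after comparison described in the abstract.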

