Measuring pattern retention in anonymized data -- where one measure is not enough

12/24/2015
by Sam Fletcher, et al.

In this paper, we explore how modifying data to preserve privacy affects the quality of the patterns discoverable in the data. For any analysis of modified data to be worth doing, the data must be as close to the original as possible. Therein lies a problem -- how does one make sure that modified data still contains the information it had before modification? This question is not the same as asking if an accurate classifier can be built from the modified data. Often in the literature, the prediction accuracy of a classifier made from modified (anonymized) data is used as evidence that the data is similar to the original. We demonstrate that this is not the case, and we propose a new methodology for measuring the retention of the patterns that existed in the original data. We then use our methodology to design three measures that can be easily implemented, each measuring aspects of the data that no pre-existing techniques can measure. These measures do not negate the usefulness of prediction accuracy or other measures -- they are complementary to them, and support our argument that one measure is almost never enough.
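
The central claim above, that equal prediction accuracy does not imply equal pattern retention, can be illustrated with a small Python sketch. This is not one of the three measures proposed in the paper. The toy dataset, the notion of a "pattern" as a high-confidence single-attribute rule, and the helper names `one_rules`, `accuracy`, and `retention` are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def one_rules(records, min_conf=0.9):
    """Hypothetical pattern miner: return (attribute index, value, label)
    rules whose confidence in `records` is at least `min_conf`."""
    counts = defaultdict(Counter)
    for *attrs, label in records:
        for i, value in enumerate(attrs):
            counts[(i, value)][label] += 1
    rules = set()
    for (i, value), label_counts in counts.items():
        label, hits = label_counts.most_common(1)[0]
        if hits / sum(label_counts.values()) >= min_conf:
            rules.add((i, value, label))
    return rules

def predict(rules, attrs):
    """Majority vote over all rules that match the record; None if none match."""
    votes = Counter(label for (i, value, label) in rules if attrs[i] == value)
    return votes.most_common(1)[0][0] if votes else None

def accuracy(rules, records):
    """Fraction of records whose label the rule set predicts correctly."""
    return sum(predict(rules, r[:-1]) == r[-1] for r in records) / len(records)

def retention(orig_rules, mod_rules):
    """Fraction of the original rules that are still found in the modified data."""
    return len(orig_rules & mod_rules) / len(orig_rules)

# Toy original data (assumed): both attributes predict the label perfectly.
original = [("low", "young", "no")] * 4 + [("high", "old", "yes")] * 4

# Toy "anonymized" data (assumed): the first attribute has been perturbed,
# so it no longer carries any information about the label.
modified = [
    ("low", "young", "no"), ("high", "young", "no"),
    ("low", "young", "no"), ("high", "young", "no"),
    ("high", "old", "yes"), ("low", "old", "yes"),
    ("high", "old", "yes"), ("low", "old", "yes"),
]

orig_rules = one_rules(original)
mod_rules = one_rules(modified)

acc_orig = accuracy(orig_rules, original)   # classifier built from original data
acc_mod = accuracy(mod_rules, original)     # classifier built from modified data
kept = retention(orig_rules, mod_rules)

print(acc_orig, acc_mod, kept)  # prints: 1.0 1.0 0.5
```

In this toy case both rule sets classify the original records perfectly, yet half of the original rules are no longer discoverable in the modified data. Accuracy alone would hide that loss, which is why a separate retention-style measure is a useful complement.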


