Partially Synthetic Data for Recommender Systems: Prediction Performance and Preference Hiding

08/09/2020
by Manel Slokom, et al.

This paper demonstrates the potential of statistical disclosure control for protecting the data used to train recommender systems. Specifically, we use a synthetic data generation approach to hide specific information in the user-item matrix. We apply a transformation to the original data that changes some values but leaves others the same. The result is a partially synthetic data set that can be used for recommendation but contains less specific information about individual user preferences. Synthetic data has the potential to be useful for companies that are interested in releasing data to allow outside parties to develop new recommender algorithms, e.g., in the case of a recommender system challenge, while also reducing the risks associated with data misappropriation. Our experiments run a set of recommender system algorithms on our partially synthetic data sets as well as on the original data. The results show that the relative performance of the algorithms on the partially synthetic data reflects their relative performance on the original data. Further analysis demonstrates that properties of the original data are preserved under synthesis, but that, for certain examples, attributes accessible in the original data are hidden in the synthesized data.
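To make the idea of a partially synthetic user-item matrix concrete, here is a minimal sketch in Python. It is not the synthesis model used in the paper, which the abstract does not specify. It assumes a dense NumPy rating matrix in which 0 marks a missing rating, and it swaps in a deliberately simple stand-in technique: a randomly chosen share of the observed ratings is resampled from the other observed ratings of the same item, while every other entry is left unchanged. The function name `partially_synthesize` and the parameters `fraction` and `seed` are hypothetical.

```python
import numpy as np


def partially_synthesize(ratings, fraction=0.3, seed=0):
    """Return a copy of `ratings` with a share of observed entries resampled.

    Assumption: `ratings` is a 2-D user-item array and 0 means "no rating".
    This is an illustrative stand-in, not the paper's synthesis model.
    """
    rng = np.random.default_rng(seed)
    synthetic = ratings.copy()

    # Coordinates of all observed (non-missing) ratings.
    users, items = np.nonzero(ratings)

    # Pick which observed ratings will be replaced by synthetic values.
    n_replace = int(round(fraction * len(users)))
    chosen = rng.choice(len(users), size=n_replace, replace=False)

    for idx in chosen:
        u, i = users[idx], items[idx]
        # Empirical rating distribution of item i in the original data.
        item_ratings = ratings[:, i][ratings[:, i] > 0]
        # A draw may coincide with the original value, so at most
        # n_replace entries actually change.
        synthetic[u, i] = rng.choice(item_ratings)

    return synthetic


original = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
])
synthetic = partially_synthesize(original, fraction=0.5, seed=42)
print(synthetic)
```

In this sketch the pattern of missing entries is kept exactly and only rating values change. A recommender algorithm can therefore be trained on `synthetic` in the same way as on `original`, which is the kind of comparison the experiments describe.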

09/27/2021

Review of Clustering-Based Recommender Systems

Recommender systems are one of the most applied methods in machine learn...
10/07/2021

Doing Data Right: How Lessons Learned Working with Conventional Data should Inform the Future of Synthetic Data for Recommender Systems

We present a case that the newly emerging field of synthetic data in the...
09/01/2021

Black-Box Attacks on Sequential Recommenders via Data-Free Model Extraction

We investigate whether model extraction can be used to "steal" the weigh...
12/18/2013

Perturbed Gibbs Samplers for Synthetic Data Release

We propose a categorical data synthesizer with a quantifiable disclosure...
04/08/2019

Scaling Up Collaborative Filtering Data Sets through Randomized Fractal Expansions

Recommender system research suffers from a disconnect between the size o...
11/02/2020

Synthetic Data Generation for Economists

As more tech companies engage in rigorous economic analyses, we are conf...
01/23/2019

Scalable Realistic Recommendation Datasets through Fractal Expansions

Recommender System research suffers currently from a disconnect between ...