A Non-Parametric Test to Detect Data-Copying in Generative Models

04/12/2020
by Casey Meehan, et al.

Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call data-copying, in which the generative model memorizes and outputs training samples or small variations thereof. We provide a three-sample non-parametric test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and we study the performance of our test on several canonical models and datasets. For code and examples, visit https://github.com/casey-meehan/data-copying
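To make the three-sample setup concrete, here is a minimal sketch of one way such a test could be built. It is an illustration under stated assumptions, not necessarily the procedure used in the paper or in the linked repository: it assumes Euclidean distance on raw points, and it compares nearest-training-neighbor distances of generated versus held-out samples with a z-scored Mann-Whitney U statistic. The names `nearest_train_distance` and `copying_z_score` are hypothetical.

```python
import math
import random


def nearest_train_distance(point, train):
    """Euclidean distance from `point` to its closest training sample."""
    return min(math.dist(point, t) for t in train)


def copying_z_score(train, held_out, generated):
    """Z-scored Mann-Whitney U on distances to the training set.

    Assumption for this sketch: strongly negative values mean generated
    samples sit closer to the training set than fresh held-out samples do,
    which is the signature of data-copying.
    """
    d_gen = [nearest_train_distance(x, train) for x in generated]
    d_ref = [nearest_train_distance(x, train) for x in held_out]

    # U counts pairs where the generated distance exceeds the held-out one
    # (ties count as one half).
    u = 0.0
    for g in d_gen:
        for r in d_ref:
            if g > r:
                u += 1.0
            elif g == r:
                u += 0.5

    m, n = len(d_gen), len(d_ref)
    mean_u = m * n / 2.0
    std_u = math.sqrt(m * n * (m + n + 1) / 12.0)
    return (u - mean_u) / std_u


def sample_target(rng, count):
    """Toy target distribution: a 2-D standard Gaussian."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(count)]


rng = random.Random(0)
train = sample_target(rng, 200)
held_out = sample_target(rng, 50)

# A "copying" model: training points plus tiny noise.
copied = [
    (x + rng.gauss(0, 1e-3), y + rng.gauss(0, 1e-3))
    for x, y in rng.sample(train, 50)
]
# A well-behaved model: fresh draws from the target distribution.
fresh = sample_target(rng, 50)

z_copy = copying_z_score(train, held_out, copied)
z_fresh = copying_z_score(train, held_out, fresh)
print("copying model z-score:", z_copy)
print("fresh model z-score:", z_fresh)
```

In this toy setting the copying model should produce a strongly negative score, while the fresh sampler should stay near zero. Real image or text data would likely need distances in a learned embedding space rather than on raw inputs.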
