A Non-Parametric Test to Detect Data-Copying in Generative Models

04/12/2020
by Casey Meehan, et al.

Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call data-copying, where the generative model memorizes and outputs training samples or small variations thereof. We provide a three-sample non-parametric test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and we study the performance of our test on several canonical models and datasets. For code and examples, visit https://github.com/casey-meehan/data-copying
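
The abstract only sketches the idea of a three-sample test, so the snippet below is a minimal illustration in that spirit rather than the paper's exact statistic: it compares how close generated samples and held-out samples lie to their nearest training point, and flags data-copying when generated samples are systematically closer. The function name `data_copying_test`, the nearest-neighbor distances, and the Mann-Whitney U statistic are assumptions made here for illustration; consult the linked repository for the authors' implementation.

```python
# Illustrative three-sample data-copying check (not the paper's exact test).
# Idea: if the model copies training points, generated samples will sit much
# closer to the training set than genuinely fresh samples from the target
# distribution do.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.neighbors import NearestNeighbors

def data_copying_test(train, held_out, generated):
    """Compare distances to the nearest training point for generated vs.
    held-out samples. A small p-value (alternative='less') indicates that
    generated samples are suspiciously close to the training set."""
    nn = NearestNeighbors(n_neighbors=1).fit(train)
    d_gen, _ = nn.kneighbors(generated)   # distance of each generated point to its nearest training point
    d_held, _ = nn.kneighbors(held_out)   # same for held-out samples from the target distribution
    # One-sided Mann-Whitney U test: are generated-to-train distances smaller?
    return mannwhitneyu(d_gen.ravel(), d_held.ravel(), alternative="less")

# Toy usage: a "generator" that copies training points with tiny noise.
rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 2))
held_out = rng.normal(size=(500, 2))
copied = train[rng.integers(0, 1000, size=500)] + 0.01 * rng.normal(size=(500, 2))
print(data_copying_test(train, held_out, copied))  # very small p-value: copying detected
```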

