Speak, Memory: An Archaeology of Books Known to ChatGPT/GPT-4

04/28/2023
by Kent K. Chang, et al.

In this work, we carry out a data archaeology to infer books that are known to ChatGPT and GPT-4 using a name cloze membership inference query. We find that OpenAI models have memorized a wide collection of copyrighted materials, and that the degree of memorization is tied to the frequency with which passages of those books appear on the web. The ability of these models to memorize an unknown set of books complicates assessments of measurement validity for cultural analytics by contaminating test data; we show that models perform much better on memorized books than on non-memorized books for downstream tasks. We argue that this supports a case for open models whose training data is known.
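The abstract does not spell out how a name cloze query is built or scored, so the following is only a minimal sketch of one plausible reading: a single character name in a short passage is replaced by a mask token, the model is asked to supply the name, and the fraction of exact matches over many passages is taken as a signal of how well a book is known. The names `make_cloze`, `name_cloze_accuracy`, and `predict_name` are hypothetical, the passages are invented toy data, and the stub predictor stands in for a real model call.

```python
import re
from typing import Callable, Iterable, Tuple

MASK = "[MASK]"  # assumed mask token; the paper's exact prompt format is not given here


def make_cloze(passage: str, name: str) -> str:
    """Replace the single occurrence of `name` in `passage` with MASK."""
    pattern = r"\b" + re.escape(name) + r"\b"
    if len(re.findall(pattern, passage)) != 1:
        raise ValueError("passage must contain the name exactly once")
    return re.sub(pattern, MASK, passage)


def name_cloze_accuracy(
    examples: Iterable[Tuple[str, str]],
    predict_name: Callable[[str], str],
) -> float:
    """Fraction of masked passages for which the predictor recovers the name.

    `predict_name` is a hypothetical callable wrapping whatever model is
    being probed; it receives the masked passage and returns one name.
    """
    examples = list(examples)
    if not examples:
        raise ValueError("need at least one example")
    correct = 0
    for passage, name in examples:
        guess = predict_name(make_cloze(passage, name))
        if guess.strip().lower() == name.lower():
            correct += 1
    return correct / len(examples)


# Toy data, invented for illustration only (not drawn from any book).
toy_examples = [
    ("Mira closed the ledger and sighed.", "Mira"),
    ("The lamp flickered as Tobias entered.", "Tobias"),
]


def stub_predictor(masked_passage: str) -> str:
    """Stand-in for a model call: always answers with the same name."""
    return "Mira"


accuracy = name_cloze_accuracy(toy_examples, stub_predictor)
```

Under this reading, a high accuracy on passages from one book, compared with a low baseline on books the model cannot have seen, would suggest memorization. Exact string matching is a deliberately strict choice for the sketch; a real study would need to decide how to treat nicknames and partial names.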


