Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation

09/19/2023
by Yucheng Li, et al.

Data contamination in model evaluation is becoming increasingly prevalent, as the massive training corpora of large language models often unintentionally include benchmark samples. Contamination analysis has therefore become an inevitable part of reliable model evaluation. However, existing methods of contamination analysis require access to the entire training data, which is often confidential for recent models. This prevents the community from rigorously auditing these models and from accurately assessing their capabilities. In this paper, we propose a novel method to quantify contamination without access to the full training set, measuring the extent of contamination with perplexity. Our analysis provides evidence of significant memorisation by recent foundation models on popular reading comprehension and summarisation benchmarks, while multiple-choice benchmarks appear less contaminated.
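The abstract does not spell out the paper's exact procedure, but the underlying signal is straightforward: a model assigns memorised text a much lower perplexity than comparable fresh text. A minimal sketch, assuming access to per-token log-probabilities from some model (the log-prob values below are hypothetical, purely for illustration):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs a model assigns to two texts:
# a benchmark sample it may have seen in training vs. freshly written text.
benchmark_lp = [-0.2, -0.1, -0.3, -0.2]   # high probability -> low perplexity
fresh_lp     = [-2.1, -1.8, -2.4, -2.0]   # lower probability -> higher perplexity

ppl_bench = perplexity(benchmark_lp)
ppl_fresh = perplexity(fresh_lp)

# A large gap (benchmark perplexity far below that of comparable fresh text)
# is the memorisation signal this kind of contamination analysis looks for.
print(round(ppl_bench, 2), round(ppl_fresh, 2))
```

In practice one would obtain the log-probabilities by scoring benchmark samples with the model under audit and comparing against held-out text of similar style and difficulty; the sketch only shows the perplexity arithmetic itself.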

Related research

09/18/2023 · Adapting Large Language Models via Reading Comprehension
We explore how continued pre-training on domain-specific corpora influen...

05/10/2021 · REPT: Bridging Language Models and Machine Reading Comprehension via Retrieval-Based Pre-training
Pre-trained Language Models (PLMs) have achieved great success on Machin...

02/25/2021 · ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language Model for Reading Comprehension of Abstract Meaning
This paper presents our systems for the three Subtasks of SemEval Task4:...

01/26/2020 · Dual Multi-head Co-attention for Multi-choice Reading Comprehension
Multi-choice Machine Reading Comprehension (MRC) requires model to decid...

02/15/2022 · Quantifying Memorization Across Neural Language Models
Large language models (LMs) have been shown to memorize parts of their t...

01/06/2023 · TrojanPuzzle: Covertly Poisoning Code-Suggestion Models
With tools like GitHub Copilot, automatic code suggestion is no longer a...

03/02/2019 · Reliable Access to Massive Restricted Texts: Experience-based Evaluation
Libraries are seeing growing numbers of digitized textual corpora that f...
