On the data requirements of probing

02/25/2022
by Zining Zhu, et al.

As large and powerful neural language models are developed, researchers have been increasingly interested in developing diagnostic tools to probe them. There are many papers with conclusions of the form "observation X is found in model Y", using their own datasets with varying sizes. Larger probing datasets bring more reliability, but are also expensive to collect. There is yet to be a quantitative method for estimating reasonable probing dataset sizes. We tackle this omission in the context of comparing two probing configurations: after we have collected a small dataset from a pilot study, how many additional data samples are sufficient to distinguish two different configurations? We present a novel method to estimate the required number of data samples in such experiments and, across several case studies, we verify that our estimations have sufficient statistical power. Our framework helps to systematically construct probing datasets to diagnose neural NLP models.
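To make the question in the abstract concrete, here is a minimal sketch of a generic pilot-based sample size estimate. It is not the method proposed in the paper: it swaps in a textbook power analysis for a two-sided paired comparison under a normal approximation. The function name `required_samples`, the default `alpha` and `power` values, and the pilot scores are all illustrative assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def required_samples(pilot_a, pilot_b, alpha=0.05, power=0.8):
    """Estimate how many paired samples are needed to detect the pilot's mean difference.

    Assumption: a standard normal-approximation power analysis for a
    two-sided paired test, NOT the estimator proposed in the paper.
    pilot_a and pilot_b are per-sample scores of two probing
    configurations evaluated on the same pilot samples.
    """
    if len(pilot_a) != len(pilot_b) or len(pilot_a) < 2:
        raise ValueError("need two equally sized pilot runs with at least 2 samples")

    # Per-sample differences between the two configurations.
    diffs = [a - b for a, b in zip(pilot_a, pilot_b)]
    effect = mean(diffs)   # observed mean difference in the pilot
    spread = stdev(diffs)  # sample standard deviation of the differences
    if effect == 0:
        raise ValueError("pilot shows no mean difference; no finite sample size suffices")

    # Critical values for the significance level and the target power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)

    n = ((z_alpha + z_power) * spread / effect) ** 2
    return max(1, math.ceil(n))


# Hypothetical pilot scores, for illustration only.
pilot_a = [0.9, 0.7, 0.8, 0.6, 0.9, 0.7]
pilot_b = [0.8, 0.7, 0.6, 0.6, 0.7, 0.8]

total = required_samples(pilot_a, pilot_b)
# Samples still to collect beyond the pilot (never negative).
additional = max(0, total - len(pilot_a))
print(total, additional)
```

The estimate depends heavily on how well the small pilot captures the true effect size and variance, which is the kind of uncertainty a dedicated method for probing experiments would need to handle.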


