Language Model Evaluation Beyond Perplexity

05/31/2021
by   Clara Meister, et al.

We propose an alternative approach to quantifying how well language models learn natural language: we ask how well they match the statistical tendencies of natural language. To answer this question, we analyze whether text generated from language models exhibits the statistical tendencies present in the human-generated text on which they were trained. We provide a framework, paired with significance tests, for evaluating the fit of language models to these trends. We find that neural language models appear to learn only a subset of the tendencies considered, but align much more closely with empirical trends than with proposed theoretical distributions (when present). Further, the fit to different distributions is highly dependent on both model architecture and generation strategy. As concrete examples, text generated under the nucleus sampling scheme adheres more closely to the type–token relationship of natural language than text produced using standard ancestral sampling; text from LSTMs reflects the natural language distributions over length, stopwords, and symbols surprisingly well.
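The kind of comparison described above can be illustrated with a minimal sketch: compute a type–token curve for a corpus, and compare the length distributions of two corpora with a two-sample Kolmogorov–Smirnov statistic. This is not the paper's actual framework or test suite; the helper functions and the toy length samples below are hypothetical, shown only to make the idea of testing a generated corpus against empirical human-text tendencies concrete.

```python
import bisect

def type_token_curve(tokens):
    """Running count of distinct types after each token
    (the kind of type-token relationship the abstract refers to)."""
    seen, curve = set(), []
    for tok in tokens:
        seen.add(tok)
        curve.append(len(seen))
    return curve

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    xs, ys = sorted(xs), sorted(ys)

    def ecdf(sorted_vals, v):
        # Fraction of sorted_vals less than or equal to v.
        return bisect.bisect_right(sorted_vals, v) / len(sorted_vals)

    return max(abs(ecdf(xs, v) - ecdf(ys, v))
               for v in set(xs) | set(ys))

# Hypothetical sentence-length samples for a human corpus and a
# model-generated corpus (illustrative numbers, not real data).
human_lengths = [12, 7, 21, 15, 9, 18, 11, 14]
model_lengths = [5, 6, 5, 7, 6, 5, 8, 6]

d = ks_statistic(human_lengths, model_lengths)
```

In a real evaluation one would replace the toy samples with statistics extracted from large corpora and pair the statistic with a significance test (e.g. a permutation test) rather than reading the raw distance alone.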
