Unigram-Normalized Perplexity as a Language Model Performance Measure with Different Vocabulary Sizes

11/26/2020
by Jihyeon Roh, et al.

Although perplexity is a widely used performance metric for language models, its value is highly dependent on the number of words in the corpus, so it is useful only for comparing models evaluated on the same corpus. In this paper, we propose a new metric that can be used to evaluate language model performance across different vocabulary sizes. The proposed unigram-normalized perplexity measures the improvement of a language model over a simple unigram model and is robust to the vocabulary size. Both theoretical analysis and computational experiments are reported.
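The abstract does not spell out the formula, but a natural reading of "unigram-normalized perplexity" is the model's perplexity divided by that of a unigram baseline, which is equivalent to exponentiating the mean per-token log-likelihood ratio between the baseline and the model. The Python sketch below illustrates that interpretation; the function name, argument layout, and the example values are illustrative assumptions, not taken from the paper.

```python
import math

def unigram_normalized_perplexity(model_log_probs, unigram_log_probs):
    """Sketch of a unigram-normalized perplexity.

    Assumes the metric is the ratio of the model's perplexity to that of a
    unigram baseline, i.e. exp of the mean per-token difference between the
    unigram and model log-probabilities.

    model_log_probs:   log p_model(w_i | context) for each token w_i
    unigram_log_probs: log p_unigram(w_i) for the same tokens
    """
    assert len(model_log_probs) == len(unigram_log_probs)
    n = len(model_log_probs)
    # Mean of log p_unigram(w_i) - log p_model(w_i | context).
    mean_diff = sum(u - m for m, u in zip(model_log_probs, unigram_log_probs)) / n
    # Values below 1.0 indicate improvement over the unigram baseline;
    # the vocabulary-dependent term that inflates ordinary perplexity cancels.
    return math.exp(mean_diff)

# Example: a 3-token sequence where the model beats the unigram baseline.
model_lp = [math.log(0.2), math.log(0.1), math.log(0.3)]
uni_lp = [math.log(0.01), math.log(0.05), math.log(0.02)]
print(unigram_normalized_perplexity(model_lp, uni_lp))  # prints a value < 1.0
```

Because both perplexities are computed over the same token sequence, dividing by the unigram baseline removes the dependence on how hard the vocabulary itself is, which is why such a ratio can be compared across corpora with different vocabulary sizes.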


