Better Language Model with Hypernym Class Prediction

03/21/2022
by He Bai, et al.

Class-based language models (LMs) have long been used to address context sparsity in n-gram LMs. In this study, we revisit this approach in the context of neural LMs. We hypothesize that class-based prediction leads to an implicit context aggregation for similar words and thus can improve generalization for rare words. We map words that share a common WordNet hypernym to the same class and train large neural LMs by gradually annealing from class prediction to token prediction during training. Empirically, this curriculum learning strategy consistently improves perplexity over several large, highly performant, state-of-the-art Transformer-based models on two datasets, WikiText-103 and Arxiv. Our analysis shows that the improvement is achieved without sacrificing performance on rare words. Finally, we document other attempts that failed to yield empirical gains and discuss future directions for adopting class-based LMs on a larger scale.
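As a rough illustration of annealing from class prediction to token prediction, the sketch below builds training targets in which a word is replaced by its hypernym class with a probability that decays over training. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy table `TOY_HYPERNYMS`, the linear schedule in `class_probability`, and the helper `make_targets` are all hypothetical names introduced here, and a real setup would derive classes from WordNet rather than from a hand-written dictionary.

```python
import random

# Hypothetical toy hypernym table (assumption for illustration only);
# a real setup would look up WordNet hypernyms instead.
TOY_HYPERNYMS = {
    "dog": "<canine>",
    "wolf": "<canine>",
    "oak": "<tree>",
    "pine": "<tree>",
}


def class_probability(step, anneal_steps):
    """Probability of using a class target at a given training step.

    Assumed linear decay from 1.0 at step 0 to 0.0 at anneal_steps;
    the abstract only says the switch is gradual.
    """
    if anneal_steps <= 0:
        return 0.0
    return max(0.0, 1.0 - step / anneal_steps)


def make_targets(tokens, step, anneal_steps, rng):
    """Return prediction targets for one sequence.

    Each token that has a class is replaced by its class label with
    probability class_probability(step, anneal_steps); tokens without
    a class are always kept as-is.
    """
    p = class_probability(step, anneal_steps)
    targets = []
    for tok in tokens:
        cls = TOY_HYPERNYMS.get(tok)
        if cls is not None and rng.random() < p:
            targets.append(cls)
        else:
            targets.append(tok)
    return targets


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["the", "dog", "slept", "under", "the", "oak"]
    for step in (0, 50, 100):
        print(step, make_targets(sentence, step, anneal_steps=100, rng=rng))
```

Early in training both "dog" and "wolf" share the target `<canine>`, so their contexts are pooled; once the schedule reaches zero, the targets are ordinary tokens again.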

