HistBERT: A Pre-trained Language Model for Diachronic Lexical Semantic Analysis

02/08/2022
by Wenjun Qiu, et al.

Contextualized word embeddings have demonstrated state-of-the-art performance in various natural language processing tasks, including those that concern historical semantic change. However, language models such as BERT were trained primarily on contemporary corpus data. To investigate whether training on historical corpus data improves diachronic semantic analysis, we present a pre-trained BERT-based language model, HistBERT, trained on the balanced Corpus of Historical American English. We examine the effectiveness of our approach by comparing the performance of the original BERT with that of HistBERT, and we report promising results in word similarity and semantic shift analysis. Our work suggests that the effectiveness of contextual embeddings in diachronic semantic analysis depends on the temporal profile of the input text, and that care should be taken in applying this methodology to the study of historical semantic change.
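To give a rough sense of the kind of semantic shift analysis the abstract refers to, the sketch below uses a BERT-style model to embed a target word in usage sentences from two time periods, averages the contextual vectors per period, and reports the cosine distance between the two averages. This is not the authors' released code: the model name (`bert-base-uncased`), the helper functions `word_vector` and `period_vector`, and the toy example sentences are illustrative assumptions; a HistBERT checkpoint could be substituted if available.

```python
# Minimal sketch of contextual-embedding semantic shift measurement.
# Assumptions: Hugging Face transformers, a generic BERT checkpoint,
# and toy usage sentences; replace with HistBERT / real corpus data.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-uncased"  # illustrative; not the HistBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def word_vector(sentence: str, target: str) -> torch.Tensor:
    """Mean of the last-layer hidden states over the target word's subtokens."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    # locate the target's subtoken span within the sentence
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0)
    raise ValueError(f"'{target}' not found in: {sentence}")

def period_vector(sentences: list[str], target: str) -> torch.Tensor:
    """Average contextual vector of the target word across one period's usages."""
    return torch.stack([word_vector(s, target) for s in sentences]).mean(dim=0)

# Toy usage examples for the word "gay" in two periods (illustrative data only).
early = ["The party was a gay and lively affair.",
         "She wore a gay ribbon in her hair."]
late = ["He came out as gay to his family.",
        "The gay rights movement gained momentum."]

v_early = period_vector(early, "gay")
v_late = period_vector(late, "gay")
shift = 1 - torch.nn.functional.cosine_similarity(v_early, v_late, dim=0)
print(f"semantic shift (cosine distance): {shift.item():.3f}")
```

A larger cosine distance between the period-averaged vectors is commonly read as stronger evidence of semantic change for that word; the paper's comparison asks whether a model trained on historical text (HistBERT) yields more reliable signals of this kind than the contemporary-trained BERT.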
