Latin BERT: A Contextual Language Model for Classical Philology

09/21/2020
by David Bamman, et al.

We present Latin BERT, a contextual language model for the Latin language, trained on 642.7 million words from a variety of sources spanning the Classical era to the 21st century. In a series of case studies, we illustrate the affordances of this language-specific model both for work in natural language processing for Latin and in using computational methods for traditional scholarship: we show that Latin BERT achieves a new state of the art for part-of-speech tagging on all three Universal Dependency datasets for Latin and can be used for predicting missing text (including critical emendations); we create a new dataset for assessing word sense disambiguation for Latin and demonstrate that Latin BERT outperforms static word embeddings; and we show that it can be used for semantically-informed search by querying contextual nearest neighbors. We publicly release trained models to help drive future work in this space.
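The abstract's last case study — semantically-informed search via contextual nearest neighbors — amounts to ranking token occurrences by the similarity of their contextual vectors to a query vector. The following minimal sketch illustrates that ranking step with cosine similarity; the 3-dimensional vectors and the occurrence labels are toy stand-ins (Latin BERT, like other BERT-base models, produces 768-dimensional contextual embeddings), and the function names are illustrative, not from the released code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(query_vec, occurrences, k=2):
    """Rank token occurrences by similarity of their contextual vectors
    to the query vector, returning the top-k labels."""
    ranked = sorted(occurrences,
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [label for label, _ in ranked[:k]]

# Toy 3-d "contextual" vectors standing in for real model outputs.
occurrences = [
    ("in_carcere",  [0.9, 0.1, 0.0]),
    ("in_horto",    [0.1, 0.9, 0.1]),
    ("in_vinculis", [0.8, 0.2, 0.1]),
]
query = [0.85, 0.15, 0.05]
print(nearest_neighbors(query, occurrences))  # → ['in_carcere', 'in_vinculis']
```

Because the vectors are contextual rather than static, two occurrences of the same word form in different contexts get different vectors, so the search can distinguish senses in a way static word embeddings cannot.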


