The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs

05/21/2020
by Alexander Mehler, et al.

In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations. We describe recent advances in the development of lemmatizers and test them against the Capitularies corpus (comprising Frankish royal edicts, mid-6th to mid-9th century), a corpus created as a reference for processing Medieval Latin. We also consider the post-correction of lemmatizations through a limited crowdsourcing process aimed at the continuous review and updating of the FLL. Starting from the texts that result from this lemmatization process, we describe the extension of the FLL by means of word embeddings, whose interactive traversal by means of SemioGraphs completes the digitally enhanced hermeneutic circle. In this way, the article argues for a more comprehensive understanding of lemmatization, one that encompasses classical machine learning as well as intellectual post-correction and, in particular, human computation in the form of interpretation processes based on graph representations of the underlying lexical resources.


