The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs

by   Alexander Mehler, et al.

In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations. We describe recent advances in the development of lemmatizers and test them against the Capitularies corpus (comprising Frankish royal edicts, mid-6th to mid-9th century), a corpus created as a reference for processing Medieval Latin. We also consider the post-correction of lemmatizations using a limited crowdsourcing process aimed at continuous review and updating of the FLL. Starting from the texts resulting from this lemmatization process, we describe the extension of the FLL by means of word embeddings, whose interactive traversing by means of SemioGraphs completes the digital enhanced hermeneutic circle. In this way, the article argues for a more comprehensive understanding of lemmatization, encompassing classical machine learning as well as intellectual post-corrections and, in particular, human computation in the form of interpretation processes based on graph representations of the underlying lexical resources.



page 1

page 2

page 3

page 4


Interactive Re-Fitting as a Technique for Improving Word Embeddings

Word embeddings are a fixed, distributional representation of the contex...

Human-in-the-Loop Refinement of Word Embeddings

Word embeddings are a fixed, distributional representation of the contex...

Unsupervised Morphological Expansion of Small Datasets for Improving Word Embeddings

We present a language independent, unsupervised method for building word...

Integrating Lexical Knowledge in Word Embeddings using Sprinkling and Retrofitting

Neural network based word embeddings, such as Word2Vec and GloVe, are pu...

The Golden Rule as a Heuristic to Measure the Fairness of Texts Using Machine Learning

To treat others as one would wish to be treated is a common formulation ...

Exploiting Task-Oriented Resources to Learn Word Embeddings for Clinical Abbreviation Expansion

In the medical domain, identifying and expanding abbreviations in clinic...

Mining Points of Interest via Address Embeddings: An Unsupervised Approach

Digital maps are commonly used across the globe for exploring places tha...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.