Parsing Early Modern English for Linguistic Search

02/24/2020
by Seth Kulick, et al.

We investigate whether advances in NLP over the last few years make it possible to vastly increase the amount of data usable for research in historical syntax. This work brings together many of the usual tools in NLP (word embeddings, tagging, and parsing) in the service of linguistic queries over automatically annotated corpora. We train a part-of-speech (POS) tagger and a parser on a corpus of historical English, using ELMo embeddings trained on a billion words of similar text. The evaluation is based on the standard metrics, as well as on the accuracy of query searches over the parsed data.


