Exploiting Cross-Dialectal Gold Syntax for Low-Resource Historical Languages: Towards a Generic Parser for Pre-Modern Slavic

11/12/2020
by Nilo Pedrazzini, et al.

This paper explores the possibility of improving the performance of specialized parsers for pre-modern Slavic by training them on data from different related varieties. Because of their linguistic heterogeneity, pre-modern Slavic varieties are treated as low-resource historical languages, for which cross-dialectal treebank data may be exploited to overcome data scarcity and to attempt the training of a variety-agnostic parser. Previous experiments on early Slavic dependency parsing are discussed, particularly with regard to their ability to tackle different orthographic, regional and stylistic features. A generic pre-modern Slavic parser and two specialized parsers – one for East Slavic and one for South Slavic – are trained using jPTDP (Nguyen and Verspoor 2018), a neural network model for joint part-of-speech (POS) tagging and dependency parsing which had shown promising results on a number of Universal Dependencies (UD) treebanks, including Old Church Slavonic (OCS). With these experiments, a new state of the art is obtained for both OCS (83.79% unlabelled attachment score (UAS) and 78.43% labelled attachment score (LAS)) and Old East Slavic (OES) (85.7% UAS and 80.16% LAS).
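As a reading aid for the two reported metrics, the following is a minimal sketch of how UAS and LAS are conventionally computed; it is not the evaluation code used in the paper. UAS is the percentage of tokens whose predicted head matches the gold head, and LAS additionally requires the dependency label to match. The function name `attachment_scores`, the `(head, label)` token representation, and the toy sentence are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical token representation: (head index, dependency label).
# A head index of 0 stands for the artificial root, as in CoNLL-U.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], predicted: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned gold/predicted tokens."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty token sequence")

    # UAS counts tokens whose head is correct, regardless of label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts tokens whose head AND label are both correct.
    labelled_hits = sum(g == p for g, p in zip(gold, predicted))

    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * labelled_hits / total


# Toy four-token sentence (invented data, not from any treebank).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "amod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")
```

In the toy sentence three of four heads are correct and two of four head-label pairs are correct, so LAS can never exceed UAS, which matches the pattern in the scores reported above.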


