Cross-Lingual Morphological Tagging for Low-Resource Languages

06/14/2016
by   Jan Buys, et al.
Morphologically rich languages often lack the annotated linguistic resources required to develop accurate natural language processing tools. We propose models suitable for training morphological taggers with rich tagsets for low-resource languages without using direct supervision. Our approach extends existing methods for projecting part-of-speech tags across languages, using bitext to infer constraints on the possible tags for a given word type or token. We propose a tagging model using Wsabie, a discriminative embedding-based model with rank-based learning. In our evaluation on 11 languages, this model performs on average on par with a weakly-supervised HMM baseline, while being more scalable. Multilingual experiments show that the method performs best when projecting between related language pairs. Despite the inherently lossy projection, we show that the morphological tags predicted by our models improve the downstream performance of a parser by 0.6 LAS (labeled attachment score) on average.
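As a rough illustration of what "using bitext to infer constraints on the possible tags for a given word type" could look like, the sketch below collects, for each target-language word type, the set of source-language tags aligned to it. It is not the authors' implementation: the function name `project_type_constraints`, the `min_count` threshold, the input layout, and the toy sentences and tags are all assumptions made up for this example.

```python
from collections import defaultdict


def project_type_constraints(sentence_pairs, min_count=1):
    """Build type-level tag constraints from word-aligned bitext.

    sentence_pairs: iterable of (source_tags, target_words, alignment),
    where alignment is a list of (source_index, target_index) pairs.
    Returns a dict mapping each target word type to the set of source
    tags projected onto it at least `min_count` times.
    (Hypothetical helper; the data layout is an assumption.)
    """
    counts = defaultdict(lambda: defaultdict(int))
    for source_tags, target_words, alignment in sentence_pairs:
        for src_idx, tgt_idx in alignment:
            # Copy the source token's tag onto the aligned target word type.
            counts[target_words[tgt_idx]][source_tags[src_idx]] += 1
    return {
        word: {tag for tag, n in tag_counts.items() if n >= min_count}
        for word, tag_counts in counts.items()
    }


# Toy, invented data: tagged English source side, German target side.
pairs = [
    (["NOUN|Number=Sing", "VERB|Tense=Pres"], ["Hund", "bellt"], [(0, 0), (1, 1)]),
    (["NOUN|Number=Plur", "VERB|Tense=Pres"], ["Hunde", "bellen"], [(0, 0), (1, 1)]),
    (["NOUN|Number=Sing", "VERB|Tense=Past"], ["Hund", "bellte"], [(0, 0), (1, 1)]),
]
constraints = project_type_constraints(pairs)
print(constraints["Hund"])
```

A tagger trained without direct supervision could then restrict its predictions for each word type to the projected set; raising `min_count` is one simple way to discard tags that arise from noisy alignments.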

research
04/28/2020

Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages

Part-of-speech (POS) taggers for low-resource languages which are exclus...
research
12/18/2021

Morpheme Boundary Detection & Grammatical Feature Prediction for Gujarati : Dataset & Model

Developing Natural Language Processing resources for a low resource lang...
research
07/05/2016

Learning when to trust distant supervision: An application to low-resource POS tagging using cross-lingual projection

Cross lingual projection of linguistic annotation suffers from many sour...
research
05/11/2018

Neural Factor Graph Models for Cross-lingual Morphological Tagging

Morphological analysis involves predicting the syntactic traits of a wor...
research
08/29/2018

Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging

We introduce DsDs: a cross-lingual neural part-of-speech tagger that lea...
research
04/29/2020

Basic Linguistic Resources and Baselines for Bhojpuri, Magahi and Maithili for Natural Language Processing

Corpus preparation for low-resource languages and for development of hum...
research
03/12/2019

Bootstrapping Method for Developing Part-of-Speech Tagged Corpus in Low Resource Languages Tagset - A Focus on an African Igbo

Most languages, especially in Africa, have fewer or no established part-...
