The Best of Both Worlds: Lexical Resources To Improve Low-Resource Part-of-Speech Tagging

11/21/2018
by Barbara Plank, et al.

In natural language processing, the deep learning revolution has shifted the focus from conventional hand-crafted symbolic representations to dense inputs, which are adequate representations learned automatically from corpora. However, particularly when working with low-resource languages, small amounts of symbolic lexical resources such as user-generated lexicons are often available even when gold-standard corpora are not. Such additional linguistic information is nevertheless often neglected, and recent neural approaches to cross-lingual tagging typically rely only on word and subword embeddings. While these representations are effective, our recent work has shown clear benefits of combining the best of both worlds: integrating conventional lexical information improves neural cross-lingual part-of-speech (PoS) tagging. However, little is known about how complementary such additional information is, and to what extent improvements depend on the coverage and quality of these external resources. This paper seeks to fill this gap by providing the first thorough analysis of the contributions of lexical resources to cross-lingual PoS tagging in neural times.

research
07/05/2016

Learning when to trust distant supervision: An application to low-resource POS tagging using cross-lingual projection

Cross lingual projection of linguistic annotation suffers from many sour...
research
05/01/2017

Model Transfer for Tagging Low-resource Languages using a Bilingual Dictionary

Cross-lingual model transfer is a compelling and popular method for pred...
research
07/25/2019

Cross-Lingual Transfer for Distantly Supervised and Low-resources Indonesian NER

Manually annotated corpora for low-resource languages are usually small ...
research
06/11/2018

Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource

Most work on part-of-speech (POS) tagging is focused on high resource la...
research
08/01/2022

BabelBERT: Massively Multilingual Transformers Meet a Massively Multilingual Lexical Resource

While pretrained language models (PLMs) primarily serve as general purpo...
research
08/29/2018

Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging

We introduce DsDs: a cross-lingual neural part-of-speech tagger that lea...
research
08/14/2016

Proceedings of the LexSem+Logics Workshop 2016

Lexical semantics continues to play an important role in driving researc...
