
Yunshan Cup 2020: Overview of the Part-of-Speech Tagging Task for Low-resourced Languages

by Yingwen Fu, et al.

The Yunshan Cup 2020 track focused on creating a framework for evaluating different methods of part-of-speech (POS) tagging. There were two tasks for this track: (1) POS tagging for the Indonesian language, and (2) POS tagging for the Lao language. The Indonesian dataset comprises 10000 sentences from Indonesian news, annotated with 29 tags. The Lao dataset consists of 8000 sentences annotated with 27 tags. 25 teams registered for the task. The participants' methods ranged from feature-based approaches to neural networks, using either classical machine learning techniques or ensemble methods. The best-performing results achieve an accuracy of 95.82 [...] models significantly outperform classic feature-based methods and rule-based methods.




Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging

Ezafe is a grammatical particle in some Iranian languages that links two...

Cross-Register Projection for Headline Part of Speech Tagging

Part of speech (POS) tagging is a familiar NLP task. State of the art ta...

Toward a Standardized and More Accurate Indonesian Part-of-Speech Tagging

Previous work in Indonesian part-of-speech (POS) tagging is hard to com...

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs

We present LemmaTag, a featureless recurrent neural network architecture...

An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags

This paper presents an ensemble part-of-speech tagging approach for sour...

Identificação automática de pichação a partir de imagens urbanas

Graffiti tagging is a common issue in great cities and local authorities ...

From direct tagging to Tagging with sentences compression

In essence, the two tagging methods (direct tagging and tagging with sen...