An Experimental Investigation of Part-Of-Speech Taggers for Vietnamese

06/14/2022
by Tuan-Phong Nguyen, et al.

Part-of-speech (POS) tagging plays an important role in Natural Language Processing (NLP). Its applications can be found in many NLP tasks such as named entity recognition, syntactic parsing, dependency parsing and text chunking. In this paper, we use two widely used toolkits, ClearNLP and Stanford POS Tagger, to develop two new POS taggers for Vietnamese, and then compare them to three well-known Vietnamese taggers, namely JVnTagger, vnTagger and RDRPOSTagger. We make a systematic comparison to find the tagger with the best performance. We also design a new feature set to measure the performance of the statistical taggers. Our new taggers, built from Stanford Tagger and ClearNLP with the new feature set, outperform all other current Vietnamese taggers in terms of tagging accuracy. Moreover, we analyze the effect of some features on the performance of the statistical taggers. Lastly, the experimental results reveal that the transformation-based tagger, RDRPOSTagger, runs significantly faster than any of the statistical taggers.
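
The abstract compares taggers by tagging accuracy. As a minimal sketch of how such a comparison is commonly scored, the Python snippet below computes token-level accuracy, assuming gold and predicted sentences are already tokenized identically. The function name `tagging_accuracy`, the data layout, and the toy sentence with its tags are illustrative assumptions, not the paper's evaluation code or data.

```python
from typing import List, Tuple

# A sentence is a list of (word, tag) pairs. This layout is an assumption
# made for illustration; the paper does not specify its data format here.
Sentence = List[Tuple[str, str]]


def tagging_accuracy(gold: List[Sentence], predicted: List[Sentence]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of sentences")

    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence pair must have the same number of tokens")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            correct += int(gold_tag == pred_tag)
            total += 1

    if total == 0:
        raise ValueError("cannot compute accuracy on an empty corpus")
    return correct / total


# Toy example with hypothetical tags (P = pronoun, V = verb, N = noun).
gold_corpus = [[("Tôi", "P"), ("học", "V"), ("tiếng_Việt", "N")]]
predicted_corpus = [[("Tôi", "P"), ("học", "N"), ("tiếng_Việt", "N")]]

accuracy = tagging_accuracy(gold_corpus, predicted_corpus)
print(f"token-level accuracy: {accuracy:.3f}")
```

Scoring per token rather than per sentence is the usual choice for POS tagging, since a single wrong tag would otherwise discard an entire sentence of correct predictions.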

research
08/16/2018

Computing Word Classes Using Spectral Clustering

Clustering a lexicon of words is a well-studied problem in natural langu...
research
03/27/2017

A Tidy Data Model for Natural Language Processing using cleanNLP

The package cleanNLP provides a set of fast tools for converting a textu...
research
09/05/2018

Appendix - Recommended Statistical Significance Tests for NLP Tasks

Statistical significance testing plays an important role when drawing co...
research
08/09/2018

Building a Kannada POS Tagger Using Machine Learning and Neural Network Models

POS Tagging serves as a preliminary task for many NLP applications. Kann...
research
01/11/2017

Decoding with Finite-State Transducers on GPUs

Weighted finite automata and transducers (including hidden Markov models...
research
07/12/2021

DaCy: A Unified Framework for Danish NLP

Danish natural language processing (NLP) has in recent years obtained co...
research
02/05/2023

Unleashing the True Potential of Sequence-to-Sequence Models for Sequence Tagging and Structure Parsing

Sequence-to-Sequence (S2S) models have achieved remarkable success on va...
