Augmenting Part-of-speech Tagging with Syntactic Information for Vietnamese and Chinese

02/24/2021
by Duc-Vu Nguyen, et al.

Word segmentation and part-of-speech tagging are two critical preliminary steps for downstream tasks in Vietnamese natural language processing. In practice, people tend to take phrase boundaries into account when segmenting words and assigning part-of-speech tags, rather than processing the text strictly word by word from left to right. In this paper, we implement this idea to improve word segmentation and part-of-speech tagging for Vietnamese by employing a simplified constituency parser. Our neural model for joint word segmentation and part-of-speech tagging has the architecture of a syllable-based CRF constituency parser. To reduce the complexity of parsing, we replace all constituent labels with a single label that marks phrases. The model can be augmented with word boundaries and part-of-speech tags predicted by other tools. Because Vietnamese and Chinese share some linguistic phenomena, we evaluated the proposed model and its augmented versions on three Vietnamese benchmark datasets and six Chinese benchmark datasets. Our experimental results show that the proposed model achieves higher performance than previous work for both languages.
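
The label simplification described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the tuple-based tree encoding, the label name `XP`, the function names, and the example sentence are all assumptions chosen for illustration. The sketch only shows that collapsing every phrase label into one label leaves the word boundaries and part-of-speech tags readable from the tree.

```python
from typing import List, Tuple, Union

# Assumed encoding: a tree is (label, children).
# A preterminal is (pos_tag, word), where the word is a plain string.
Tree = Tuple[str, Union[str, List["Tree"]]]

PHRASE = "XP"  # hypothetical name for the single phrase label


def collapse_labels(tree: Tree) -> Tree:
    """Replace every phrase label with PHRASE; keep POS tags and words."""
    label, children = tree
    if isinstance(children, str):  # preterminal: (POS, word)
        return (label, children)
    return (PHRASE, [collapse_labels(child) for child in children])


def tagged_words(tree: Tree) -> List[Tuple[str, str]]:
    """Read (word, POS) pairs off the tree from left to right."""
    label, children = tree
    if isinstance(children, str):
        return [(children, label)]
    pairs: List[Tuple[str, str]] = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs


# Illustrative tree; syllables of a multi-syllable word are joined with "_".
example: Tree = (
    "S",
    [
        ("NP", [("N", "sinh_viên")]),
        ("VP", [("V", "học"), ("NP", [("N", "bài")])]),
    ],
)

simplified = collapse_labels(example)
print(simplified)
print(tagged_words(simplified))
```

In this sketch the simplified tree keeps only two kinds of information: where phrases begin and end, and which tag each word carries. Segmentation and tagging output is therefore the same before and after the labels are collapsed.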

research
12/17/2021

Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-stage Span Labeling

Chinese word segmentation and part-of-speech tagging are necessary tasks...
research
03/31/2021

Joint Khmer Word Segmentation and Part-of-Speech Tagging Using Deep Learning

Khmer text is written from left to right with optional space. Space is n...
research
07/30/2023

Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation

Tone is a crucial component of the prosody of Shanghainese, a Wu Chinese...
research
10/01/2021

Span Labeling Approach for Vietnamese and Chinese Word Segmentation

In this paper, we propose a span labeling approach to model n-gram infor...
research
11/16/2016

A Feature-Enriched Neural Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

Recently, neural network models for natural language processing tasks ha...
research
07/11/2023

Improved POS tagging for spontaneous, clinical speech using data augmentation

This paper addresses the problem of improving POS tagging of transcripts...
research
07/09/2018

Universal Word Segmentation: Implementation and Interpretation

Word segmentation is a low-level NLP task that is non-trivial for a cons...
