Span Labeling Approach for Vietnamese and Chinese Word Segmentation

10/01/2021
by   Duc-Vu Nguyen, et al.
0

In this paper, we propose a span labeling approach to model n-gram information for Vietnamese word segmentation, namely SPAN SEG. We compare the span labeling approach with the conditional random field by using encoders with the same architecture. Since Vietnamese and Chinese have similar linguistic phenomena, we evaluated the proposed method on the Vietnamese treebank benchmark dataset and five Chinese benchmark datasets. Through our experimental results, the proposed approach SpanSeg achieves higher performance than the sequence tagging approach with the state-of-the-art F-score of 98.31 Vietnamese treebank benchmark, when they both apply the contextual pre-trained language model XLM-RoBERTa and the predicted word boundary information. Besides, we do fine-tuning experiments for the span labeling approach on BERT and ZEN pre-trained language model for Chinese with fewer parameters, faster inference time, and competitive or higher F-scores than the previous state-of-the-art approach, word segmentation with word-hood memory networks, on five Chinese benchmarks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/17/2021

Joint Chinese Word Segmentation and Part-of-speech Tagging via Two-stage Span Labeling

Chinese word segmentation and part-of-speech tagging are necessary tasks...
research
10/27/2022

Unsupervised Boundary-Aware Language Model Pretraining for Chinese Sequence Labeling

Boundary information is critical for various Chinese language processing...
research
09/20/2019

BERT Meets Chinese Word Segmentation

Chinese word segmentation (CWS) is a fundamental task for Chinese langua...
research
04/29/2020

A Supervised Word Alignment Method based on Cross-Language Span Prediction using Multilingual BERT

We present a novel supervised word alignment method based on cross-langu...
research
03/31/2022

A Character-level Span-based Model for Mandarin Prosodic Structure Prediction

The accuracy of prosodic structure prediction is crucial to the naturaln...
research
05/21/2019

A realistic and robust model for Chinese word segmentation

A realistic Chinese word segmentation tool must adapt to textual variati...
research
02/24/2021

Augmenting Part-of-speech Tagging with Syntactic Information for Vietnamese and Chinese

Word segmentation and part-of-speech tagging are two critical preliminar...

Please sign up or login with your details

Forgot password? Click here to reset