Approaching Neural Chinese Word Segmentation as a Low-Resource Machine Translation Task

08/12/2020
by Pinzhen Chen, et al.
Supervised Chinese word segmentation has been widely approached as sequence labeling or sequence modeling. Recently, some researchers have attempted to treat it as character-level translation, but a performance gap remains between the translation-based approach and other methods. In this work, we apply best practices from low-resource neural machine translation to Chinese word segmentation. We build encoder-decoder models with attention and examine a series of techniques, including regularization, data augmentation, objective weighting, transfer learning, and ensembling. When benchmarked on the MSR corpus under the closed test condition, without additional data, our method achieves 97.6 F1, which is on par with the state of the art.
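
To make the "segmentation as character-level translation" framing concrete, here is a minimal sketch of how a segmented training sentence could be turned into a source/target pair for an encoder-decoder model, and how a decoded target could be mapped back to words. The delimiter token `</w>` and the function names are assumptions made for illustration; the abstract does not specify the exact target-side format used in the paper.

```python
from typing import List, Tuple

# Hypothetical word-boundary token; chosen for illustration, not taken from the paper.
DELIM = "</w>"


def to_translation_pair(segmented: str) -> Tuple[str, str]:
    """Turn a whitespace-segmented sentence into a (source, target) pair.

    Source: the raw characters, space-separated (one token per character).
    Target: the same characters with DELIM emitted after every word.
    """
    words = segmented.split()
    source_tokens = [ch for word in words for ch in word]
    target_tokens: List[str] = []
    for word in words:
        target_tokens.extend(word)   # one token per character
        target_tokens.append(DELIM)  # mark the end of the word
    return " ".join(source_tokens), " ".join(target_tokens)


def target_to_words(target: str) -> List[str]:
    """Recover the word list from a decoded target-side token sequence."""
    words: List[str] = []
    current: List[str] = []
    for token in target.split():
        if token == DELIM:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(token)
    if current:  # tolerate a missing final delimiter in model output
        words.append("".join(current))
    return words


if __name__ == "__main__":
    src, tgt = to_translation_pair("我 喜欢 自然 语言")
    print(src)
    print(tgt)
    print(target_to_words(tgt))
```

Under this formulation the model "translates" an unsegmented character sequence into the same sequence with boundary tokens inserted, so gold-segmented corpora serve directly as parallel data.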

research
03/04/2020

Evaluating Low-Resource Machine Translation between Chinese and Vietnamese with Back-Translation

Back translation (BT) has been widely used and become one of standard te...
research
05/22/2019

Corpus Augmentation by Sentence Segmentation for Low-Resource Neural Machine Translation

Neural Machine Translation (NMT) has been proven to achieve impressive r...
research
03/19/2016

A Character-Level Decoder without Explicit Segmentation for Neural Machine Translation

The existing machine translation systems, whether phrase-based or neural...
research
12/24/2022

Optimizing Deep Transformers for Chinese-Thai Low-Resource Translation

In this paper, we study the use of deep Transformer translation model fo...
research
11/04/2017

Deep Stacking Networks for Low-Resource Chinese Word Segmentation with Transfer Learning

In recent years, neural networks have proven to be effective in Chinese ...
research
11/29/2019

Neural Chinese Word Segmentation as Sequence to Sequence Translation

Recently, Chinese word segmentation (CWS) methods using neural networks ...
research
07/31/2023

SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation

Sub-word segmentation is an essential pre-processing step for Neural Mac...
