Unsupervised Neural Word Segmentation for Chinese via Segmental Language Modeling

10/07/2018
by Zhiqing Sun, et al.
Traditional approaches to unsupervised Chinese word segmentation (CWS) can be roughly classified into discriminative and generative models. The former use carefully designed goodness measures to score candidate segmentations, while the latter search for the segmentation with the highest generative probability. Although discriminative models can be trivially extended to a neural version by using neural language models, extending generative models is non-trivial. In this paper, we propose segmental language models (SLMs) for CWS. Our approach explicitly models the segmental nature of Chinese while preserving several properties of language models. In SLMs, a context encoder encodes the previous context and a segment decoder generates each segment incrementally. To our knowledge, we are the first to propose a neural model for unsupervised CWS, and it achieves performance competitive with state-of-the-art statistical models on four different datasets from the SIGHAN 2005 bakeoff.
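To make the generative view concrete, the following is a minimal sketch of how a highest-probability segmentation could be found once some model assigns a log-probability to each candidate segment. It is not the paper's implementation: the names `TOY_LOGPROB`, `segment_logprob`, `best_segmentation`, and the `max_len` limit are hypothetical, and the toy lookup table merely stands in for a neural context encoder and segment decoder. The dynamic program also assumes that a segment's score depends only on the preceding characters, not on how they were segmented.

```python
import math

# Hypothetical toy scores; in an SLM these would come from a neural
# context encoder and segment decoder rather than a lookup table.
TOY_LOGPROB = {
    "我": math.log(0.2),
    "喜欢": math.log(0.3),
    "喜": math.log(0.05),
    "欢": math.log(0.05),
    "北京": math.log(0.3),
    "北": math.log(0.05),
    "京": math.log(0.05),
}


def segment_logprob(context, segment):
    """Return log p(segment | context). The toy version ignores the context
    and gives unknown segments a small per-character penalty score."""
    return TOY_LOGPROB.get(segment, math.log(1e-6) * len(segment))


def best_segmentation(sentence, max_len=4):
    """Viterbi-style search for the segmentation with the highest total
    log-probability, considering segments of at most max_len characters."""
    n = len(sentence)
    best = [-math.inf] * (n + 1)  # best[i]: best score of sentence[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + segment_logprob(
                sentence[:start], sentence[start:end]
            )
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the sentence to the start.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(sentence[start:end])
        end = start
    return segments[::-1], best[n]


segments, score = best_segmentation("我喜欢北京")
print(segments)  # ['我', '喜欢', '北京']
```

Replacing the maximum in the inner loop with a log-sum-exp would give the total probability of the sentence over all segmentations instead of the single best one; which of the two a given model uses for training or decoding is not stated in the abstract above.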

research
03/02/2021

Unsupervised Word Segmentation with Bi-directional Neural Language Model

We present an unsupervised word segmentation model, in which the learnin...
research
04/24/2017

Fast and Accurate Neural Word Segmentation for Chinese

Neural models with minimal feature engineering have achieved competitive...
research
07/09/2018

Universal Word Segmentation: Implementation and Interpretation

Word segmentation is a low-level NLP task that is non-trivial for a cons...
research
11/06/2018

Fast Neural Chinese Word Segmentation for Long Sentences

Rapidly developed neural models have achieved competitive performance in...
research
04/07/2019

Unsupervised Recurrent Neural Network Grammars

Recurrent neural network grammars (RNNG) are generative models of langua...
research
08/01/2017

A Generative Parser with a Discriminative Recognition Algorithm

Generative models defining joint distributions over parse trees and sent...
research
07/09/2019

Neural or Statistical: An Empirical Study on Language Models for Chinese Input Recommendation on Mobile

Chinese input recommendation plays an important role in alleviating huma...
