Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation

11/13/2017
by   Chunqi Wang, et al.

The character-based sequence labeling framework is flexible and efficient for Chinese word segmentation (CWS). Recently, many character-based neural models have been applied to CWS. While they perform well, they have two obvious weaknesses. First, they rely heavily on manually designed bigram features, i.e., they are not good at capturing n-gram features automatically. Second, they make no use of full-word information. To address the first weakness, we propose a convolutional neural model that captures rich n-gram features without any feature engineering. To address the second, we propose an effective approach to integrating the proposed model with word embeddings. We evaluate the model on two benchmark datasets: PKU and MSR. Without any feature engineering, the model obtains competitive performance (95.7 [...]); integrated with word embeddings, it achieves state-of-the-art performance on both datasets (96.5 [...]) without using any external labeled resource.
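
To make the character-based labeling idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the common B/M/E/S tag scheme (begin, middle, end, single), which the abstract does not name, and the helper names `words_to_tags`, `tags_to_words`, and `conv1d` are hypothetical. The toy `conv1d` only illustrates how a filter of width k looks at a k-character window, which is the sense in which a convolutional layer can pick up n-gram features without hand-built bigram features.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Turn a segmented sentence into one tag per character.

    Assumed scheme (not stated in the abstract): B = begin, M = middle,
    E = end of a multi-character word, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Recover the segmentation from characters and their predicted tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


def conv1d(seq: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Zero-padded 1D convolution with a single output channel.

    seq[i] is the embedding of character i; kernel[k] is the weight vector
    applied at offset k. The kernel width is assumed to be odd, so the
    output has one value per input character.
    """
    width, dim = len(kernel), len(seq[0])
    pad = width // 2
    zero = [0.0] * dim
    padded = [zero] * pad + seq + [zero] * pad
    out: List[float] = []
    for i in range(len(seq)):
        window = padded[i:i + width]  # the `width`-gram centred on character i
        out.append(sum(kernel[k][d] * window[k][d]
                       for k in range(width) for d in range(dim)))
    return out


if __name__ == "__main__":
    words = ["我", "喜欢", "北京大学"]
    tags = words_to_tags(words)
    print(tags)
    print(tags_to_words("".join(words), tags))
    # Width-3 filter over 1-dimensional toy "embeddings".
    print(conv1d([[1.0], [2.0], [3.0]], [[1.0], [1.0], [1.0]]))
```

In a real model the filter weights are learned and several filters of different widths run in parallel; the sketch fixes them by hand only to show the windowing.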


