
Word Segmentation as Graph Partition

by Yuanhao Liu, et al.

We propose a new approach to the Chinese word segmentation problem that treats each sentence as an undirected graph whose nodes are the characters. Various techniques can be used to compute the edge weights, which measure the connection strength between characters. Spectral graph partition algorithms then group the characters into words, which yields the segmentation. Following this graph partition approach, we design several unsupervised algorithms and report encouraging segmentation results on two corpora: (1) electronic health records in Chinese, and (2) benchmark data from the Second International Chinese Word Segmentation Bakeoff.
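To make the idea concrete, here is a minimal sketch of one spectral bisection step. It is not the authors' algorithm. It assumes that only adjacent characters are connected, that the edge weights come from a hypothetical user-supplied function `weight(a, b)`, and that the cut is chosen by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the unnormalized graph Laplacian). The function names, the toy weights, and the example sentence are all illustrative assumptions.

```python
import numpy as np


def spectral_bisect(sentence, weight):
    """Split `sentence` into contiguous pieces using the Fiedler vector.

    `weight(a, b)` is a hypothetical function returning a positive
    connection strength between two adjacent characters.
    """
    n = len(sentence)
    W = np.zeros((n, n))
    # Assumption: only neighbouring characters share an edge (a chain graph).
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = weight(sentence[i], sentence[i + 1])

    L = np.diag(W.sum(axis=1)) - W   # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)      # eigenvalues come back in ascending order
    fiedler = vecs[:, 1]             # eigenvector of the 2nd-smallest eigenvalue
    side = fiedler > 0               # which side of the partition each node is on

    # Cut wherever two neighbouring characters land on different sides.
    cuts = [i + 1 for i in range(n - 1) if side[i] != side[i + 1]]
    bounds = [0] + cuts + [n]
    return [sentence[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy weights (made up for illustration): strong links inside the two
# intended words, a weak link across the word boundary.
TOY = {("患", "者"): 1.0, ("者", "发"): 0.1, ("发", "热"): 1.0}


def toy_weight(a, b):
    # Small positive default keeps the graph connected for unseen pairs.
    return TOY.get((a, b), 0.01)


print(spectral_bisect("患者发热", toy_weight))
```

A single bisection yields only two groups; a full segmenter would presumably apply such a step recursively or use a multi-way partition, and would estimate the weights from corpus statistics rather than a hand-written table.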




Learning Chinese Word Representations From Glyphs Of Characters

In this paper, we propose new methods to learn Chinese word representati...

A realistic and robust model for Chinese word segmentation

A realistic Chinese word segmentation tool must adapt to textual variati...

A New Clustering neural network for Chinese word segmentation

In this article I proposed a new model to achieve Chinese word segmentat...

Onto Word Segmentation of the Complete Tang Poems

We aim at segmenting words in the Complete Tang Poems (CTP). Although it...

Classical Chinese Sentence Segmentation for Tomb Biographies of Tang Dynasty

Tomb biographies of the Tang dynasty provide invaluable information abou...

Optimizing the Learning Order of Chinese Characters Using a Novel Topological Sort Algorithm

We present a novel algorithm for optimizing the order in which Chinese c...