DeepAI AI Chat
Log In Sign Up

Sentence Segmentation for Classical Chinese Based on LSTM with Radical Embedding

by   Xu Han, et al.

In this paper, we develop a low than character feature embedding called radical embedding, and apply it on LSTM model for sentence segmentation of pre modern Chinese texts. The datasets includes over 150 classical Chinese books from 3 different dynasties and contains different literary styles. LSTM CRF model is a state of art method for the sequence labeling problem. Our new model adds a component of radical embedding, which leads to improved performances. Experimental results based on the aforementioned Chinese books demonstrates a better accuracy than earlier methods on sentence segmentation, especial in Tang Epitaph texts.


page 1

page 2

page 3

page 4


Classical Chinese Sentence Segmentation for Tomb Biographies of Tang Dynasty

Tomb biographies of the Tang dynasty provide invaluable information abou...

A New Clustering neural network for Chinese word segmentation

In this article I proposed a new model to achieve Chinese word segmentat...

Character-based Joint Segmentation and POS Tagging for Chinese using Bidirectional RNN-CRF

We present a character-based model for joint segmentation and POS taggin...

A Seq-to-Seq Transformer Premised Temporal Convolutional Network for Chinese Word Segmentation

The prevalent approaches of Chinese word segmentation task almost rely o...

ColBERT: Using BERT Sentence Embedding for Humor Detection

Automatic humor detection has interesting use cases in modern technologi...

Evaluating Sentence Segmentation and Word Tokenization Systems on Estonian Web Texts

Texts obtained from web are noisy and do not necessarily follow the orth...