When Classical Chinese Meets Machine Learning: Explaining the Relative Performances of Word and Sentence Segmentation Tasks

07/22/2020
by Chao-Lin Liu, et al.

Our experiments on segmenting text written in classical Chinese draw on three major sources about the Tang Dynasty of China: a collection of Tang Tomb Biographies, the New Tang Book, and the Old Tang Book. We show that a deep learning approach can achieve satisfactory segmentation results. More interestingly, some of the relative differences in performance that we observed across experimental designs appear to be explainable: the relative relevance among the training corpora offers hints about why segmentation results differed when we trained the classifiers on different combinations of corpora.

research
09/23/2020

Evolution of Part-of-Speech in Classical Chinese

Classical Chinese is a language notable for its word class flexibility: ...
research
04/05/2018

Word Segmentation as Graph Partition

We propose a new approach to the Chinese word segmentation problem that ...
research
10/23/2020

Pre-trained Model for Chinese Word Segmentation with Meta Learning

Recent researches show that pre-trained models such as BERT (Devlin et a...
research
10/05/2018

Sentence Segmentation for Classical Chinese Based on LSTM with Radical Embedding

In this paper, we develop a low than character feature embedding called ...
research
09/17/2017

Character Distributions of Classical Chinese Literary Texts: Zipf's Law, Genres, and Epochs

We collect 14 representative corpora for major periods in Chinese histor...
research
09/18/2017

Flexible Computing Services for Comparisons and Analyses of Classical Chinese Poetry

We collect nine corpora of representative Chinese poetry for the time sp...
research
05/21/2019

A realistic and robust model for Chinese word segmentation

A realistic Chinese word segmentation tool must adapt to textual variati...
