Mining Local Gazetteers of Literary Chinese with CRF and Pattern based Methods for Biographical Information in Chinese History

11/04/2015
by   Chao-Lin Liu, et al.

Person names and location names are essential building blocks for identifying events and social networks in historical documents written in literary Chinese. We take the lead in exploring algorithmic recognition of named entities in literary Chinese for historical studies, using language-model-based and conditional-random-field-based methods, and we extend this work to mining the structure of historical documents. Practical evaluations were conducted with texts extracted from more than 220 volumes of local gazetteers (Difangzhi). Difangzhi is a huge collection and the single most important source of information about officers who served in local government in Chinese history. Our methods performed very well on these realistic tests: thousands of names and addresses were identified from the texts. A good portion of the extracted names match the biographical information currently recorded in the China Biographical Database (CBDB) of Harvard University, and many others can be verified by historians and will become new additions to CBDB.

