Mining and discovering biographical information in Difangzhi with a language-model-based approach

by Peter K. Bol, et al.

We present results of expanding the contents of the China Biographical Database by text mining historical local gazetteers (difangzhi). The goal of the database is to show how people are connected through kinship, social ties, and the places and offices in which they served. The gazetteers are the single most important collection of names and offices covering the Song through Qing periods. Although we begin with local officials, we shall eventually include lists of local examination candidates, people from the locality who served in government, and notable local figures with biographies. The more data we collect, the more connections emerge. The value of systematic text mining is that we can identify relevant connections that are either directly informative or can become useful without deep historical research. Academia Sinica is developing a name database for officials in the central governments of the Ming and Qing dynasties.




